Data selection is a complex topic. While it may be cost-effective to select only certain files for archiving, files or versions that were not selected may later become more important to future researchers, and new tools might enable better analysis of existing data. For complex data types, such as sensory scans or video recordings, these issues become critical because raw data may change significantly as it is transformed into the final product.
The concept of a “preservation intervention point” identifies the points at which data is changed significantly, such as capture, filtering, transformation or dissemination. Preservation intervention points provide a “chain of evidence” for data. Ideally, they allow the capture of the source, the result, and the processes or methods used, so that the transformation is repeatable. Although preservation intervention points are most clearly applicable to complex data types, they are still important for other source materials, such as images, where processing raw data can significantly change the final result.
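The sketch below shows what a “chain of evidence” record might hold. For each intervention point it stores a fixity checksum of the source, a checksum of the result, and a free-text description of the method. The names `EvidenceLink`, `record_step` and `chain_is_continuous` are hypothetical, and so is the use of SHA-256 checksums. These are assumptions for illustration, not a schema required by any archive.

```python
import hashlib
from dataclasses import dataclass


def sha256_of(data: bytes) -> str:
    """Return a hex SHA-256 digest, used here as a fixity value for one data state."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class EvidenceLink:
    """Hypothetical record for one intervention point: source, result and method."""
    stage: str
    source_sha256: str
    result_sha256: str
    method: str  # free-text description of the process, software and settings used


def record_step(stage: str, source: bytes, result: bytes, method: str) -> EvidenceLink:
    """Fingerprint the data before and after a step and keep the method alongside."""
    return EvidenceLink(stage, sha256_of(source), sha256_of(result), method)


def chain_is_continuous(chain: list[EvidenceLink]) -> bool:
    """True if every step starts from exactly the data the previous step produced."""
    return all(prev.result_sha256 == nxt.source_sha256
               for prev, nxt in zip(chain, chain[1:]))


# Example with placeholder byte strings standing in for real files.
raw = b"raw scan points"
filtered = b"filtered scan points"
model = b"surface model"
chain = [
    record_step("filter", raw, filtered, "noise filter (settings recorded here)"),
    record_step("transformation", filtered, model,
                "surface reconstruction (software and version recorded here)"),
]
print(chain_is_continuous(chain))  # True
```

The continuity check compares each step's source checksum with the previous step's result. This is one simple way to confirm that no undocumented change slipped in between two recorded points.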
The preceding chapters outline the minimal requirements for an archival strategy where data have post-project relevance. As highlighted, such a strategy should include:
Once the long-term preservation of the digital outputs of a process has been identified as desirable, it is best addressed from the initial planning stages of a project. In particular, identifying which files are to be preserved, and at which stages of the project lifecycle, should be a key part of the project design phase and a component of the implementation phase in which the data is created or acquired.
For complex data capture or analysis tasks, copies of the data should be archived after each significant step of the process. Data selection needs special attention in complex archaeological projects, where data is acquired through one or more techniques and then merged or reprocessed through a number of phases to create a final “product.” A project's data may pass through a series of lifecycle stages, such as decimation, aggregation, recasting and annotation, and may also be migrated from format to format. In that case there may be more than one potential preservation intervention point (PIP).
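The idea of several potential PIPs along one lifecycle can be sketched as a simple filter over an ordered list of stages. This is a hypothetical illustration that rests on three assumptions. First, the initial capture is always a candidate. Second, a stage is a candidate when the project team flags it as significant. Third, a migration to a new format also counts as a significant change. The `Stage` fields are assumptions, and the significance flags in `lifecycle` are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    output_format: str
    significant: bool  # judged by the project team: does this step change the data substantially?


def candidate_pips(stages: list[Stage]) -> list[str]:
    """Return the names of stages whose output is a potential preservation intervention point."""
    candidates = []
    previous_format = None
    for i, stage in enumerate(stages):
        # A change of format between consecutive stages is treated as significant.
        format_changed = previous_format is not None and stage.output_format != previous_format
        if i == 0 or stage.significant or format_changed:
            candidates.append(stage.name)
        previous_format = stage.output_format
    return candidates


# Illustrative lifecycle; the flags and format labels are invented for this example.
lifecycle = [
    Stage("capture", "scanner-native", False),
    Stage("decimation", "scanner-native", True),
    Stage("aggregation", "scanner-native", False),
    Stage("format migration", "open point-cloud format", False),
    Stage("recasting", "mesh format", True),
    Stage("annotation", "mesh format", True),
]
print(candidate_pips(lifecycle))
# ['capture', 'decimation', 'format migration', 'recasting', 'annotation']
```

The result is only a list of candidates. As the guide notes, these still have to be judged against the selection criteria before any are chosen.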
An example of this process is shown diagrammatically below (Figure 1, taken from the VENUS Project). Reading from left to right, data streams are first generated in the field by various (hardware-based) techniques. They then undergo a series of transformations until the project's dissemination products are created. The example stages shown in the diagram are neither comprehensive nor definitive (even for the VENUS project), but include:
Although this in no way represents every possible stage in an archaeological project lifecycle, it highlights that data creation is often not a straightforward process. There are several stages at which it might be appropriate to create a preservation copy of the data.
Data documentation and metadata are also key components of the PIP concept. It is generally considered good practice to preserve data in as raw a state as possible, on the assumption that any subsequent transformations can be recreated with the right documentation. However, this approach does not work for all types of data. Photogrammetric data, where a series of images is used to construct a three-dimensional output, is one example. A three-dimensional output (e.g., a model of an amphora or a DTM) may be constructed from a series of high-resolution images. The process by which it is created, however, may be proprietary or not independently repeatable (e.g., repeatable only within a specific software package). In this case, both the original images and the three-dimensional output represent preservation intervention points, and more complex processes may have even more. Once all potential PIPs have been identified, they have to be judged against a series of criteria so that only the most appropriate PIPs for each data stream are chosen. The broad criteria by which PIPs are judged are:
The above criteria are not ranked in order of importance, and each has to be balanced against the others. Where applicable, a project's data lifecycle(s) should be examined in consultation between the data creator and the archive.
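The reasoning in the photogrammetry example can be reduced to a rule of thumb, sketched below under stated assumptions. Preserve the rawest state, and also preserve the output of any step that cannot be recreated from documentation. The names `Step` and `states_to_preserve` are hypothetical. The sketch covers only this reproducibility question and does not implement the wider set of criteria against which PIPs are judged.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    output: str       # the data state this step produces
    repeatable: bool  # can the step be re-run from documentation with generally available tools?


def states_to_preserve(raw: str, steps: list[Step]) -> list[str]:
    """Keep the rawest state, plus the output of every step that cannot be recreated.

    A repeatable step's output is omitted on the assumption that it can be
    regenerated from the most recent preserved state by re-running the
    documented steps that follow it.
    """
    keep = [raw]
    for step in steps:
        if not step.repeatable:
            keep.append(step.output)
    return keep


# Illustrative photogrammetry workflow; the step descriptions are invented.
photogrammetry = [
    Step("model construction (proprietary package)", "3D model of amphora", repeatable=False),
    Step("mesh decimation for the web (documented)", "web-ready model", repeatable=True),
]
print(states_to_preserve("high-resolution images", photogrammetry))
# ['high-resolution images', '3D model of amphora']
```

Here the raw images and the three-dimensional model are both kept, matching the two preservation intervention points described for photogrammetry. The documented decimation step is left to be re-run when needed.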
In addition to the nature of the data itself, issues such as file size, storage costs, copyright and confidentiality may influence what data is selected for the final project archive. These issues are covered in the following chapters.
'Data Selection: Preservation Intervention Points'. Edited by Kieron Niven with contributions by Tony Austin, Jonathan Bateman, Stuart Jeffrey, Jen Mitcham
Archaeology Data Service / Digital Antiquity (2011) Guides to Good Practice