For organisations with responsibility for the long-term preservation and management of digital data it is imperative that well-documented archival strategies and procedures are in place. Documentation can range from generic policy statements through to the quite specific, such as the series of Preservation Handbooks produced by the UK's Arts and Humanities Data Service (AHDS) and its subject-specific data centres, including the ADS. Other national and international organisations providing useful documentation on strategies and procedures include the UK Data Archive (UKDA), the British Library, the Library of Congress, the National Library of Australia, the United Kingdom Hydrographic Office (UKHO), NASA's National Space Science Data Center (NSSDC), the Electronic Resource Preservation and Access Network (ERPANET), the Digital Preservation Coalition (DPC) and the Digital Curation Centre (DCC). Whilst this documentation is often organisationally specific, some generic themes emerge from it, including the emergence of the International Organization for Standardization (ISO) standard Open Archival Information System (OAIS) and the increasing take-up of Lifecycle Management as an archival strategy. It is important for the producers of data, marine or otherwise, to be aware of these systems and concepts as they form the basis for archival strategies and inform selection, retention and preservation policies as well as metadata requirements.
Open Archival Information System
The development of the OAIS reference model has been pioneered by NASA's Consultative Committee for Space Data Systems (CCSDS). It has recently been accepted as an ISO standard (ISO 14721:2003). A technical recommendation is also available for consultation on the CCSDS website. As a reference model OAIS provides a conceptual framework within which to consider the functional requirements for an archival system suited to the long-term management and preservation of digital data. Such consideration can be given both to proposed and to existing systems. The model is also seen as a way of comparing systems by mapping discipline-specific jargon to OAIS terminology, which is intended to be clear and unambiguous enough to be understood by those beyond dedicated archival staff. The core entities and work flows within the model are shown in Figure 1.
Data producers create Submission Information Packages (SIP). A SIP equates to a deposit of digital data plus any documentation and metadata necessary for the archive to facilitate the long-term preservation of the data and to provide access for consumers (i.e. reuse). The SIP provides the basis for an Archival Information Package (AIP) and a Dissemination Information Package (DIP), both generated by the archive. The process involves generating preservation and dissemination versions of the deposited data where necessary. For example, a Microsoft® Word DOC file might be converted to an XML-based format such as OpenDocument Text (ODT), as used by OpenOffice, for long-term preservation and to PDF for dissemination. Metadata documenting this processing is added to the AIP, as is any relevant information from the SIP. Similarly, any resource discovery metadata and reuse documentation in the SIP is added to the DIP. Consequently, the metadata and documentation supplied as part of a SIP assume major importance in terms of data deposition. The OAIS standard notes of the SIP that 'Its form and detailed content are typically negotiated between the Producer and the OAIS'. In practice most repositories offer guidelines to depositors about acceptable formats, delivery media, copyright issues and necessary documentation and metadata. Many existing guidelines are relevant to marine project data, and particular issues that might arise are discussed in the following sections.
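The flow from SIP to AIP and DIP can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the OAIS standard or of any repository's software: the class names, field names and the `ingest` function are hypothetical, the format policy contains only the DOC example given above, and no real file conversion is performed (only the target file names are derived).

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath

# Hypothetical format policy, taken only from the DOC example in the text.
PRESERVATION_FORMAT = {".doc": ".odt"}
DISSEMINATION_FORMAT = {".doc": ".pdf"}


@dataclass
class SIP:
    """What the producer deposits: data files plus documentation and metadata."""
    files: list[str]
    documentation: dict[str, str] = field(default_factory=dict)
    discovery_metadata: dict[str, str] = field(default_factory=dict)


@dataclass
class Package:
    """Used here for both the AIP and the DIP (an assumption made for brevity)."""
    files: list[str]
    metadata: dict[str, str]
    processing_log: list[str] = field(default_factory=list)


def target_name(name: str, policy: dict[str, str]) -> str:
    """Return the file name after applying a format policy; unchanged if no rule applies."""
    path = PurePosixPath(name)
    new_suffix = policy.get(path.suffix.lower())
    return str(path.with_suffix(new_suffix)) if new_suffix else name


def ingest(sip: SIP) -> tuple[Package, Package]:
    """Derive an AIP and a DIP from a SIP."""
    # Both packages carry the documentation and discovery metadata from the SIP.
    aip = Package(files=[], metadata={**sip.documentation, **sip.discovery_metadata})
    dip = Package(files=[], metadata={**sip.documentation, **sip.discovery_metadata})
    for name in sip.files:
        preserved = target_name(name, PRESERVATION_FORMAT)
        aip.files.append(preserved)
        dip.files.append(target_name(name, DISSEMINATION_FORMAT))
        if preserved != name:
            # Metadata documenting the processing is recorded in the AIP only.
            aip.processing_log.append(f"{name} -> {preserved}")
    return aip, dip
```

For a deposit containing `report.doc` and `site_plan.tif`, this sketch would list `report.odt` in the AIP and `report.pdf` in the DIP, leaving the TIFF untouched because the hypothetical policy has no rule for it.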
The most recent development is the publication of a certification document, Trustworthy Repositories Audit & Certification (TRAC): Criteria and Checklist, by the US-based Research Libraries Group (RLG) (part of the Online Computer Library Center (OCLC)), the Center for Research Libraries (CRL) and the National Archives and Records Administration (NARA). The purpose of the checklist is to identify repositories capable of reliably managing digital collections. The audit checklist is closely tied to the OAIS reference model in terms of conceptual framework and terminology, and considers organisational suitability, repository workflows, user communities and usability of data, plus the underlying technical infrastructure including security. All of these areas must be openly documented. Organisations that can demonstrate that they meet the criteria within the checklist will be identified as Trusted Digital Repositories.
The CRL recently undertook a project to test the RLG-NARA metrics through actual audits of subject digital archives and one archiving system. A study has also been undertaken exploring how the audit checklist can be applied to the management policies derived from a system based on the DSpace digital asset management software in combination with the distributed data management software Storage Resource Broker (SRB).
In general, the archival community, including the ADS, is actively seeking to become compliant with the reference model through this process of certification. It should, however, be noted that the audit checklist is a recent development and, for the time being, a state of trust needs to exist between creator and archive.
Whilst there are other archival strategies, OAIS conformance, with its emphasis on ongoing management and administration of a digital resource, implies an object lifecycle. At the 2006 conference The LIFE Project: Bringing digital preservation to life, Neil Beagrie's paper 'The LIFEcycle model, from paper to digital' discussed the evolution of lifecycle management from its beginnings in publications such as the Terotechnology Handbook (1978), which considered lifecycle costing and the idea of 'total cost of ownership' for physical objects. Subsequently, during the 1990s, the AHDS, the British Library and others built on this approach for digital assets. Beagrie noted how the early involvement of the JISC and the AHDS with project proposals, through the provision of guidance and advice, helped to reduce costs downstream. One manifestation of this was the publication of a number of AHDS Guides to Good Practice.
By 1998 lifecycle frameworks for managing digital resources had become well defined, as described, for example, by Beagrie and Dan Greenstein in A Strategic Policy Framework for Creating and Preserving Digital Collections, and in the subsequent development of this framework into a cost model by Tony Hendley in a British Library Research and Innovation Report (106). The LIFE Project final report provides a more recent and detailed methodology for calculating 'the long-term costs and future requirements of the preservation of digital assets'. This report is likely to feed into many archival policies.
The generally recognised categories of the lifecycle of digital assets are
These categories and elements within them provide the framework for this guide.
It is worth looking briefly at alternatives to OAIS, which may in the future be relevant to digital data generated by maritime archaeology projects such as VENUS, although no example of where these alternatives are currently implemented can yet be identified.
The OAIS model described above implies a preservation strategy based on migration. The ideal is to move data to a software-independent format and subsequently migrate this through successive technical infrastructures over time (known as refreshment). There is without doubt a preference within the archival community to migrate to the most stable of all formats, ASCII text, which is an international standard of long standing. However, this is often not an option, as with images, for example. In such cases version and format migration is practised. Files in such formats are also subject to periodic refreshment. It should be noted that migration is not the only preservation strategy. Alternatives include technology preservation and emulation.
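As a minimal sketch of what a single refreshment step might look like in practice, the Python below copies a file unchanged to a new storage location and confirms that the copy is bit-for-bit identical. The use of a SHA-256 checksum for this check is an assumption made for illustration (the text above does not prescribe any verification method), and the function names `sha256_of` and `refresh` are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(source: Path, destination: Path) -> str:
    """Copy a file to new storage and verify that the byte stream is unchanged.

    Returns the checksum so that it can be recorded with the archived file.
    """
    before = sha256_of(source)
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, destination)
    after = sha256_of(destination)
    if before != after:
        raise IOError(f"Checksum mismatch after copying {source} to {destination}")
    return after
```

Format migration differs from this step in that the byte stream itself changes, so a checksum comparison of this kind cannot confirm that a migration succeeded; it only confirms that a copy is faithful.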
With technology preservation, the data is preserved unchanged along with the technology (hardware and/or software) upon which it depends. Clearly there are problems with such a strategy, as technology will fail over time and replacement becomes increasingly difficult and more costly. Jeff Rothenberg in 'Avoiding Technological Quicksand' (1999, section 6.3) notes the problems associated with this reliance on 'computer museums'. The ADS attempts to maintain a 'computer museum', not to effect technology preservation but in a probably vain hope of facilitating data recovery from outdated media, although some of the 'exhibits' have been used in earnest. In the context of marine projects it is likely that substantial amounts of very specific hardware would need to be preserved. Some of this may be data acquisition hardware, but the problem is especially acute for Virtual Reality dissemination outputs that rely on specialist equipment such as head-mounted displays, hemispherical displays etc.
Rothenberg favours emulation as an alternative preservation strategy. It is seen to have particular relevance where the look, feel and behaviour of a data resource are of importance. Critiques of emulation include that it is still in its infancy in terms of development, that it is likely to be more costly than the implementation of a migration strategy, that there are likely to be software copyright issues, and that the original software and hardware are rarely documented to a high enough level to allow subsequent emulation (Digital Preservation Coalition Handbook). An interesting, if confusing, development came about during the CAMiLEON project, which developed a strategy called 'Migration on Request'. This is in fact emulation, with a tool being built to process the original byte stream of a digital object on request.
Interestingly, it was recently decided to move the interactive video created in 1986 by the BBC to celebrate the 900th anniversary of the Domesday Book away from its dependence on outmoded media and computer hardware. A number of experts were approached, including the CAMiLEON project, who 'argued that the slight faults in images as displayed from the [original] analogue discs were a part of that experience, and should not be cleaned up', but the National Archive 'wanted to preserve the data with the highest quality available consistent with longevity' and hence opted for migration (see Ariadne issue 36).
The long-term preservation and dissemination of marine project data as described in this guide take place within an OAIS compliant framework (ISO 14721:2003 standard). Because the certification metrics are very new, many archives (including the ADS) are still working towards OAIS compliance. For the time being, therefore, trust must exist between creator and archive. The Submission Information Package (SIP) assumes major importance in the relationship between data producer and an OAIS compliant archive because, alongside the data itself, it carries the documentation and metadata that inform preservation and reuse.