g2gp 17-01-2009

Archival Strategies#

Edited by Kieron Niven and Francis Pierce-McManamon#

Archival Strategies and Digital Lifecycles#

Whilst there are a number of archival strategies, as will be discussed below, it is important that the preservation of digital data - and the preservation strategy itself - is seen as an integrated component within a larger lifecycle of data creation, preservation and reuse rather than a process carried out at the end of a project. Preservation of digital data by data migration - i.e. the strategy advocated by these Guides - together with the OAIS model (see Appendix 1) which provides a structure for this, places an emphasis on the ongoing management and administration of a digital resource and an object lifecycle.

The concept of a digital object lifecycle is a common element in many preservation policies and has been around for some time. At the 2006 conference The LIFE Project: Bringing digital preservation to life[1], Neil Beagrie, in a paper entitled 'The LIFEcycle model, from paper to digital', discussed the evolution of lifecycle management from its beginnings in publications such as the Terotechnology Handbook (1978), which considered lifecycle costing and the idea of ‘total cost of ownership’ for physical objects. Subsequently, during the 1990s, the AHDS, the British Library and others built on this approach for digital assets. Beagrie noted how the early involvement of the JISC and the AHDS with project proposals, through the provision of guidance and advice, helped to reduce costs downstream. One manifestation of this was the publication of a number of AHDS Guides to Good Practice[2].

By 1998 lifecycle frameworks for managing digital resources had become well defined as described, for example, by Beagrie and Dan Greenstein in 'A Strategic Policy Framework for Creating and Preserving Digital Collections'[3] and the subsequent development of this framework into a cost model by Tony Hendley in a British Library Research and Innovation Report (106)[4]. The LIFE Project final report provides a more recent and detailed methodology for calculating 'the long-term costs and future requirements of the preservation of digital assets'[5]. In addition, the Digital Curation Centre's Curation Lifecycle Model[6] provides a concise overview of how an object lifecycle can function and how elements continually feed back into the process of preservation and reuse.

Although simplified, the generally recognised categories of the lifecycle of digital assets are:

  • Data Creation
  • Documentation and Metadata
  • Acquisition and Selection (retention or disposal)
  • Preservation and Management
  • Access and Use/Reuse

These categories provide the framework for the subsequent introductory sections of these Guides. Although there is a logical structure to these elements, these Guides will initially outline the core preservation and management strategies (this section) before looking at how the way in which data is preserved can be made simpler and more effective through planning at the data and archive creation stages. The final introductory sections will then look at data dissemination, access and use, particularly in the context of large datasets and rights management.

The Three Main Preservation Strategies#

At its simplest level, the preservation of digital files can be broken down into two key elements: firstly, the continued storage of data in an accessible and robust format and, secondly, the creation and maintenance of documentation (metadata) that allows the preserved data to be understood. Digital archiving strategies do not, and should not, rely simply on the physical preservation of a single disk, tape, or CD-ROM. The preservation of digital data, regardless of subject area or content, is generally approached via one of three main strategies:

  • Technology Preservation
  • Emulation
  • Migration

Although all three of these strategies are used by archives, these Guides recommend that a migration-based approach is adopted for the preservation of archaeological digital data. Such an approach relies on the migration of information from older hardware and software systems to newer systems. Conversely, in a Technology Preservation strategy the data is preserved unchanged along with the technology (hardware and/or software) upon which it depends. Rothenberg (1999, section 6.3) notes a number of problems associated with such a reliance on 'computer museums', namely that technology will inevitably fail over time and that maintenance and replacement will become increasingly difficult and more costly. In the context of many archaeological projects it is likely that substantial amounts of very specific software and hardware would need to be both documented and preserved. The complete preservation of old hardware and software systems as part of a strategy of technology preservation is costly and high-risk, and can be seen as unjustifiable unless data cannot be migrated and are of substantial international importance.

An Emulation strategy attempts to avoid some of the pitfalls of technology preservation through the emulation of older hardware/software systems in newer systems. This is technically challenging, expensive, and becomes increasingly difficult as current technology becomes ever more remote from the original systems employed. Emulation is therefore not recommended for archaeological archives.

Rothenberg, however, favours emulation as an alternative preservation strategy, and it is seen to have particular relevance where the look, feel, and behaviour of a data resource is of importance. Critiques of emulation include that it is still in its infancy in terms of development, that it is likely to be more costly than the implementation of a migration strategy, that there are likely to be software copyright issues, and that (the original) software and hardware is rarely documented to a high enough level to allow subsequent emulation[7]. An interesting and confusing development came about during the CAMiLEON project, which developed a strategy called 'Migration on Request'[8]. This is in fact emulation, with a tool being built to process the original byte stream of a digital object on request. An interesting case study in emulation was the decision to move the interactive video created in 1986 by the BBC to celebrate the 900th anniversary of the Domesday Book away from its dependence on outmoded media and computer hardware (Darlington et al, 1986). A number of experts were approached, including the CAMiLEON project, who 'argued that the slight faults in images as displayed from the [original] analogue discs were a part of that experience, and should not be cleaned up', but the National Archive 'wanted to preserve the data with the highest quality available consistent with longevity' and hence opted for migration. Other projects, such as KEEP[9], continue to emerge looking at the possibilities of emulation within a digital preservation strategy.

Preservation by Migration#

A migration-based preservation strategy rests on the principle of moving data to software-independent formats (normalisation) and subsequently migrating it through successive technical infrastructures over time (known as refreshment). There is without doubt a preference within the archival community that, where possible, data should be migrated to a small set of file formats (normalisation) which are viewed as stable and, where possible, openly documented. This not only reduces the potential number of refreshments that the data has to undergo but also allows such migration processes to be carried out easily across large sets of data. As stated above, it is recommended that digital archiving in archaeology should employ the policy and procedures of controlled data migration. There are four main activities required for successful digital archiving with this strategy:

  • Data Refreshment
  • Data Migration
  • Data Documentation
  • Data Management Tools

Data Refreshment

Data refreshment is the act of copying information from one medium to the next as the original medium nears the end of its reliable life span. Research into the life span of both magnetic and optical media has been conducted. The overwhelming conclusion from this research is that even though magnetic media can be safe for 5-10 years and optical media may survive for more than 30, technology changes much more quickly. Digital media are far more likely to become unreadable as a result of changing technology than they are through media degradation. For example, 10 years ago many archaeologists collected information on 3" Amstrad disks. These disks are completely unreadable by PC machines, and cannot be accessed unless a surviving Amstrad computer can be networked or has a peripheral such as a 3.5" disk drive. Even then, as Amstrads use a different operating system to PCs, the data must be exported in a standard format such as ASCII text. On the other hand, if archaeologists had refreshed their data from 3" Amstrad disks to 5.25" disks and then onto 3.5" disks, these digital data would still be accessible and safe.
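As an illustration only, refreshment of a single file can be sketched as a copy followed by a checksum comparison, so that the new copy is known to match the old one before the old medium is retired. The function names and the choice of SHA-256 are assumptions made for this sketch, not a procedure prescribed by these Guides.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh_file(source: Path, target_dir: Path) -> Path:
    """Copy one file to a new storage location and verify the copy.

    Hypothetical helper: raises OSError if the copy does not match.
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    shutil.copy2(source, target)  # copy2 also carries over timestamps
    if sha256_of(source) != sha256_of(target):
        raise OSError(f"Checksum mismatch after copying {source}")
    return target
```

Note that a check of this kind only confirms that the bytes were copied intact; it says nothing about whether the file format itself can still be read, which is the concern of migration.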

The architecture of hardware changes rapidly, but not as rapidly as software. Data created or gathered in a proprietary software format is hostage to the long-term viability of that brand, and the company that produces it. These cannot be assured. Certain types of file have been earmarked as industry standard formats, while in other cases there may be open formats available that, while losing some of the original versatility, may nonetheless allow reconstruction or importation into other updated software types.

Data Migration

Data migration is even more important than data refreshment for successful digital archiving. Migration is the act of copying digital information from one format or structure into another that can be read by current versions of software. One example is the migration of data between different Computer-Aided Design (CAD) packages. However, even though CAD packages allow data to be exported in a 'standard' format (DXF), in fact many programs create such DXF files in unique ways that may not be readable by other packages. The unwary can find that, without careful migration, much of the original information is lost when creating a new file. Many migration programmes incorporate a process of format normalisation, i.e. files are migrated into common stable formats and then migrated, where necessary, through successive versions of the format. An ideal is to normalise files to open standard and preferably text-based formats (e.g. XML or ASCII), though this is often not an option, as with images for example. In such cases version and format migration is practised, and files in such formats are also subject to periodic refreshment.
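A minimal sketch of normalisation to a text-based format is given here, assuming a simple, well-formed comma-separated table as input. The element names (dataset, record, field) are invented for the example and are not a schema recommended by these Guides.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text: str) -> str:
    """Normalise a comma-separated table into a plain XML string.

    Assumes the first row holds column names and every row is complete.
    Column names are stored as attribute values, so they do not need
    to be valid XML element names.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    root = ET.Element("dataset")  # hypothetical element names throughout
    for row in reader:
        record = ET.SubElement(root, "record")
        for column, value in row.items():
            field = ET.SubElement(record, "field", name=column)
            field.text = value
    return ET.tostring(root, encoding="unicode")
```

For example, csv_to_xml("context,find\n101,pottery\n") yields one record element holding two field elements, named context and find.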

It is also essential to monitor ongoing research and development into a good digital preservation strategy and not to assume that the formats that were safest last year are still the best. Digital preservation requires active intervention and constant vigilance. If this cannot be provided locally it is best to contact a professional digital archiving service.

A strict regime of validating all files and their documentation must be followed during the migration process. Original media should be retained until the validation process is 100% complete since it may be necessary to return to the original file if any aspect of the data could not be successfully and safely migrated.
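The validation step can be sketched, under simplifying assumptions, as a field-by-field comparison of records read from the original file and from the migrated file. Both inputs are assumed to have already been loaded as lists of dictionaries; the function names are hypothetical.

```python
def validate_migration(original, migrated):
    """Compare two lists of record dicts and describe any discrepancies."""
    problems = []
    if len(original) != len(migrated):
        problems.append(
            f"record count differs: {len(original)} vs {len(migrated)}"
        )
    for index, (old, new) in enumerate(zip(original, migrated)):
        for field in old:
            if field not in new:
                problems.append(f"record {index}: field '{field}' missing")
            elif old[field] != new[field]:
                problems.append(f"record {index}: field '{field}' changed")
    return problems


def safe_to_retire_originals(problems):
    """Original media should be kept until validation reports no problems."""
    return not problems
```

A real validation regime would also need to cover the accompanying documentation and any format-specific features that a simple record comparison cannot see.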

Data Documentation

In order for a digital archivist to migrate digital information successfully, it is necessary to understand the structure of the data and how different parts relate to one another. Data migration thus relies on the third activity: data documentation.

Documentation is essential for both the preservation and reuse of all dataset types. While abbreviations and shorthand used in the data might make sense to the creator, there is no guarantee that these will make sense to someone reusing the data several years later. Worse still, in a large archive, a file describing contexts or small finds may relate to only one of several possible excavations. For these reasons it is necessary to document the codes or abbreviations used in the data together with any standards that have been followed or thesauri or word lists that have been used. In addition, it is often useful to provide an accurate (if brief) description of each file, explaining how different files fit together. Software often incorporates the option to generate documentation during the creation of files (e.g. field descriptions in databases), though such documentation can vary in its depth and relevance. The type of documentation required will also depend on the type of data; for example the documentation for a text file may be quite straightforward, while that for a GIS or database may be quite intensive. Data documentation, at the file or data type level, is dealt with in more detail in the relevant chapters within these Guides.
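One way to check that the codes used in a dataset are documented is sketched here. The codebook is assumed to be a simple mapping from code to description, and the codes shown in the usage note are invented for illustration.

```python
def undocumented_codes(records, field, codebook):
    """Return, in sorted order, the values of `field` that the codebook
    does not explain. Empty or missing values are ignored."""
    used = {record[field] for record in records if record.get(field)}
    return sorted(used - set(codebook))
```

With a hypothetical codebook {"POT": "pottery", "CBM": "ceramic building material"} and finds records coded POT, CBM and FE, the function would report FE as needing an entry before the data are deposited.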

No digital archivist can successfully preserve data that are not fully documented, because at every step of data migration information can be lost. This leaves archivists with two options: migrating data from one format to another and then double-checking each entry manually, or requiring thorough documentation of the data at the time of archiving so that migrations can be carefully planned and tested in advance.

Data Management Tools

As already noted, digital data needs to be regularly refreshed and migrated. Digital files that are stored on a local network or are in current use should move naturally into a localised back-up strategy as alterations take place. Digital files that are stored in a deep storage facility (a preferred archival strategy for long-term preservation, whereby files are stored in a remote and specialised repository) require active intervention for appropriate updating and version control to take place.

Digital archives need to be actively managed. Use of an Electronic Document Management (EDM) system is recommended; these are data management tools, usually in the form of a database. The system employed should flag dates and will ideally automatically inform the archive manager when files need attention (backing up, migration, refreshment).
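The date-flagging behaviour described above can be sketched very simply. The record structure and the review interval are assumptions made for this example; an actual EDM system would hold far more information about each file.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ArchivedFile:
    name: str
    last_refreshed: date
    review_interval_days: int  # hypothetical policy value set by the archive


def files_needing_attention(holdings, today):
    """Return the names of files whose review date has been reached."""
    due = []
    for item in holdings:
        due_date = item.last_refreshed + timedelta(days=item.review_interval_days)
        if due_date <= today:
            due.append(item.name)
    return due
```

Run on a schedule, a check like this could feed a reminder to the archive manager, although the decision on what action a flagged file needs remains a human one.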


Archival Policies#

Organisations with responsibility for the long-term preservation and management of digital data should have well documented archival strategies and procedures in place. Documentation can range from generic policy statements through to the quite specific, for example, the series of Preservation Handbooks[10] produced by the UK's Arts and Humanities Data Service (AHDS) and its subject specific data centres, including the ADS. Other national and international organisations providing useful documentation in terms of strategies and procedures include the UK Data Archive (UKDA)[11], the British Library[12], the Library of Congress[13], the National Library of Australia[14], the United Kingdom Hydrographic Office (UKHO)[15], NASA's National Space Science Data Center (NSSDC)[16], the Electronic Resource Preservation and Access Network (ERPANET), the Digital Preservation Coalition (DPC) and the Digital Curation Centre (DCC). Whilst often organisationally specific, some generic themes emerge from the available information, including the emergence of the International Organization for Standardization (ISO) standard Open Archival Information System (OAIS) (see Appendix 1) and the increasing take-up of Lifecycle Management as an archival strategy.

Recent developments have seen a move towards certifying data repositories and providing assurance that data remains accessible in the future. The publication of a certification document Trustworthy Repositories Audit & Certification (TRAC): Criteria and Checklist by the US-based Research Libraries Group (RLG), the Center for Research Libraries (CRL) and the National Archives and Records Administration (NARA) aims to provide a checklist for identifying repositories capable of reliably managing digital collections. The audit checklist is closely tied to the OAIS reference model in terms of a conceptual framework and terminology and considers organisational suitability, repository workflows, user communities and usability of data, plus the underlying technical infrastructure including security. All of these areas must be openly documented. Organisations that can demonstrate that they meet the criteria within the checklist will be identified as Trusted Digital Repositories. The CRL is currently undertaking a project to test the RLG-NARA metrics through actual audits of subject digital archives and one archiving system and has published an audit report on the assessment of Portico (http://www.crl.edu/archiving-preservation/digital-archives/certification-and-assessment-digital-repositories/portico).

The Data Seal of Approval also aims to provide a similar quality assessment to that of the TRAC system, although with a slightly simpler structure based on sixteen guidelines focussed on three groups of stakeholders (producer, consumer and archive). Again, as with TRAC, the aim is to guarantee the durability of the data concerned, but also to promote the goal of durable archiving in general. More broadly, the archival community are actively seeking to become compliant with the OAIS reference model through the process of certification. It should, however, be noted that such audit checklists are a very recent development and, for the time being, a state of trust needs to exist between creator and archive.

References#

Beagrie, N. and D. Greenstein (1998) A Strategic Policy Framework for Creating and Preserving Digital Collections. http://ahds.ac.uk/manage/framework.htm

Rothenberg, J. (1999) Avoiding Technological Quicksand: Finding a Viable Technical Foundation for Digital Preservation. http://www.clir.org/PUBS/reports/rothenberg/pub77.pdf

Terotechnology Handbook (1978)

Darlington et al, 1986 http://www.ariadne.ac.uk/issue36/tna/

[1] http://www.dpconline.org/graphics/join/lifeconfrep.html
[2] http://www.ahds.ac.uk/archaeology/creating/guides/index.htm
[3] http://www.ukoln.ac.uk/services/papers/bl/framework/framework.html
[4] http://www.ukoln.ac.uk/services/elib/papers/tavistock/hendley/hendley.html
[5] http://eprints.ucl.ac.uk/archive/00001854/01/LifeProjMaster.pdf
[6] http://www.dcc.ac.uk/resources/curation-lifecycle-model
[7] http://www.dpconline.org/graphics/orgact/storage.html
[8] http://www.si.umich.edu/CAMILEON/reports/mor/
[9] http://www.keep-project.eu/ezpub2/index.php
[10] http://www.ahds.ac.uk/preservation/ahds-preservation-documents.htm
[11] http://www.data-archive.ac.uk/
[12] http://www.bl.uk/about/collectioncare/digpresintro.html
[13] http://www.digitalpreservation.gov/
[14] http://www.nla.gov.au/padi/
[15] http://www.ukho.gov.uk/amd/ProvidingHydrographicSurveys.asp
[16] http://nssdc.gsfc.nasa.gov/