g2gp 17-01-2009

Project Metadata

As outlined in the previous chapter, metadata is data about data. More specifically, it is a standardised set of information (a scheme or schema) that can be used to document, in a structured way, different aspects of a project at various levels. The process and the reasons for creating metadata are well documented in a number of existing guidelines and by numerous organisations and repositories (e.g. NISO 2004, Day 2005, van Ballegooie & Duff 2006) but, put simply, metadata aims to make digital resources easily identifiable, retrievable and usable through the storage of descriptive and contextual information.

Types of Metadata

For the purposes of these Guides metadata can be loosely grouped into two main categories (as described below). For the most part, these types of metadata are collected at a certain level (e.g. Project-level) but particular elements may be recorded for data from the general level right down to the specific file level. In addition, certain metadata standards may record elements of metadata which function at a number of levels (e.g. 'Author' may aid resource discovery as well as provide administrative information). The two categories relevant to archaeological projects are:

  • Project-level Metadata (incorporating 'Descriptive' and 'Resource Discovery' metadata) is largely recorded at a broad level for an entire project, irrespective of the techniques used, and covers elements such as period terms or dates, site and artefact keywords, project details, site codes and geographic location. Often much of this information is included within documents in the archive (e.g. site reports). Descriptive or Resource Discovery metadata is designed to allow the comprehensive description and easy retrieval of datasets, and is concerned more with the project and its results. The Dublin Core[1] standard is a good example of a metadata standard which incorporates a number of descriptive and resource discovery focussed elements.
  • File-level Metadata (incorporating 'Technical' and 'Preservation' metadata) is generally very specific and applied, as the name implies, at the level of individual files. File-level metadata incorporates information on hardware and software along with validation methods such as checksums. In many cases, if the data is to be deposited with a digital archive, it is the archive itself that will generate much of this metadata although a number of elements are often required to be completed by the data creator, often during the process of data creation. By its nature, technical metadata is often very specific to the data type and will therefore be covered within the relevant chapters in these Guides.

It is worth being aware that a third broad category, Administrative Metadata, exists within both of the above and covers elements such as creation and acquisition as well as alteration and version control[2]. Included within this is information concerning intellectual property rights (IPR). Such information can be recorded at a general level (i.e. ownership for an entire dataset may be held by one person or organisation) but should also be recorded for specific techniques or datasets where personnel or authors and IPR differ.

In addition to the benefits of a structured schema, metadata at any level can be further enhanced through the use of specific, standardised word lists and thesauri. These are resources, such as the English Heritage NMR Monument Type Thesaurus[3] or the MDA Object Type Thesaurus[4], which allow elements of a metadata schema (e.g. the Dublin Core 'Subject' element) to be completed in a controlled way, thereby increasing the ease of use and reliability of the metadata. Throughout these Guides, where specific metadata schema are indicated, suitable resources for qualifying elements will also be highlighted.
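
As a minimal sketch of how a controlled word list might be applied, the Python below checks candidate 'Subject' keywords against a small in-memory list. The terms in MONUMENT_TERMS and the function name check_subject_terms are illustrative assumptions, not an extract from a published thesaurus or an existing API; a real workflow would load its terms from the thesaurus itself.

```python
# Hypothetical, hand-typed word list standing in for a published thesaurus.
MONUMENT_TERMS = {"BARROW", "HILLFORT", "ROUND HOUSE", "FIELD SYSTEM"}


def check_subject_terms(keywords, controlled_terms=MONUMENT_TERMS):
    """Split keywords into those found in the controlled list and those not.

    Matching ignores case and surrounding whitespace.
    """
    accepted, rejected = [], []
    for keyword in keywords:
        normalised = keyword.strip().upper()
        if normalised in controlled_terms:
            accepted.append(normalised)
        else:
            rejected.append(keyword)
    return accepted, rejected


accepted, rejected = check_subject_terms(["hillfort", " Barrow ", "old fort"])
print(accepted)  # terms that can be recorded in the Subject element
print(rejected)  # terms that still need a controlled equivalent
```

Rejected terms are returned unchanged so that the person compiling the metadata can choose the nearest controlled equivalent by hand.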

Project-level Metadata

General project-level metadata, like a good library catalogue, allows users to identify available resources quickly and easily and puts them in touch with the resources that they need. For this to work effectively, however, the metadata has to be implemented accurately and in a standard format. A commonly used format for project-level metadata (and one used by the ADS) is Dublin Core. The standard, which has both 'simple' and 'qualified' versions, consists of fifteen core elements (eighteen in the qualified version) which can provide a detailed overview of the project as a whole, including geographical coverage, dates, methodology, monument and evidence types. In the early stages of a project metadata often cannot be compiled accurately, but project workers should still familiarise themselves with the type of generic project-level metadata required. When depositing data with an archive, the depositor will often be expected to complete a project-level metadata record template together with any relevant format-specific template.

The example below forms the basis of ADS project-level metadata and consists of the basic Dublin Core elements.

Element | Description
Project Title | The title (and any alternatives) for the dataset.
Description | A brief summary of the main aims and objectives of the research project (or alternative process) from which the data collection arose together with a brief summary description of the content of the dataset.
Subject | Keywords for the subject content of the dataset (qualified using e.g. the English Heritage NMR Monument Type Thesaurus or the MDA Object Type Thesaurus).
Coverage | Both spatial and temporal coverage. For spatial coverage it should include the current and contemporary name(s) of the country, region, county, town or village covered by the data collection and, where possible, a standardised reference such as the Getty Thesaurus of Geographic Names[5] should be used. If names or administrative units were different during the time period covered by the data they should be recorded separately. Site coordinates can also be entered as a National Grid reference in a number of different ways, e.g. as a point (useful to describe a small project area via a central coordinate); as a line (e.g. at least 2 coordinates to represent the linear limits of the site); as a polygon (for a more complex site area, 3 or more coordinates are used to describe the boundaries). If applicable, the full postal code for the site can be included. For temporal coverage it should include the dates/period covered by the dataset (using existing thesauri where possible such as the RCHME Period List).
Creators | Details of the creator(s), compiler(s), funding agencies, or other bodies or people intellectually responsible for the data collection. Information should include forename, surname, affiliation, address, phone, fax, email, or URL.
Publisher | Details about any organisation which has published this data.
Identifiers | Project or reference numbers used to identify the dataset.
Dates | Dates indicating when the dataset was created, when the archaeological project was carried out, processing dates, or computerisation dates as appropriate.
Copyright | The name of the copyright holder for the dataset. If the collection was created during work by an employee, the copyright holder will normally be the employer. If the material is covered by a specific copyright (e.g. Crown copyright) please indicate this.
Relations | If the data collection was derived in whole or in part from published or unpublished sources, whether printed or machine-readable, this element should include references to the original material, details of where the sources are held and how they are identified there (e.g. by accession number). If the collection is derived from other sources include an indication of whether the data represents a complete or partial transcription/copy and the methodology used for its digitisation. Also include full references to any publications about or based upon the data collection.
Language | Indication of which language(s) the dataset is in (e.g. English, French, Spanish).
Resource Type | Whether the dataset is best described as primary data, processed data, an interpretation of data, or a final report.
Format | The format the data is saved in (e.g. WordPerfect 5.1, HTML, AutoCAD).

Completed examples of this type of schema can be found in the ADS Guidelines for Depositors[6].
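
To illustrate how a project-level record of this kind might be checked before deposit, the following Python is a minimal sketch only. The lower-case element names, the choice of which elements to treat as mandatory, the record layout and the fictional project are all assumptions made for the example and are not an ADS template; the minimum coordinate counts follow the spatial coverage description in the table.

```python
# Element names follow the table; treating these five as mandatory is an assumption.
REQUIRED_ELEMENTS = ("title", "description", "subject", "coverage", "creators")

# Minimum coordinate counts taken from the spatial coverage description.
MIN_COORDINATES = {"point": 1, "line": 2, "polygon": 3}


def missing_elements(record, required=REQUIRED_ELEMENTS):
    """Return the required elements that are absent or empty in the record."""
    return [name for name in required if not record.get(name)]


def valid_spatial_coverage(geometry, coordinates):
    """Check that a geometry has a plausible number of coordinates."""
    if geometry not in MIN_COORDINATES:
        raise ValueError(f"unknown geometry type: {geometry!r}")
    if geometry == "point":
        return len(coordinates) == 1
    return len(coordinates) >= MIN_COORDINATES[geometry]


# Entirely fictional project, used only to exercise the checks.
record = {
    "title": "Example Farm evaluation",
    "description": "Trial trenching ahead of development.",
    "subject": ["FIELD SYSTEM"],
    "coverage": {
        "geometry": "polygon",
        "coordinates": [(450000, 250000), (450100, 250000), (450100, 250100)],
    },
    "creators": [],
}

print(missing_elements(record))  # elements still to be completed
print(valid_spatial_coverage(**record["coverage"]))
```

The geometry type is stated explicitly rather than inferred from the number of coordinates, because a line may also be described by more than two coordinates.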

File-level Metadata

File-level metadata relates primarily to information required by an archive to preserve and disseminate files but additionally allows users to understand the nature of the files within a dataset and their reuse potential. File-level metadata is highly dependent on the type of data being recorded and the file type itself but commonly includes notes on elements such as the software used in creating the file, lists of file names, relationships between files and so on.

Although the first edition of the ADS Digital Archives from Excavation and Fieldwork: Guide to Good Practice recommended that each file in a digital archive should have an associated metadata record, experience has demonstrated that this level of documentation is largely unnecessary. Depending on the nature of the data itself, groups of files of the same format or within a discrete group can be documented by a single metadata record. It was recognised that the provision of file-level documentation places a large burden upon the creator and substantially increases the time necessary to construct the archive, and hence its cost. Research has also shown that users of digital data are more likely to search for entire archaeological projects, or for particular categories of information, reducing the need for individual file metadata.

File-level metadata is essential for knowing exactly what is in a dataset and how it can be used. It is recommended that, where data-specific guidelines are not suggested elsewhere in these Guides, the following elements are recorded as a bare minimum at a file level.

Element | Description
File name | The name of the file e.g. report.doc
File format | The file format e.g. PDF/A or Open Office Document
Software used to create the files | The software used to create the file e.g. Microsoft Word 2007
Hardware used to create the files | The hardware used to create the file. This is more significant when files are created directly by survey equipment such as laser scanners or GPS devices.
Operating system used to create the files | The operating system under which the file was made e.g. Windows XP or Mac OS X 10.5.
Date of creation/last file update | When the file was made or updated.
Processing history or Lineage | This element should be used to highlight relationships between files and whether a file is a source file or derived from another.
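
Some of these elements can be gathered from the file system rather than typed by hand. The Python below is a rough sketch under stated assumptions: the function name file_level_record and its dictionary keys are invented for illustration, the file extension is only a hint at the true format, and the operating system reported is that of the machine running the script, which may not be the one on which the file was created.

```python
import platform
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def file_level_record(path, software, hardware="", lineage=""):
    """Build a minimal file-level record; key names are illustrative only."""
    path = Path(path)
    modified = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
    return {
        "file_name": path.name,
        # The extension is used as a stand-in for the format (an assumption).
        "file_format": path.suffix.lstrip(".").lower() or "unknown",
        "software": software,
        "hardware": hardware,
        # Describes the machine running this script, not necessarily the creator's.
        "operating_system": platform.platform(),
        "last_updated": modified.date().isoformat(),
        "lineage": lineage,
    }


# Demonstration on a throwaway placeholder file.
with tempfile.TemporaryDirectory() as folder:
    example = Path(folder) / "report.DOC"
    example.write_text("placeholder content")
    record = file_level_record(
        example, software="Microsoft Word 2007", lineage="source file"
    )

for element, value in record.items():
    print(f"{element}: {value}")
```

Software, hardware and lineage still have to be supplied by the data creator, since they cannot be read reliably from the file itself.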

If deposited with an external archive, this information would generally be entered into the archive's internal file management system and used to plan future migration and validation strategies for the data. It is at this point that the file-level metadata is enhanced by elements to record information such as fixity values or checksums. These are 'a form of redundancy check, a simple way to protect the integrity of data by detecting errors in data'[7]. The MD5 (Message-Digest algorithm 5) and the SHA (Secure Hash Algorithm) are widely used cryptographic hash functions and applying these algorithms to a file produces an (almost certainly) unique hash or checksum value which provides a mechanism for validating and auditing data. An isolated checksum is of course of no use on its own. It has to be associated with a file, a location and a project as structured data. The schema below is an example of one structure that can be used to hold such file-level metadata.

Element | Description
UNIQUE_ID | Auto-generated unique ID e.g. 1234567
FILE_LOCATION | The file path i.e. directory and filename e.g. /adsdata/cottam_ba/jpg/fwking_plan.jpg
CHECKSUM_TYPE | The checksum algorithm used e.g. MD5, SHA-1, etc.
CHECKSUM_VALUE | The checksum value generated by algorithm e.g. 578cbb18f73a885988426797bcab8770
PROJECT_ID | A unique project ID e.g. ADS-123
GENERATED | Date the checksum was created e.g. 16-May-2006
GENERATED_BY | Person who created the checksum e.g. Doe, J
LAST_AUDITED | The date at which the file was last checked or verified e.g. 16-May-2007
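
The generation and later auditing of such a record can be sketched with Python's standard hashlib module. This is an illustration under assumptions rather than a description of any archive's system: the function names are invented, SHA-256 is used as the default algorithm (MD5 can be selected by passing "md5"), CHECKSUM_TYPE is stored using hashlib's algorithm names, dates are written in ISO format, and LAST_AUDITED is updated only when the checksum still matches.

```python
import hashlib
import tempfile
from datetime import date
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=65536):
    """Hash a file in chunks so that large files need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_record(unique_id, path, project_id, generated_by, algorithm="sha256"):
    """Build a record using the element names from the schema above."""
    return {
        "UNIQUE_ID": unique_id,
        "FILE_LOCATION": str(path),
        "CHECKSUM_TYPE": algorithm.upper(),
        "CHECKSUM_VALUE": file_checksum(path, algorithm),
        "PROJECT_ID": project_id,
        "GENERATED": date.today().isoformat(),
        "GENERATED_BY": generated_by,
        "LAST_AUDITED": None,
    }


def audit(record):
    """Recompute the checksum and report whether the file is unchanged."""
    current = file_checksum(record["FILE_LOCATION"], record["CHECKSUM_TYPE"].lower())
    matches = current == record["CHECKSUM_VALUE"]
    if matches:
        record["LAST_AUDITED"] = date.today().isoformat()
    return matches


# Demonstration on a throwaway file; identifiers reuse the examples above.
with tempfile.TemporaryDirectory() as folder:
    target = Path(folder) / "fwking_plan.jpg"
    target.write_bytes(b"abc")
    record = fixity_record(1234567, target, "ADS-123", "Doe, J")
    intact = audit(record)       # file unchanged since the checksum was made
    target.write_bytes(b"abd")   # simulate corruption of a single byte
    damaged = audit(record)

print(record["CHECKSUM_TYPE"], record["CHECKSUM_VALUE"])
print(intact, damaged)
```

Changing a single byte of the file is enough to make the second audit fail, which is the behaviour a fixity check relies upon.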

In addition to more detailed fixity information, an archive (and possibly a data creator) will want to maintain a process history as part of archival practice. An example would be importing XYZ data into a GIS. Again this can be recorded as simple structured data. The same structure can hold both file level and batch processing information. The following example is based on AHDS practice.

Element | Description
PROCESS_ID | Auto-generated unique ID e.g. 1234567
PROJECT_ID | A unique project ID e.g. ADS-123
SOURCE_FORMAT | The format of the original file e.g. .txt
DESTINATION_FORMAT | The destination format e.g. .shp
PROCESS_AGENT | Who did the processing e.g. Doe, J.
PROCESS_COMMENTS | Comments relating to the process undertaken e.g. 'referenced to WGS84'.
PROCESS_START_DATE | Date that the process started e.g. 17-May-2007
PROCESS_COMPLETION_DATE | Date process completed e.g. 17-May-2007
PROCESS_DESCRIPTION | A description of the process e.g. 'Import of XYZ data into ArcView for analytical purposes and dissemination as research outcome'.
PROCESS_GUIDELINES | Any guidelines related to the process.
PROCESS_HARDWARE_USED | Hardware used to process the file e.g. Viglen Genie Intel Pentium 4
PROCESS_SOFTWARE_USED | Software used to process the file e.g. ESRI Arcview 9.1
PROCESS_INPUT | Full file path of the source file e.g. /adsdata/pro-453/xyz/file.xyz
PROCESS_OUTPUT | Full file path of the output file e.g. /adsdata/pro-453/shp/file.shp
PROCESS_RESULT | Comments on the result of the processing e.g. 'Success'.
PROCESS_TYPE | Description of the process carried out e.g. 'Conversion - dissemination'.
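
A process history entry of this kind could be held as simple structured data along the following lines. The Python below is a hedged sketch, not AHDS or ADS code: the class ProcessRecord is hypothetical, it carries only a subset of the elements for brevity, and it assumes that the source and destination formats can be read from the file extensions of the input and output paths.

```python
from dataclasses import asdict, dataclass
from pathlib import PurePosixPath


@dataclass
class ProcessRecord:
    """One process history entry; a subset of the elements listed above."""
    process_id: int
    project_id: str
    process_agent: str
    process_type: str
    process_description: str
    process_software_used: str
    process_input: str
    process_output: str
    process_start_date: str
    process_completion_date: str
    process_result: str = ""
    process_comments: str = ""

    def as_row(self):
        """Return the entry keyed by upper-case element names."""
        row = {key.upper(): value for key, value in asdict(self).items()}
        # Assumption: formats are taken from the file extensions of the paths.
        row["SOURCE_FORMAT"] = PurePosixPath(self.process_input).suffix
        row["DESTINATION_FORMAT"] = PurePosixPath(self.process_output).suffix
        return row


# Values reuse the examples given in the table.
entry = ProcessRecord(
    process_id=1234567,
    project_id="ADS-123",
    process_agent="Doe, J.",
    process_type="Conversion - dissemination",
    process_description="Import of XYZ data into ArcView",
    process_software_used="ESRI Arcview 9.1",
    process_input="/adsdata/pro-453/xyz/file.xyz",
    process_output="/adsdata/pro-453/shp/file.shp",
    process_start_date="17-May-2007",
    process_completion_date="17-May-2007",
    process_result="Success",
    process_comments="referenced to WGS84",
)

row = entry.as_row()
for element, value in row.items():
    print(f"{element}: {value}")
```

Because the same structure serves for a single file or a batch, a batch run could be recorded by pointing PROCESS_INPUT and PROCESS_OUTPUT at directories, in which case the formats would need to be entered by hand.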

[1] http://dublincore.org/
[2] http://www.dcc.ac.uk/resources/briefing-papers/standards-watch-papers/what-are-metadata-standards
[3] http://thesaurus.english-heritage.org.uk/thesaurus.asp?thes_no=1
[4] http://thesaurus.english-heritage.org.uk/thesaurus.asp?thes_no=144
[5] http://www.getty.edu/research/tools/vocabularies/tgn/
[6] Part 3: Documenting the Project http://archaeologydataservice.ac.uk/advice/depositCreate3
[7] http://en.wikipedia.org/wiki/Checksums


References

van Ballegooie, M. and Duff, W. (2006) 'Archival Metadata' in S. Ross & M. Day (eds.) DCC Digital Curation Manual. http://www.dcc.ac.uk/resources/curation-reference-manual/completed-chapters/archival-metadata

Day, M. (2005) 'Metadata' in S. Ross & M. Day (eds.) DCC Digital Curation Manual. http://www.dcc.ac.uk/resources/curation-reference-manual/completed-chapters/metadata

NISO (2004) Understanding Metadata. NISO Press. http://www.niso.org/standards/resources/UnderstandingMetadata.pdf