This case study was produced as part of a two-week work placement at the ADS in April 2012, funded by the Archaeology in Contemporary Europe (ACE) mobility bursary scheme.
Over the last four years, Inrap has been experimenting with the use of tablet PCs to record data directly in the field, with a relational database that centralizes all the information gathered during the excavation and allows the data collected by the team to be deposited on a shared server (NAS) in the post-excavation phase. Nicolas Holzem (Inrap Centre) developed an initial database, called DataDiag, which was tested on various evaluations from the summer of 2010. The database has since evolved into a less narrowly oriented system, with new tests on evaluations and use in several excavations, including Lassay-sur-Croisne and Neuvy-Pailloux, and then on the first two excavations at Étrechet, "Croc au Loup" and "Le Four à Chaux".
After several Inrap database systems were presented to the ADS team during the ACE placement, it was agreed to focus this case study on the ArchéoDB database. This allows us to address several aspects of archiving: backing up a database and its associated documentation files (pictures, drawings, GIS files, inventories).
This is version 1.3.20 of the database, developed for deployment on the third excavation at Étrechet, "Fets de Renier", which is discussed here. It contains the records of 1122 structures and their stratigraphic units, as well as record photographs and field minutes (displayed as thumbnails). The database will still be supplemented with other data (dates, results of specific studies). GIS exploitation is in its infancy and should also evolve (Fig. 2). It is therefore an intermediate version of the database that was archived here. In a real deposit, the depositor could later request a second accession with a more complete database, which would result in a second archive being preserved in the same collection.
The information to keep and the formats to use were selected in accordance with the recommendations of the online Guides to Good Practice available on the ADS website. However, some characteristics of the ArchéoDB database led us to save additional documentation: a copy of the main screen and of the entry forms in TIFF file format (to keep a record of the work done on ergonomics and on organizing data acquisition).
The inventories generated and formatted by the database (reports) were also retained for archiving and distribution in PDF/A-1b (they could not be saved as PDF/A-1a), since their formatting meets the current requirements of the regional archaeological service in the Centre region.
To save the database, we chose to preserve the different tables in TXT file format, the most suitable for permanent conservation, using the vertical bar (also called pipe) “|” as the field delimiter.
Reference tables that supply the drop-down menus in the entry forms (prefixed "lst_" in the database) and tables generated by queries were not kept, as directed by ADS. However, wherever a field is edited through a reference table or a list of values, this is systematically noted in the metadata file describing the database tables.
Raster images (photographs and field minutes), originally created in JPEG file format, were converted to TIFF file format to avoid further degradation each time a file is re-saved. The JPEG files were retained for consultation and online dissemination.
One vector drawing (a field minute template) is associated with the database. Originally designed in Adobe Illustrator, it was saved in SVG file format for both conservation and dissemination.
For GIS files, only the three file types that make up a shapefile were retained for archiving and distribution: SHP (shape format), SHX (shape index format) and DBF (attribute format in dBase). Following ADS recommendations, the QGIS project file as well as the LYR (layer symbology), PRJ (projection format), QML (layer style), SBN and SBX (spatial index of the features) files were not retained for conservation.
Export of tables from the database in .txt format
The text format is valid both for the preservation and dissemination of data online. The generated files in this format, corresponding to the main database tables of ArchéoDB, will therefore be duplicated in the two sections of the corresponding archive folder (“preservation” and “dissemination”).
The step-by-step procedure for exporting tables in TXT file format is detailed below:
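The export itself was carried out from within Microsoft Access. As a separate illustration of the target format only, the following minimal Python sketch writes a table as pipe-delimited text. The function name, the column names and the sample records are hypothetical and do not come from ArchéoDB; the point is simply that values containing a decimal comma pass through unquoted when the pipe is the delimiter.

```python
import csv
import io


def rows_to_pipe_text(header, rows):
    """Serialise one table as pipe-delimited text, one record per line."""
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        delimiter="|",               # vertical bar instead of the usual comma
        quoting=csv.QUOTE_MINIMAL,   # quote a value only if it contains "|" or a quote
        lineterminator="\n",
    )
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical records with French-style decimal commas.
sample_header = ["structure_id", "depth_m", "description"]
sample_rows = [[1, "0,45", "pit"], [2, "1,20", "ditch, recut"]]
print(rows_to_pipe_text(sample_header, sample_rows))
```

With a comma delimiter, the values "0,45" and "ditch, recut" would each need quoting to stay in a single field; with the pipe they are written as they stand.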
Backing up the relationships
It was decided to keep a copy of the relational schema in TIFF file format to document the database, and also in JPEG file format for online dissemination.
Detailed below is the procedure to generate an image from the relational schema (Microsoft Access 2003 cannot save it directly in an image format):
Screenshots of entry forms
We chose to save screenshots of entry forms to document the database. This backup is performed in TIFF file format using the open source application GIMP, with the following procedure:
Treatment of photographs and field minutes (raster images)
Digital photographs and scanned field minutes, originally in JPEG file format, are stored in the same format in the dissemination section of the archive.
Before archiving, the original images must first be renamed in batches using the application XnView. The name is composed as follows: “name of the database-name of the associated table in the database-name of the original image” (without spaces, accents or special characters).
The procedure is as follows:
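The renaming itself was done in XnView. Purely as an illustration of the naming rule, here is a minimal Python sketch that builds such a name and strips spaces, accents and special characters. The function names and the example table and image names are hypothetical, and the choice to keep underscores and dots is an assumption of this sketch.

```python
import re
import unicodedata


def ascii_slug(text):
    """Remove accents, then drop everything except ASCII letters, digits, '_' and '.'."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return re.sub(r"[^A-Za-z0-9_.]", "", without_accents)


def archive_name(database, table, original_filename):
    """Join the three cleaned parts with hyphens: database-table-original name."""
    return "-".join(ascii_slug(part) for part in (database, table, original_filename))


# Hypothetical table and image names, for illustration only.
print(archive_name("ArchéoDB", "photos", "IMG 0012.jpg"))  # ArcheoDB-photos-IMG0012.jpg
```

Hyphens inside the individual parts are removed as well, so the hyphen remains an unambiguous separator between the three components.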
To allow permanent conservation, all of these images are then batch-converted to TIFF file format using Adobe Photoshop. The procedure is as follows:
Treatment of drawings (vector image)
Only one drawing, made in Adobe Illustrator, is associated with the ArchéoDB database. It is a blank template intended to be used as the basis for field surveys. There is therefore no associated raster image file.
It was converted to SVG file format for its preservation and dissemination. The procedure is as follows:
Treatment of inventories (formatted text)
Textual documents are normally archived in TXT file format, unless their formatting or layout needs to be preserved, in which case the use of PDF/A is recommended. The PDF/A-1 specification was published by ISO (ISO 19005) and is used by standards organizations around the world to ensure the safety and reliability of the dissemination and exchange of electronic documents.
There are two variants of PDF/A-1:
When the original file allows the conversion to PDF/A-1a, this format is preferred.
Detailed below is the procedure for the conversion of inventories directly from Microsoft Access:
Processing of GIS files
No treatment was necessary for the files from the GIS. The only task was to select, from all the available files, those that make up the shapefiles (SHP, SHX, DBF). For each layer in the original GIS project, we checked that the corresponding trio of files had been kept.
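This check was done by hand. A minimal Python sketch of the same verification is given below, assuming that all shapefile components sit in a single folder and share the layer name as file stem; the function name and the folder name in the usage comment are hypothetical.

```python
from pathlib import Path

# The three components retained for each shapefile layer.
REQUIRED = (".shp", ".shx", ".dbf")


def missing_components(folder):
    """Map each layer name to the required extensions that are absent from the folder."""
    found = {}
    for path in Path(folder).iterdir():
        extension = path.suffix.lower()
        if path.is_file() and extension in REQUIRED:
            found.setdefault(path.stem, set()).add(extension)
    return {
        layer: [ext for ext in REQUIRED if ext not in extensions]
        for layer, extensions in sorted(found.items())
        if len(extensions) < len(REQUIRED)
    }


# Usage (hypothetical folder name): an empty result means every layer is complete.
# print(missing_components("gis"))
```

A layer for which none of the three files is present cannot be detected this way, so the result still needs to be compared with the list of layers in the original GIS project.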
Metadata associated with the project
The first metadata file to complete is the one describing the project. We downloaded the corresponding template from the guidelines for depositors available on the ADS website ("Collection-level Metadata Template").
We completed the various requested metadata: title, description, subject, location, author, date, etc. This file was named “archeodb-metadata-project” and saved as an ODT file.
Metadata associated with the different files
To complete this metadata, we chose not to use the template downloadable from the online Guidelines for Depositors and instead used the free application DROID, which automatically generates the technical metadata requested. The result of the analysis performed with this tool was saved in a file named “archeodb-metadata-files” in CSV file format.
The process of generating such metadata is described below:
To view the contents of the CSV file in a more readable spreadsheet format, simply apply the following procedure:
Metadata associated with the different documents
For metadata associated with the different types of documents, we used the metadata fields described in the Guides to Good Practice. All of these metadata files were saved in ODT file format.
Five metadata files, one for each type of document, were generated: “archeodb-metadata-database”, “archeodb photo-metadata”, “archeodb-metadata-minutes”, “archeodb-metadata-drawings” and “archeodb-metadata-gis”.
Creation of top-level folder
The folder containing the entire archive must be named as follows: “Arch-Id collection-version number of the backup”. For our ArchéoDB collection, the top-level folder is therefore named “arch-1148-1”.
Creation of second-level folders
Folder « admin »
For this exercise, this folder was left empty. It usually includes:
Folder « original »
This is the folder containing the original collection. It contains a first subfolder named after the “Accession number” generated by the CMS (here “2246”). Inside it, a further level of subfolders indicates the date of the deposit (e.g. “2012-04-18”). Inside the dated subfolder for our own deposit, there is one more folder, named after the database (“archeodb”). This gives us the following path to access the files: arch-1148-1\original\2446\2012-04-18\archeodb.
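The nesting described here can be summarised in a few lines of Python. This is only a sketch of the naming scheme: the function and parameter names are hypothetical, and the accession value used in the example is the one that appears in the path quoted above.

```python
from pathlib import PureWindowsPath


def original_folder(collection_id, version, accession, deposit_date, database):
    """Build the path: top-level folder, 'original', accession, deposit date, database."""
    top_level = f"arch-{collection_id}-{version}"
    return PureWindowsPath(top_level, "original", str(accession), deposit_date, database)


print(original_folder(1148, 1, 2446, "2012-04-18", "archeodb"))
# arch-1148-1\original\2446\2012-04-18\archeodb
```

PureWindowsPath is used here only so that the printed separators match the backslashes of the path in the text; it does not touch the file system.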
In our folder “archeodb”, the original files are organized as follows:
Folder « preservation »
This second-level folder holds the files intended for conservation, archived on the server administered by ADS. It is itself divided into as many subfolders as there are file extensions. In addition, a subfolder dedicated to the documentation must accompany the archive. In our case, the following 8 subfolders were created:
Folder « dissemination »
This second-level folder holds the files archived for dissemination on the ADS website. It is itself divided into as many subfolders as there are file extensions.
In our case, the following 7 subfolders were created:
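The principle of one subfolder per file extension, used in both the “preservation” and “dissemination” folders, can be sketched in Python as follows. The function name, the example file names and the subfolder name "documentation" are assumptions made for illustration; the sketch only computes the layout and does not move any files.

```python
from collections import defaultdict
from pathlib import PurePath


def group_by_extension(filenames, documentation=()):
    """Return a mapping from subfolder name to the files it would contain."""
    layout = defaultdict(list)
    documentation_files = set(documentation)
    for name in filenames:
        if name in documentation_files:
            # Documentation is kept apart, whatever its extension.
            layout["documentation"].append(name)
        else:
            extension = PurePath(name).suffix.lstrip(".").lower()
            layout[extension].append(name)
    return dict(layout)


# Hypothetical file names, for illustration only.
files = ["structures.txt", "plan.svg", "photo1.tif", "photo2.TIF", "schema.tif"]
print(group_by_extension(files, documentation=["schema.tif"]))
```

Extensions are compared in lower case, so that “photo2.TIF” and “photo1.tif” end up in the same subfolder.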
 Recording was originally done in the field using Excel files for the "facts" and "stratigraphic units".
 The delimiter preferred by ADS is the comma. As the comma conflicts with the decimal numbers present in the database (numeric fields, unlike text fields, are not enclosed in quotation marks), it was decided to use the pipe, another delimiter accepted by ADS. Note that the problem does not arise in the English system, where the point is the decimal separator.
 The treatment of the original files, here and in the next section (processing of associated files), satisfies the constraints of both long-term archiving and online dissemination. The formats used are those recommended in the various sections of the Guides to Good Practice, available online at the following address: http://guides.archaeologydataservice.ac.uk/.
 Fields whose captions or names contain special characters or accents are not taken into account when exporting to .txt format. As for the “captions”, the simplest solution is to remove them where they exist (they carry no useful information for the export).
 This operation can only be performed with Adobe Acrobat Pro. If the depositor does not have this application, the organization responsible for the archiving will carry it out.
 If we had had several drawings to process, we could have proceeded as for the images, renaming the files in batches using the application XnView.
 This method can only be used if Adobe Acrobat Pro (version 8 or higher) is installed on the computer. An existing PDF can also be converted to PDF/A from within Adobe Acrobat, but the procedure is more complicated and less effective. ADS prefers and recommends the application PDFTRON for processing PDF files: http://www.pdftron.com/.
 Here Adobe Acrobat Pro automatically detects which versions of PDF/A (1a and/or 1b) the current file is compatible with.
 ADS Guidelines for Depositors : http://archaeologydataservice.ac.uk/advice/guidelinesForDepositors
 The application DROID is currently in use by ADS to check and possibly complete the metadata files received. It is freely downloadable at the following address: http://droid.sourceforge.net/.
 The metadata form the columns and the archive files form the rows (one row per file format).
 As part of this exercise we used the metadata described in the sections “Documents and Texts”, “Databases and Spreadsheets”, “Raster Images”, “Vector Images” and “GIS” (http://guides.archaeologydataservice.ac.uk/).
 The archive of a collection must be organized in a very precise tree structure, and particular attention should be paid to the names of the folders and files that make it up.
 Because of their file size, the author of the database did not send us the photos at their original size, so here we treated the thumbnails (used for display in the database forms) as if they were the original photos to archive.
 The documentation describing the database, provided by the author in PDF format, was converted to TIFF file format because the original version of the file could not be converted to PDF/A.