g2gp 17-01-2009

Section 3: Archiving GIS Datasets

3.1 Preparing to Archive: Files and Formats

Because GIS data often incorporates data from a variety of sources, the formats that are safest for digital preservation vary with the type of information contained within a file. This section gives recommendations for formatting GIS files, databases, images, documentation, and metadata.

Significant Properties

Any archiving of GIS files should aim to preserve the following properties:

  • Coordinate reference system information
  • Geometry (e.g. point, polygon, line)
  • Attribute fields
  • For rasters - source elevation model, bit-type, colourmap, pixel type

Strictly speaking, colour is not seen as a significant property of GIS data. Such styling is stored in the project file (see below) and not in the digital object itself. If data creators require that the colour or styling of the original data be recorded, this should be supplied as documentation in the form of a document or image, which can then be stored with the data.

3.1.1 GIS Files

As highlighted in a 2009 DPC Technology Watch Report, "Attempts at defining a universal data model for geospatial data have been made (for example the Spatial Data Transfer Standard (SDTS)... but have not achieved widespread adoption. As a consequence, it is not possible to speak of 'geospatial data' as a single type of information that can be handled by multiple, functionally equivalent applications and formats." (McGarva et al. 2009, 5). As with other data types discussed in these Guides, where the original source data cannot be archived outside of the GIS software, the most suitable files to use for archiving GIS data fall into the categories of open formats (e.g. GML and KML) and widely used standards (e.g. ESRI Shapefiles).

General considerations, as outlined in the Guide-wide section on Planning for the Creation of Digital Data, include ensuring that data, where possible, is not encoded or compressed.

Project Files

In many GIS applications, project files (such as .apr or .mxd) can be created to hold data in a tailored manner that involves classification, symbolization, and annotation based upon the data content. These data views typically appear as maps, charts, or tables, or some combination thereof. In order for an end user to render this content it is necessary to have not only the project file, but also the software that supports it, the related components (possibly including software add-ons or extensions), and the actual data. The required use of specific software, the complexity of the project file formats, and the tenuous links to the actual data, which is often simply pointed to, put these project files at high risk of failure over time. It is therefore recommended that project files are not archived, or at least are not used to hold key information relating to the associated datasets.

File Formats

Generally speaking, GIS data falls into two main categories: geo-referenced vector and geo-referenced raster data formats. Unlike simpler data types, a GIS 'file' may consist of more than one physical file or object. This is well illustrated by ESRI Shapefiles, where a single 'file' may be made up of a collection of up to eight separate files. When archiving GIS data it is essential that all relevant files are stored.

Geo-referenced Vector
ArcInfo Interchange (.e00): An ESRI format developed to move coverages, INFO data files, text files such as ARC Macro Language (AML) files, and other ArcInfo files between machines not connected by a file sharing network. Interchange files contain all coverage information and appropriate INFO data file information in a fixed-length ASCII format. The ESRI E00 interchange data format combines spatial and descriptive information for vectors and rasters in a single ASCII file. It is mainly used to exchange files between different versions of ArcInfo, but can also be read by many other GIS programs. This can be used as a preservation format.
ESRI Shapefile (.shp, .shx, .dbf, .sbn and .sbx, .fbn and .fbx, .ain and .aih, .prj and .xml): Shapefile is an openly published format and is actually a collection of files, the number and combination of which depend upon the type of data stored in the file. Shapefiles store nontopological geometry and must be accompanied by an index file (.shx) and a dBASE file (.dbf) that holds the attributes of the shapes in the .shp file. Shapefiles contain the following files:
- SHP - the file that stores the feature geometry. Required.
- SHX - the file that stores the index of the feature geometry. Required.
- DBF - the dBASE file that stores the attribute information of features. Required.
- SBN, SBX - the files that store the spatial index of the features. Optional.
- FBN, FBX - the files that store the spatial index of the features for shapefiles that are read-only. Optional.
- AIN, AIH - the files that store the attribute index of the active fields in a table or a theme's attribute table. Optional.
- PRJ - the file that stores the coordinate system information. Optional.
- XML - metadata. Optional.
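
Because a shapefile is only complete when its component files travel together, it can help to check a dataset before deposit. The following is a minimal sketch in Python, not a prescribed procedure: the function names `find_component` and `check_shapefile` are hypothetical, and it assumes that all components sit in the same folder as the .shp file and share its base name (with metadata possibly named either `name.xml` or `name.shp.xml`).

```python
from pathlib import Path

# Extensions taken from the component list above.
REQUIRED = (".shp", ".shx", ".dbf")
OPTIONAL = (".sbn", ".sbx", ".fbn", ".fbx", ".ain", ".aih", ".prj", ".xml")


def find_component(shp, ext):
    """Return the sibling file with the given extension, or None.

    Matching is case-insensitive, so ROADS.DBF counts for roads.shp.
    Assumption: metadata may be stored as roads.xml or roads.shp.xml.
    """
    base = shp.stem.lower()
    wanted = {base + ext, base + ".shp" + ext}
    for candidate in shp.parent.iterdir():
        if candidate.is_file() and candidate.name.lower() in wanted:
            return candidate
    return None


def check_shapefile(shp_path):
    """Report which shapefile components are present or missing."""
    shp = Path(shp_path)
    found = {ext: find_component(shp, ext) for ext in REQUIRED + OPTIONAL}
    return {
        "present": [ext for ext, path in found.items() if path is not None],
        "missing_required": [ext for ext in REQUIRED if found[ext] is None],
        "missing_optional": [ext for ext in OPTIONAL if found[ext] is None],
    }
```

Calling `check_shapefile("roads.shp")` (a hypothetical file name) returns three lists of extensions. A non-empty `missing_required` list means the dataset is incomplete. A missing .prj is reported only as optional, but since coordinate reference system information is one of the significant properties listed earlier, it is worth following up before archiving.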
MapInfo Interchange Format (.mif & .mid): MapInfo is a commonly used GIS software package. The .mif file contains the graphics, while the optional .mid file contains any attribute data as delimited text. The format is widely supported and most other GIS programs can read it. Because it is ASCII-based and open, it is a possible preservation format, although MapInfo products also support GML, which is even better suited to preservation.
Spatial Data Transfer Standard (.ddf): The Spatial Data Transfer Standard (SDTS) is a data exchange format for transferring different databases between dissimilar computing systems, preserving meaning and minimizing the amount of external information needed to describe the data. It can only be used for certain types of feature: point, arc and grid data. A single coverage produces many files, all with the extension .ddf.
Vector Product Format (.vpf): Vector Product Format (VPF) is a U.S. Department of Defense standard. The National Imagery and Mapping Agency (NIMA) is using VPF for digital vector products developed at a variety of scales. VPF has also been adopted into an international spatial standard, the Digital Geographic Information Exchange Standard (DIGEST). VPF coverages and tables can be translated into ARC/INFO coverages and INFO tables.

Geography Markup Language (.gml): GML uses XML to express geographical features. It can serve as a modelling language for geographic systems as well as an open interchange format for geographic data. It is an ISO standard (ISO 19136) and is built on a number of other ISO standards collectively known as the 19100 family. GML is defined by the Open Geospatial Consortium. Being both an XML-based schema and an ISO standard, GML is very suitable as a preservation format for geographical data.
The accompanying XSD file is an automatically created XML file; it doesn't contain much but should be retained.
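
One practical consequence of GML being plain XML is that its geometry and coordinate reference system can be read with a generic XML parser, without any GIS software. The sketch below illustrates this with Python's standard library. The sample fragment, its coordinates, and the function name `read_point` are invented for illustration, and the namespace shown is the one used by GML 3.1 and earlier (GML 3.2 uses a different namespace URI).

```python
import xml.etree.ElementTree as ET

# Namespace used by GML up to version 3.1 (assumption: adjust for GML 3.2).
GML_NS = {"gml": "http://www.opengis.net/gml"}

# Illustrative fragment only; the coordinates are made up.
SAMPLE = """<gml:Point xmlns:gml="http://www.opengis.net/gml" srsName="EPSG:4326">
  <gml:pos>53.9462 -1.0519</gml:pos>
</gml:Point>"""


def read_point(gml_text):
    """Return (srsName, coordinates) for a single gml:Point element."""
    root = ET.fromstring(gml_text)
    # The coordinate reference system is carried as an attribute.
    srs_name = root.get("srsName")
    # The geometry is whitespace-separated numbers inside gml:pos.
    pos = root.find("gml:pos", GML_NS)
    coordinates = tuple(float(value) for value in pos.text.split())
    return srs_name, coordinates


if __name__ == "__main__":
    print(read_point(SAMPLE))
```

Both the coordinate reference system and the geometry, two of the significant properties listed earlier, remain recoverable from the text of the file itself.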
  • Idrisi
  • NTF
  • SDTF
  • MOSS
  • KML

Geo-referenced Raster
Geo-referenced TIF Image .tif (.rrd, .aux, .xml) | Geo-referenced TIF Image .tif (& .aux, .xml) | Geo-referenced TIF Image zipped .tif (& .aux, .xml)
ESRI GRID (ascii) .asc or .grd | ESRI Grid (ascii) .asc or .grd | ESRI Grid (ascii) .asc or .grd and/or Geo-referenced TIF Image zipped .tif (& .aux, .xml)
ESRI GRID (binary) .adf | ESRI Grid (ascii) .asc or .grd | ESRI Grid (ascii) .asc or .grd and/or Geo-referenced TIF Image zipped .tif (& .aux, .xml)
ERDAS Imagine files .img (.rrd) | Geo-referenced TIF Image .tif (& .aux, .xml) | Geo-referenced TIF Image zipped .tif (& .aux, .xml)
JPG World .jpg & .jgw (.rrd, .aux, .xml) | Geo-referenced TIF Image .tif (& .aux, .xml) | JPG World .jpg & .jgw (.rrd, .aux, .xml)

Database Files

If you have external databases connected to your GIS, for example a database containing your attribute data, then you may want to archive these as well. Details on how best to archive database data are covered in the Databases and Spreadsheets guide.

Image Files

It is NOT necessary to archive images of every single coverage in your GIS, nor is it necessary to archive images showing all of the ways you used the GIS to play with that data. Occasionally an image may have proven useful to you in a research project and, in order to document the research that you did, archiving that image might be worth more than 1,000 words of documentation. One example is an image showing lithic flakes scattered across a house floor in a pattern that you argued demonstrates lithic production was taking place on site -- that single image might be well worth including.

Further information on archiving raster images can be found in the Raster Images guide.

3.1.2 Documentation and Metadata to accompany your GIS, database, or image files

Your data set (the GIS files, database files, and image files) will need to be accompanied by detailed documentation as described in Sections 3.2 and 3.3. These are general guidelines, and certain archives may have specific requirements for the format and content of the metadata that accompanies GIS and spatial datasets. For example, some archives may request that it is supplied as documents along with the data, whereas others, such as tDAR, use interactive web forms to help users create metadata for the resources they deposit.