File formats for archiving

Kieron Niven, with contributions from Tony Austin, Jonathan Bateman, Stuart Jeffrey, Jen Mitcham, Archaeology Data Service / Digital Antiquity, Guides to Good Practice

As discussed in the previous section, this section outlines a number of file formats commonly used in marine survey projects and highlights those suitable for data preservation, exchange and dissemination. As with the majority of other data formats (particularly terrestrial geophysics), conversion to ASCII text files is the preferred method for the long term preservation of survey data. While in many cases this simply means a form of ‘XYZ’ data (see Appendix 1 of the Geophysics guide), a number of plain text or XML based formats have also been developed which allow a more sophisticated method of storing data while still maintaining the advantages of a human readable format. While the original dataset may not be natively created in these formats, the majority of software packages will allow the user to export data to a text based format (and, indeed, many archives, e.g. the NODC, will do this regardless of format for preservation purposes). As with terrestrial geophysics, it is recommended that data is archived both in its native format and in a suitable preservation format. Information on how to structure these files is presented in section

Additional overviews of some of the file formats described here are also available at the OceanTeacher.org website, specifically on the pages covering Self describing formats, raster and grid formats and archive formats.
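
The plain ‘XYZ’ export mentioned above can be sketched in a few lines. This is a minimal illustration rather than a prescribed layout: the function names, the sample soundings, the space delimiter, the three-decimal precision and the column descriptions are all assumptions made for the example, and the layout actually required is the one described in the Geophysics guide.

```python
import csv
import io


def write_xyz(points, stream, delimiter=" ", precision=3):
    """Write (x, y, z) tuples as one delimited ASCII record per line."""
    writer = csv.writer(stream, delimiter=delimiter, lineterminator="\n")
    for x, y, z in points:
        writer.writerow([f"{value:.{precision}f}" for value in (x, y, z)])


def describe_columns(columns):
    """Build a plain text description of each column to archive with the data."""
    return "\n".join(
        f"column {number}: {name} ({unit})"
        for number, (name, unit) in enumerate(columns, start=1)
    ) + "\n"


# Hypothetical soundings: easting (m), northing (m), depth (m, negative down).
soundings = [
    (512340.25, 5678901.5, -12.345),
    (512341.25, 5678901.5, -12.41),
]

buffer = io.StringIO()
write_xyz(soundings, buffer)
xyz_text = buffer.getvalue()

# Hypothetical supporting documentation for the three columns.
documentation = describe_columns(
    [("easting", "m"), ("northing", "m"), ("depth", "m")]
)
```

Keeping the column description in a separate text file alongside the data leaves the XYZ file itself as bare numeric records, which most packages can re-import without modification.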

File Formats

| Format | Properties/Technologies | Description | Recommendations |
| --- | --- | --- | --- |
| ASCII text (.txt, .dat, .xyz) | Published standard for ASCII. Raw data, usually directly from a logger. | Data may be output by a logger as structured ASCII text and incorporated into a database. There are well established archival procedures for databases: tables are exported as delimited ASCII text and documented through an Entity Relationship Model (ERM) and a Data Dictionary. | Suitable preservation format when stored with supporting documentation. |
| Bathymetric Attributed Grid (BAG) | A non-proprietary file format developed by the Open Navigation Surface Working Group and used for storing and exchanging bathymetric data. | BAG files are gridded, multi-dimensional bathymetric data files. Current versions of the format contain position and depth grid data, as well as position and uncertainty grid data, and associated metadata. The format is used as the standard NOS hydrographic data file for public release[5]. | Suitable for data dissemination and exchange. |
| eXtended Triton Format (.xtf) | A proprietary binary format with a publicly available specification[24], used for raw data. | As described by Triton Imaging Inc., ‘The XTF file format was created to answer the need for saving many different types of sonar, navigation, telemetry and bathymetry information. The format can easily be extended to include various types of data that may be encountered in the future’. Currently a publicly available specification, it is also described as an ‘industry standard’ for sonar. Some packages supporting XTF provide an ASCII text export. | Suited for data exchange (e.g. Side-scan sonar, Sub-bottom profiling, etc.) while industry support is widespread. Where possible, ASCII text exports with suitable metadata would provide the best long term preservation environment. |
| General Format 3 (GF3) and subsets | A well established exchange format for oceanographic data. | The GF3 format includes a subset for ‘Marine Geophysical Data’, is well documented and has been developed to remove previous media dependence. | Suitable for data exchange. |
| Generic Sensor Format (.gsf) | A published[19] binary format used for raw data. | The Generic Sensor Format (GSF) is described as an exchange format primarily developed for use in the U.S. Department of Defense Bathymetric Library (DoDBL). The specification is currently openly published and, as well as generic attributes, it allows attributes specific to a wide range of bathymetric surveying systems to be included. | Possible use as an exchange and dissemination format for bathymetric data if widely supported. |
| Geography Markup Language (.gml) | A published standard[9], XML based format, usually for processed data. | XML (and hence ASCII) based standard for geospatially referenced data. This encoding specification was developed and is maintained by the Open Geospatial Consortium (OGC). Many GIS packages, including ESRI and MapInfo products, now support GML. The emergence of the Geospatial Data Abstraction Library[10] (GDAL/OGR) provides the means to easily migrate geospatial data into formats such as GML for preservation and data exchange. | GML is ideally suited for preservation and data exchange of processed marine geospatial data. |
| GEOTIFF (.tiff) | The GEOTIFF format is a binary format in the public domain. | Usually used to display processed data, the GEOTIFF standard allows metadata, specifically georeferencing, to be embedded within a TIFF 6.0 compliant image. Despite being a binary format, TIFF has long been recognised as a de facto preservation standard for raster images. | Suitable as a preservation format to store processed data for display. |
| Marine XML | XML based format for marine data. | This is currently under development[14] but is openly documented[15] and appears to be an ideal format for the preservation of marine data (see Millard et al 2006). | May be ideally suited for preservation and as an exchange format for marine datasets. |
| MGD77 (.mgd77) | Published format[11] using ASCII, usually for raw data. | Developed by the US National Geophysical Data Center (NGDC) following an international workshop in 1977 and revised relatively recently. Described by UNESCO as having ‘been sanctioned by the Intergovernmental Oceanographic Commission (IOC) as an accepted standard for international data exchange’. The MGD77CONVERT[13] toolset allows conversion to the binary NetCDF format, which offers an alternative and smaller means of dissemination. | As an ASCII based and published format, MGD77 could act as a preservation format but primarily has support as a data exchange format for bathymetric data. |
| NetCDF / Network Common Data Form (.nc) | A published binary format[16], which can be, and often is, used to record raw data. | NetCDF consists of a set of software libraries (freely available under licence) and data formats that support the creation and exchange of array-oriented marine data. Certain tools (e.g. ncgen and ncdump) also generate from and dump to ASCII. Appears widely used for bathymetric data, for example by the NERC British Oceanographic Data Centre[17] (BODC), and is used by the IODE Ocean Data Portal[18]. | NetCDF could provide an ideal mechanism for the preservation and data sharing of bathymetric data through storing once and generating binary or ASCII as requested. |
| SDTS: Spatial Data Transfer Standard (various including .ddf) | A binary published standard[20] for raw data. | An Earth Science standard developed by the USGS for raster and vector data exchange. Downloaded files are a tarred (zipped) directory which, in addition to data, contains a number of DDF (data description) files. Compliance with SDTS is a requirement for federal agencies in the US. There are large numbers of tools and translators for extracting data from SDTS to various formats. | Well supported as a data exchange standard for geospatial data (e.g. DEM, terrain, image) but may be US centric. |
| SEG Y (.segy) | A published[21] binary format for raw data. | An openly published format by the Society of Exploration Geophysicists (SEG). Originally (rev. 0) developed in 1973 for use with IBM 9 track tapes and mainframe computers, with descriptive headers in EBCDIC (an alternative to ASCII encoding that is rarely used today). The standard was updated (rev. 1) in 2001 to accommodate ASCII textual file headers and the use of a wider range of media. It should be noted that in the interim between revisions a number of flavours of SEG Y appeared, trying to overcome the limitations of rev. 0. SEG Y to ASCII converters exist, for example one made available by the USGS[22]. Limited functionality SEG Y viewers can be downloaded (e.g. from Phoenix Data Solutions[23]). | Can be converted to ASCII for preservation purposes. Possibly useful as a data exchange format for data from Sub-bottom profiling, Side-scan Sonar and Ground Penetrating Radar as it appears widely supported. |
| XYZ (.xyz .xyzrgb) | Primarily an ASCII format though can be binary. | Outlined in the Geophysics guide (Appendix 1). | ASCII text is seen as the best option for long term preservation along with suitable metadata. |
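
The SEG Y entry above notes that rev. 0 files carry EBCDIC descriptive headers while rev. 1 also permits ASCII. The sketch below shows one way such a header might be turned into readable text before archiving. It relies on the 3200 byte textual file header of 40 lines of 80 characters defined by the SEG Y standard. The function name, the choice of Python's `cp037` codec as the EBCDIC variant, and the first-byte test used to tell EBCDIC from ASCII are assumptions for illustration, and the header content is synthetic.

```python
TEXT_HEADER_BYTES = 3200  # SEG Y textual file header: 40 "card images"
CARD_WIDTH = 80           # of 80 characters each


def decode_textual_header(raw, encoding=None):
    """Return the 40 header lines of a SEG Y textual file header as text."""
    if len(raw) < TEXT_HEADER_BYTES:
        raise ValueError("textual header is shorter than 3200 bytes")
    header = raw[:TEXT_HEADER_BYTES]
    if encoding is None:
        # Heuristic (assumption): header lines conventionally start with "C",
        # which is byte 0xC3 in EBCDIC and 0x43 in ASCII.
        encoding = "cp037" if header[0] == 0xC3 else "ascii"
    text = header.decode(encoding, errors="replace")
    return [
        text[start:start + CARD_WIDTH].rstrip()
        for start in range(0, len(text), CARD_WIDTH)
    ]


# Synthetic header for demonstration only, not taken from a real survey.
cards = ["C 1 CLIENT: EXAMPLE SURVEY", "C 2 LINE: 001"]
cards += [f"C{number:2d}" for number in range(3, 41)]
padded = "".join(card.ljust(CARD_WIDTH) for card in cards)

ebcdic_header = padded.encode("cp037")
ascii_header = padded.encode("ascii")

decoded_from_ebcdic = decode_textual_header(ebcdic_header)
decoded_from_ascii = decode_textual_header(ascii_header)
```

The decoded lines could then be saved as a plain text file alongside the original SEG Y file. Files written by the interim ‘flavours’ of SEG Y may not follow the leading "C" convention, in which case the encoding would need to be passed explicitly.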

Disseminating marine data

As previously discussed, marine datasets may be large and difficult to transfer between users. This issue can be exacerbated when files are saved in uncompressed ASCII text formats, and it may limit the options for providing access to raw data. A number of examples do exist, however, of alternatives to the standard ‘file download’ dissemination method which provide easy online access to large, complex datasets. Examples such as the U.S. geodata.gov[25] portal highlight how access can be provided to a wide range of geodata on oceans and coasts in the U.S. through easy to use interfaces. Specific examples such as the Global Multi-Resolution Topography Data Portal[26] and the Canadian Marine Multibeam Bathymetric Data[27] both highlight the use of online viewers to disseminate bathymetric datasets.
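
The size penalty of uncompressed ASCII text noted above can be gauged with a quick test. The sketch below is illustrative only: it applies lossless gzip compression from the Python standard library to a synthetic, highly regular XYZ grid (an assumption made for the example), so the ratio it reports will not be representative of real survey data. Compression is shown here as one possible way of easing transfer, not as a recommendation made by this guide.

```python
import gzip

# Synthetic XYZ text for a hypothetical 100 x 100 regular grid.
lines = [
    f"{500000 + col:.2f} {5600000 + row:.2f} {-10 - 0.01 * (row + col):.2f}"
    for row in range(100)
    for col in range(100)
]
xyz_bytes = ("\n".join(lines) + "\n").encode("ascii")

# gzip is lossless: decompressing must give back the original bytes exactly.
compressed = gzip.compress(xyz_bytes)
restored = gzip.decompress(compressed)

ratio = len(compressed) / len(xyz_bytes)
print(f"{len(xyz_bytes)} bytes -> {len(compressed)} bytes ({ratio:.0%} of original)")
```

Because the round trip restores the original bytes exactly, a compressed copy used for transfer does not replace the need to archive the uncompressed preservation file with its documentation.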

[5] http://www.ngdc.noaa.gov/mgg/bathymetry/hydro.html

[9] http://www.opengeospatial.org/standards/gml

[10] http://www.gdal.org/index.html

[11] http://www.ngdc.noaa.gov/mgg/dat/geodas/docs/mgd77.pdf

[13] ftp://ftp.tr.proftpd.org/mirrors/www.gmt.soest.hawaii.edu/gmt/doc/html/mgd77convert.html (page no longer available).

[14] http://www.iode.org/index.php?Itemid=60&id=21&option=com_content&task=view (page no longer available).

[15] http://www.jcomm.info/index.php?option=com_oe&task=viewDocumentRecord&docID=1109 (page no longer available).

[16] http://www.unidata.ucar.edu/software/netcdf/

[17] http://www.bodc.ac.uk/data/online_delivery/gebco/ (page no longer available).

[18] http://www.oceandataportal.org/index.php?option=com_content&task=view&id=41&Itemid=67&catid=9 (page no longer available).

[19] http://www.ldeo.columbia.edu/res/pi/MB-System/formatdoc/gsf_spec.pdf (page no longer available).

[20] http://mcmcweb.er.usgs.gov/sdts/standard.html

[21] https://library.seg.org/seg-technical-standards

[22] http://pubs.usgs.gov/of/2005/1311/of2005-1311.pdf

[23] http://www.phoenixdatasolutions.co.uk/seisvu.htm

[24] http://www.tritonimaginginc.com/site/content/public/downloads/FileFormatInfo/Xtf%20File%20Format_X31.pdf (page no longer available).

[25] http://gos2.geodata.gov/wps/portal/gos/kcxml/04_Sj9SPykssy0xPLMnMz0vMAfIjzeO94w0dgz31o3JS0xOTK_VD8_TDchMrMnMzq1JT9CP0o8yAKoycXEGaQUw_S_1IdCELDCETSy-ImItREELMByJm4uio7-uRn5uq760foF-Qk1Xl7e3tY-KoqAgAE7OQlw!!/delta/base64xml/L3dJdyEvd0ZNQUlzQUMvNElVRS82X0tfNEFF (page no longer available).

[26] http://www.marine-geo.org/portals/gmrt/

[27] http://gdr.ess.nrcan.gc.ca/multibath/e/viewer.htm (page no longer available).