
Using METS for the preservation and dissemination of digital archives

Structure of a METS file

Descriptive Metadata Section <dmdSec>

Archivists have long been accustomed to producing descriptive metadata so that researchers can identify and retrieve archival content, so this <dmdSec> section will probably be most familiar to those working with traditional archives.

The <dmdSec> of METS is repeatable; this allows descriptive metadata to be recorded for each separate item or component in the METS document. For the testbed digital objects held by Paradigm, each of which has its own individual METS file, repeatability serves a different purpose: it allows the amount of descriptive metadata to vary at each stage of a digital archival object’s lifecycle. For example, at AIP stage administrative metadata is most important, and descriptive metadata might be represented simply in the form of a basic MODS or Dublin Core record. MODS is the option selected by the Bodleian Library, both because it is richer than Dublin Core and because it is used in various digital library contexts at Oxford, and is therefore useful for local interoperability. At DIP stage, considerably more descriptive information is needed to facilitate intellectual access for researchers. Existing MODS metadata could be retained and possibly enhanced, and a Dublin Core record could be added to items published in an online repository for the purpose of OAI-PMH harvesting. The object would also have an additional layer of descriptive metadata in the form of an entry (although not necessarily to item level) in a detailed EAD catalogue for the archive of which it forms a part; this catalogue is referred to from the item’s METS DIP.

METS does not define the content of descriptive metadata elements; instead it allows descriptive metadata from other schemes to be incorporated in a METS file using one of two methods. The metadata can either be embedded in the METS file itself, using the <mdWrap> element together with the <xmlData> element (if the metadata is in XML form) or the <binData> element (if not in XML); or it can be stored in an external file and referred to from the METS file by means of a URI, using the <mdRef> element.
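The two methods might be sketched as follows. This is a minimal illustration rather than a Paradigm record: the ID values, the title text and the example.org URL are placeholders, and the enclosing <mets> element and empty <structMap> are included only so that the namespaces are declared and the fragment has a complete shape.

```xml
<mets xmlns="http://www.loc.gov/METS/"
      xmlns:xlink="http://www.w3.org/1999/xlink"
      xmlns:mods="http://www.loc.gov/mods/v3">

  <!-- Method 1: metadata embedded in the METS file using <mdWrap>.
       XML metadata sits inside <xmlData>; non-XML metadata would
       instead be Base64-encoded inside <binData>. -->
  <dmdSec ID="DMD1">
    <mdWrap MDTYPE="MODS">
      <xmlData>
        <mods:mods>
          <mods:titleInfo>
            <!-- Placeholder title -->
            <mods:title>Example title</mods:title>
          </mods:titleInfo>
        </mods:mods>
      </xmlData>
    </mdWrap>
  </dmdSec>

  <!-- Method 2: metadata held in an external file and referenced
       by URI using <mdRef>. The URL is a placeholder. -->
  <dmdSec ID="DMD2">
    <mdRef LOCTYPE="URL" MDTYPE="EAD"
           xlink:href="http://example.org/catalogues/example-ead.xml"/>
  </dmdSec>

  <structMap>
    <div/>
  </structMap>
</mets>
```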

A unique ID attribute can be assigned to each <dmdSec> element, which facilitates linking from other sections of the METS document. There is also an optional GROUPID attribute which is used to indicate that different metadata sections may be considered as part of a group; this is useful for grouping changed versions of the same metadata if previous versions are maintained in a file for tracking purposes.
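As a rough sketch of how GROUPID might group successive versions of one record, the fragment below keeps an earlier and a later MODS record side by side. The ID and GROUPID values, the CREATED timestamps, the STATUS strings and the titles are all invented for illustration; METS itself leaves the choice of values to local policy.

```xml
<mets xmlns="http://www.loc.gov/METS/"
      xmlns:mods="http://www.loc.gov/mods/v3">

  <!-- Earlier version of the record, retained for tracking purposes -->
  <dmdSec ID="DMD1-V1" GROUPID="DMD1-GROUP"
          CREATED="2000-01-01T12:00:00" STATUS="superseded">
    <mdWrap MDTYPE="MODS">
      <xmlData>
        <mods:mods>
          <mods:titleInfo>
            <mods:title>Example title (first version)</mods:title>
          </mods:titleInfo>
        </mods:mods>
      </xmlData>
    </mdWrap>
  </dmdSec>

  <!-- Changed version of the same record: a different ID,
       but the same GROUPID as the version above -->
  <dmdSec ID="DMD1-V2" GROUPID="DMD1-GROUP"
          CREATED="2000-06-01T12:00:00" STATUS="current">
    <mdWrap MDTYPE="MODS">
      <xmlData>
        <mods:mods>
          <mods:titleInfo>
            <mods:title>Example title (revised version)</mods:title>
          </mods:titleInfo>
        </mods:mods>
      </xmlData>
    </mdWrap>
  </dmdSec>

  <structMap>
    <div/>
  </structMap>
</mets>
```

Each <dmdSec> keeps its own unique ID, so other parts of the document can still point at one specific version, while the shared GROUPID records that the two sections belong together.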

The <dmdSec> element can also include an ADMID attribute, which can be used to link it to relevant administrative metadata sections that relate to the digital object described; this is done by citing as attribute values the IDs allocated to each administrative metadata section.

Internal IDs play an important role in any METS document. Each descriptive and administrative metadata section is given an ID which is unique within the METS file; these can be referred to from other elements, as described above. This allows units of information which appear in dispersed locations across a METS document to be linked to all their appropriate contexts.
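The linking described above might look something like the following. All ID values and the example.org URL are hypothetical, and the administrative sections are left empty because their content is outside the scope of this illustration. The DMDID attribute on the <div> element is shown as one example of another element referring back to a descriptive metadata section.

```xml
<mets xmlns="http://www.loc.gov/METS/"
      xmlns:xlink="http://www.w3.org/1999/xlink">

  <!-- ADMID cites the IDs of the administrative metadata sections
       that relate to the object described (space-separated) -->
  <dmdSec ID="DMD1" ADMID="TECH1 RIGHTS1">
    <mdRef LOCTYPE="URL" MDTYPE="MODS"
           xlink:href="http://example.org/metadata/item1-mods.xml"/>
  </dmdSec>

  <amdSec>
    <techMD ID="TECH1">
      <!-- technical metadata omitted from this sketch -->
    </techMD>
    <rightsMD ID="RIGHTS1">
      <!-- rights metadata omitted from this sketch -->
    </rightsMD>
  </amdSec>

  <!-- A structural division can point at the same sections by ID -->
  <structMap>
    <div DMDID="DMD1" ADMID="TECH1 RIGHTS1"/>
  </structMap>
</mets>
```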

The Paradigm model uses <mdWrap> to embed a MODS record within the METS document for a digital object at AIP stage; at DIP stage (for items published to online repositories) a Dublin Core record could also be embedded using <mdWrap>, and all items belonging to catalogued archives would include the <mdRef> element to link to an external EAD catalogue entry.

Paradigm recommends that the following attributes be associated with the <mdRef> element:

Example DIP:

This example illustrates what the descriptive metadata sections of a METS document might look like in a DIP for a single email published to an online repository. Much of the information supplied in the Dublin Core and MODS records will be extracted automatically, including: the title (the subject line of the email, in this case ‘Latest draft of election press release’); the name of the creator (the sender of the email, in this case a member of a politician’s staff); and the date and time the email was sent. The EAD reference code for the email is ABC/1/3/9/670/1 (this is also given as the identifier in both the Dublin Core and MODS records). Each descriptive metadata section is also given a unique identifier (for referencing from elsewhere) and an explanatory label for researchers. Sections of administrative metadata relating to the same digital object are referenced using the ADMID attribute. Policy on formats for IDs will be decided at local level, as they are not intended as external identifiers.

The encoding attribute value of "w3cdtf" on the MODS date simply refers to the way in which the date is represented: W3CDTF is a profile of the ISO 8601 standard in which a date follows the pattern YYYY-MM-DD, optionally followed by a time.

Code sample
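A minimal sketch of such descriptive metadata sections, offered as an illustration and not as Paradigm's own sample, is given below. The title, the EAD reference code and the EAD id value abc23 come from the text; the sender's name, the date and time, every ID and LABEL value, the example.org URL and the form of the XPTR pointer are assumptions made for the sake of the example.

```xml
<mets xmlns="http://www.loc.gov/METS/"
      xmlns:xlink="http://www.w3.org/1999/xlink"
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:mods="http://www.loc.gov/mods/v3">

  <!-- Dublin Core record, embedded for OAI-PMH harvesting -->
  <dmdSec ID="DMD-DC" ADMID="TECH1 RIGHTS1">
    <mdWrap MDTYPE="DC" LABEL="Dublin Core record">
      <xmlData>
        <dc:title>Latest draft of election press release</dc:title>
        <!-- Hypothetical sender name and date -->
        <dc:creator>Sender, Example</dc:creator>
        <dc:date>2000-01-01T12:00:00Z</dc:date>
        <dc:identifier>ABC/1/3/9/670/1</dc:identifier>
      </xmlData>
    </mdWrap>
  </dmdSec>

  <!-- MODS record, retained from the AIP -->
  <dmdSec ID="DMD-MODS" ADMID="TECH1 RIGHTS1">
    <mdWrap MDTYPE="MODS" LABEL="MODS record">
      <xmlData>
        <mods:mods>
          <mods:titleInfo>
            <mods:title>Latest draft of election press release</mods:title>
          </mods:titleInfo>
          <mods:name type="personal">
            <!-- Hypothetical sender name -->
            <mods:namePart>Sender, Example</mods:namePart>
            <mods:role>
              <mods:roleTerm type="text">creator</mods:roleTerm>
            </mods:role>
          </mods:name>
          <mods:originInfo>
            <!-- Hypothetical date and time, in W3CDTF form -->
            <mods:dateCreated encoding="w3cdtf">2000-01-01T12:00:00Z</mods:dateCreated>
          </mods:originInfo>
          <mods:identifier type="local">ABC/1/3/9/670/1</mods:identifier>
        </mods:mods>
      </xmlData>
    </mdWrap>
  </dmdSec>

  <!-- Reference to the entry in the external EAD catalogue.
       XPTR names the id on the EAD component, e.g. <c03 id="abc23">;
       the bare-name pointer form used here is an assumption. -->
  <dmdSec ID="DMD-EAD" ADMID="TECH1 RIGHTS1">
    <mdRef LOCTYPE="URL" MDTYPE="EAD" LABEL="EAD catalogue entry"
           XPTR="abc23"
           xlink:href="http://example.org/catalogues/example-ead.xml"/>
  </dmdSec>

  <amdSec>
    <techMD ID="TECH1">
      <!-- technical metadata omitted from this sketch -->
    </techMD>
    <rightsMD ID="RIGHTS1">
      <!-- rights metadata omitted from this sketch -->
    </rightsMD>
  </amdSec>

  <structMap>
    <div DMDID="DMD-DC DMD-MODS DMD-EAD"/>
  </structMap>
</mets>
```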

In the <mdRef> element here, the XPTR attribute points to a unique ID which has been added as an attribute value within a specific EAD component level tag (at either folder or item level) in an external EAD catalogue, e.g. <c03 id="abc23">. For more information on linking between EAD and METS documents, see Chapter 06 Arranging and cataloguing digital and hybrid archives.