Using METS for the preservation and dissemination of digital archives
Structure of a METS file
Administrative metadata section <amdSec> The <amdSec> acts as a holder for the key information which is central to long-term digital preservation - enabling the repository to manage the material effectively, ensuring that the digital object is authentic and clarifying intellectual property rights in the object. At AIP stage, administrative metadata will be extensive, while at DIP stage less of this information will be included.
The <amdSec> contains four principal subelements, all of which are optional and repeatable. METS does not prescribe the content of any of these administrative metadata sections, although it recommends a number of extension schemas for each type of metadata. As with descriptive metadata, administrative metadata may be embedded using <mdWrap> or stored externally and referred to using <mdRef>. The four main sub-sections within <amdSec> are as follows:
Technical metadata <techMD> Contains information about the generation of the digital object represented by the METS document, including its creation, format and use characteristics. Where relevant, METS extension schemas specific to particular object types, e.g. MIX for images and TextMD for text, should be used.
Intellectual property rights metadata <rightsMD> Contains information about any copyright and licensing attached to the digital object. The METSRights schema has been specifically developed for recording this kind of information in METS.
Source metadata <sourceMD> Used to record information about the analogue source of a digitised record (perhaps the MARC record for a digitised book). This element is not relevant to born-digital material.
Digital provenance metadata <digiprovMD> Records information about master/derivative relationships between the current digital object and its earlier forms, as well as information about format transformations and other preservation actions undertaken by the repository in relation to the object. Some XML schemas have been produced to encode the core digital preservation metadata elements specified by the PREMIS Data Dictionary, and these have been adopted as METS Extension Schemas. The model described here includes all PREMIS object metadata within the <digiprovMD> element of METS, but creates separate METS files for objects, agents, events and rights. Best practice for combining METS and PREMIS is still emerging, and some practitioners advocate splitting PREMIS entities across the various <amdSec> subsections.
METS allows the entire <amdSec> to be allocated a single unique ID by which it can be referred to from other parts of the METS document by means of the ADMID linking attribute.
Example:
Each of the four major elements within <amdSec> can also be allocated a unique ID. Where a single METS document represents a single object (as with the Paradigm model), it may be sufficient to use only one ID at the highest <amdSec> level; however, to avoid ambiguity (for example, when creating a DIP METS file from the data in an AIP file) or to allow for the addition of extra metadata sections in future, it may be advisable to include IDs for <techMD>, <rightsMD> and <digiprovMD> separately.
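As a rough illustration of ADMID linking (not the workbook's own example), the Python sketch below parses a skeletal METS document and resolves each ADMID value to the element carrying the matching ID. All ID values, and the choice to attach ADMID to a <file> and a <div>, are assumptions made for illustration; the skeleton has not been validated against the METS schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"

# Hypothetical skeleton: every ID value below is invented for illustration.
SAMPLE = f"""
<mets:mets xmlns:mets="{METS_NS}">
  <mets:amdSec ID="AMD001">
    <mets:techMD ID="TECH001"/>
    <mets:rightsMD ID="RIGHTS001"/>
    <mets:digiprovMD ID="DIGIPROV001"/>
  </mets:amdSec>
  <mets:fileSec>
    <mets:fileGrp>
      <mets:file ID="FILE001" ADMID="TECH001 RIGHTS001 DIGIPROV001"/>
    </mets:fileGrp>
  </mets:fileSec>
  <mets:structMap>
    <mets:div ADMID="AMD001"/>
  </mets:structMap>
</mets:mets>
"""


def resolve_admid(root, element):
    """Return the elements whose ID matches a token in element's ADMID."""
    by_id = {el.get("ID"): el for el in root.iter() if el.get("ID")}
    tokens = element.get("ADMID", "").split()  # ADMID is a space-separated list
    return [by_id[token] for token in tokens if token in by_id]


def local_name(el):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return el.tag.split("}", 1)[-1]


root = ET.fromstring(SAMPLE)
file_el = root.find(f".//{{{METS_NS}}}file")
div_el = root.find(f".//{{{METS_NS}}}div")

# The file points at the three subsections individually ...
file_targets = [local_name(el) for el in resolve_admid(root, file_el)]
# ... while the div points at the whole <amdSec> through its single ID.
div_targets = [local_name(el) for el in resolve_admid(root, div_el)]

print(file_targets)
print(div_targets)
```

Giving each subsection its own ID, as in the <file> link here, lets a later process select only some of the administrative metadata, for instance when deriving a DIP from an AIP.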
The XML examples below are based on the email described in the <dmdSec> example above and give a basic indication of the kind of metadata included in <amdSec> subsections for digital object METS documents at the AIP stage. Information on which elements of administrative metadata might be included in a DIP is given in Chapter 06 Arranging and cataloguing digital and hybrid archives.
Technical metadata <techMD> For email and other text-based documents, the TextMD schema is used to embed technical metadata in the METS document, where such information does not duplicate the PREMIS record contained in <digiprovMD>. The METS Editorial Board recommends the TextMD schema for encoding technical information about textual documents, whether digitised or born-digital.
Example:
In this simple example, the TextMD record is embedded within the METS document in XML form; the <techMD> element has been given a unique identifier, and a human-readable label is provided. The character set employed by the digital object is given (as stipulated in the TextMD Schema) using a controlled vocabulary established by the Internet Assigned Numbers Authority (IANA). The language of the email is given using the ISO 639-2 code to denote English, and the default font of the message is also supplied.
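A record of the kind described might look roughly like the XML embedded in the Python sketch below, which also reads the values back out. This is an approximation rather than the workbook's own record: the namespace URI, the MDTYPE value, the ID, the label, the charset and the font are assumptions, and the element names are given from memory of the textMD schema and should be checked against it before use.

```python
import xml.etree.ElementTree as ET

# Namespace URIs are assumptions (textMD version 3 is assumed here).
NS = {
    "mets": "http://www.loc.gov/METS/",
    "textmd": "info:lc/xmlns/textMD-v3",
}

# Hypothetical values: the ID, LABEL, charset and font are invented.
SAMPLE = """
<mets:techMD xmlns:mets="http://www.loc.gov/METS/"
             xmlns:textmd="info:lc/xmlns/textMD-v3"
             ID="TECH001">
  <mets:mdWrap MDTYPE="TEXTMD" LABEL="Technical metadata for email message">
    <mets:xmlData>
      <textmd:textMD>
        <textmd:character_info>
          <textmd:charset>ISO-8859-1</textmd:charset>
        </textmd:character_info>
        <textmd:language>eng</textmd:language>
        <textmd:font_script>Arial</textmd:font_script>
      </textmd:textMD>
    </mets:xmlData>
  </mets:mdWrap>
</mets:techMD>
"""


def summarise_textmd(techmd):
    """Collect the identifier, character set, language and font of a record."""
    text = techmd.find("mets:mdWrap/mets:xmlData/textmd:textMD", NS)
    return {
        "id": techmd.get("ID"),
        "charset": text.findtext(
            "textmd:character_info/textmd:charset", namespaces=NS
        ),
        "language": text.findtext("textmd:language", namespaces=NS),
        "font": text.findtext("textmd:font_script", namespaces=NS),
    }


summary = summarise_textmd(ET.fromstring(SAMPLE))
print(summary)
```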
Intellectual property rights metadata <rightsMD> The Intellectual property rights (IPRs) associated with a digital object have a bearing both on preservation activities carried out within the repository (making multiple copies for preservation purposes, migrating into different formats, etc.) and on access and use by researchers (obtaining copies, downloading, quoting, etc.). Paradigm used the
Example from Object AIP:
Rights information for access and use is wrapped in the METS document using the METSRights schema. In this simple example, based on our single email message, the rights category is given as "Copyrighted", because textual documents like this will usually remain in copyright until 70 years after the death of their creator. Other values METSRights provides for this attribute include "Public Domain" (for material which is out of copyright), "Licensed" (for material which is subject to a licence; this might be a Creative Commons licence or a specific licence granted to the digital repository), "Contractual" (which might apply in the case of principal archive creators who grant copyright permission as part of their deposit or donation agreement with the repository) and "Other" to cover alternative statuses.
While contact details are included in this example, in reality it is likely to be impossible to record (or even to discover) current contact details for copyright holders in individual records like this, unless they have created a substantial proportion of the material held in the archive.
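The rights category and access permissions discussed above can be sketched as follows. The Python below parses a small METSRights-style declaration, checks the category against the five values listed in the text, and reports which permissions are granted. The namespace URI, element names and attribute names are given from memory of the METSRights schema and should be treated as assumptions; the rights holder and the permission settings are invented.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for METSRights.
RTS_NS = "http://cosimo.stanford.edu/sdr/metsrights/"

# The five categories named in the text, in the upper-case spelling
# the schema is assumed to use.
CATEGORIES = {"COPYRIGHTED", "LICENSED", "PUBLIC DOMAIN", "CONTRACTUAL", "OTHER"}

# Hypothetical declaration: the holder and permissions are invented.
SAMPLE = f"""
<rts:RightsDeclarationMD xmlns:rts="{RTS_NS}" RIGHTSCATEGORY="COPYRIGHTED">
  <rts:RightsHolder>
    <rts:RightsHolderName>A. N. Author (hypothetical)</rts:RightsHolderName>
  </rts:RightsHolder>
  <rts:Context CONTEXTCLASS="GENERAL PUBLIC">
    <rts:Permissions DISCOVER="true" DISPLAY="true" COPY="false"
                     DUPLICATE="false" MODIFY="false" DELETE="false"
                     PRINT="false"/>
  </rts:Context>
</rts:RightsDeclarationMD>
"""


def describe_rights(xml_text):
    """Return (category, granted permissions) for one rights declaration."""
    decl = ET.fromstring(xml_text)
    category = decl.get("RIGHTSCATEGORY")
    if category not in CATEGORIES:
        raise ValueError(f"unexpected rights category: {category!r}")
    perms = decl.find(f"{{{RTS_NS}}}Context/{{{RTS_NS}}}Permissions")
    attrs = perms.attrib if perms is not None else {}
    granted = sorted(name for name, value in attrs.items() if value == "true")
    return category, granted


category, granted = describe_rights(SAMPLE)
print(category, granted)
```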
The
Digital provenance metadata <digiprovMD> The <digiprovMD> element records information which allows both repository staff and users to understand what modifications have been performed on a digital object during its lifecycle, in order to judge whether and how those processes might have altered or corrupted the 'original' object. Four PREMIS XML Schemas have been produced to represent four of the entities outlined in the PREMIS Data Dictionary. These schemas (object, agent, event and rights) may be used in conjunction with METS. Paradigm proposes that METS documents for each of the four entities be created and linked together using the linking elements supplied by PREMIS. In this example all PREMIS metadata will be recorded in the <digiprovMD> of METS. The rights metadata held by PREMIS is that which establishes the right of the repository to undertake preservation actions on the digital objects in its care. Format-specific technical metadata will be recorded in <techMD>.
Example from object AIP:
This example deals with PREMIS' object entity, which is analogous to the digital object (e.g. an email) that is the target of preservation.
The example below gives the unique identifier for the file generated by the Fedora repository software (and also encoded as the OBJID attribute of the METS document); full preservation is being undertaken, and the file is not encrypted in any way, so its composition level is set to 0. Information is provided within the <fixity> element on a checksum carried out on the object, and its size is recorded in bytes. The email was created in Microsoft Outlook 2002 and arrived at the repository in its original format (personal folders, or .pst). It was subsequently extracted and normalised to XML format using Xena software. The PREMIS <relationship> element records the relationship between the normalised email (file:2345) and the original mailbox (file:2301) from which it was derived; the normalisation is recorded as an event in a separate METS document (event:312). There is also a reference to the rights information relating to this email, which is recorded in another METS document (rights:2303).
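To make the role of the fixity and size values concrete, the Python sketch below computes them for a stand-in byte string and later re-checks them, alongside a simplified record of the identifiers mentioned above. The key names loosely follow PREMIS semantic unit names, but the dictionary layout, the SHA-256 algorithm, the "derivation" relationship type and the sample bytes are all assumptions; the text does not say which checksum algorithm was used.

```python
import hashlib


def fixity(data: bytes, algorithm: str = "sha256") -> dict:
    """Compute the values a fixity and size record would hold."""
    return {
        "messageDigestAlgorithm": algorithm,
        "messageDigest": hashlib.new(algorithm, data).hexdigest(),
        "size": len(data),  # size in bytes
    }


def verify(data: bytes, recorded: dict) -> bool:
    """Recompute the digest and size and compare them with the record."""
    return fixity(data, recorded["messageDigestAlgorithm"]) == recorded


# Stand-in for the normalised email; the content is hypothetical.
normalised = b"<email>hypothetical normalised message</email>"
recorded = fixity(normalised)

# Simplified object record using the identifiers given in the text.
object_record = {
    "objectIdentifier": "file:2345",
    "compositionLevel": 0,  # not encrypted or otherwise wrapped
    "fixity": recorded,
    "relationship": {
        "relationshipType": "derivation",  # assumed value
        "relatedObject": "file:2301",      # the original mailbox
        "relatedEvent": "event:312",       # the normalisation event
    },
    "linkingRights": "rights:2303",
}

unchanged = verify(normalised, object_record["fixity"])
altered = verify(normalised + b" ", object_record["fixity"])
print(unchanged, altered)
```

A failed check of this kind signals only that the bytes differ from those recorded; the event records are what explain whether a change was an authorised preservation action.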
Example from rights AIP
The rights section of the PREMIS record focuses on permission granted by the depositor of the archive to enable the repository to take various actions in relation to the digital material for preservation purposes; this permission is granted for a finite term in a 20-year deposit agreement.
Example from event AIP
The event section of PREMIS is used to record significant events in the life of the object. This event records the normalisation of the Microsoft Outlook .pst file, which created the email and attachment digital objects; one of these emails is the object described above. There are references to two different agents (both represented by separate METS documents): the archivist (agent:10) who authorised the normalisation event, and the software (agent:111) which executed the event.
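Because objects, events and agents live in separate METS documents, the identifiers that link them need to resolve. The Python sketch below models the documents as entries in a dictionary keyed by identifier, follows the links from the event to its agents, and checks for links that point nowhere. The identifiers come from the text; the dictionary layout, key names and role labels are hypothetical simplifications, not PREMIS syntax.

```python
# Each entry stands in for one separate METS document.
REGISTRY = {
    "file:2301": {"kind": "object", "label": "original Outlook mailbox (.pst)"},
    "file:2345": {"kind": "object", "label": "normalised email (XML)"},
    "agent:10": {"kind": "agent", "label": "archivist"},
    "agent:111": {"kind": "agent", "label": "normalisation software"},
    "event:312": {
        "kind": "event",
        "label": "normalisation",
        # Role labels are invented for illustration.
        "linkingAgents": {"agent:10": "authoriser", "agent:111": "executor"},
        "linkingObjects": ["file:2301", "file:2345"],
    },
}


def agents_for(registry, event_id):
    """Map each linked agent's label to the role it played in the event."""
    links = registry[event_id]["linkingAgents"]
    return {registry[agent_id]["label"]: role for agent_id, role in links.items()}


def dangling_links(registry):
    """List (source, target) pairs whose target identifier is not registered."""
    missing = []
    for ident, doc in registry.items():
        targets = list(doc.get("linkingAgents", {})) + doc.get("linkingObjects", [])
        missing.extend((ident, t) for t in targets if t not in registry)
    return missing


roles = agents_for(REGISTRY, "event:312")
broken = dangling_links(REGISTRY)
print(roles)
print(broken)
```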
Example from Agent AIP