Preservation metadata
Considerations affecting the design of a preservation metadata application profile
A number of factors must be considered when designing a preservation metadata application profile.
These include:
The technical nature of the files to be preserved
Different types of file present different preservation challenges. Examples of 'types' found in a personal archive include audio, diary, document, email, moving image, spreadsheet and still image; all have important technical characteristics which are specific to their type, such as colour depth in a digital still image. Format-specific sub-profiles may also be needed; for example, GIF or JPEG might form sub-profiles of the image profile. Additionally, a repository responsible for personal digital archives may wish to construct a preservation metadata application profile for personal computing software that it has licence to preserve; this would allow it to curate software needed for the extraction of older digital materials not supported by contemporary environments.
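The idea of type profiles with format-specific sub-profiles can be sketched as profiles that inherit required fields from a parent. This is a minimal Python sketch under stated assumptions: the profile names and field names (`colour_depth`, `compression`, `palette_size` and so on) are hypothetical illustrations, not taken from any published application profile.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Profile:
    """A hypothetical application profile: a named set of required metadata fields."""
    name: str
    required: Set[str]
    parent: Optional["Profile"] = None

    def all_required(self) -> Set[str]:
        # A sub-profile inherits every field required by its parent profile.
        inherited = self.parent.all_required() if self.parent else set()
        return inherited | self.required

    def missing(self, record: dict) -> Set[str]:
        # Fields the profile requires but the metadata record does not supply.
        return self.all_required() - set(record)


# Hypothetical field names, for illustration only.
base = Profile("file", {"identifier", "size", "checksum", "format"})
image = Profile("still image", {"width", "height", "colour_depth"}, parent=base)
jpeg = Profile("JPEG", {"compression"}, parent=image)
gif = Profile("GIF", {"palette_size"}, parent=image)

record = {"identifier": "f1", "size": 2048, "checksum": "ab12", "format": "JPEG",
          "width": 640, "height": 480, "colour_depth": 24}
print(sorted(jpeg.missing(record)))  # ['compression']
```

Modelling sub-profiles as inheritance keeps the type-level requirements (such as colour depth for all still images) in one place, so a new format sub-profile only has to declare what is specific to that format.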
Modelling granularity
The many-layered approach (Intellectual Entities > Representations > Files) requires repositories to decide at which level each unit of metadata is most appropriately recorded, and to implement a mechanism for linking between these layers. The repository must also decide at which levels it will link to metadata about associated events, agents and rights. Linking metadata in this way produces an extensible framework: the repository can record metadata at the highest applicable level, and can extend the metadata about an Intellectual Entity over time by creating, and linking to, new Representations and new event, agent and rights metadata as the need arises. Taken together, all the layers of metadata supply the preservation information needed to produce the Archival Information Package, which will support the preservation of Intellectual Entities as they evolve over time.
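The layers and the links between them can be sketched as plain data classes. This is a minimal Python sketch, loosely modelled on the entity types named above (Intellectual Entity, Representation, File, event, agent, rights); it is not an implementation of any particular metadata standard, and all identifiers, checksums and rights values are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class File:
    """File level: technical metadata such as format and fixity."""
    identifier: str
    format_name: str
    checksum: str


@dataclass
class Representation:
    """A set of Files plus the structural metadata needed to render them."""
    identifier: str
    files: List[File]
    # Structural metadata: identifiers of the Files in the order they are assembled.
    structure: List[str]
    # Link to a source Representation, e.g. when this one was produced by migration.
    derived_from: Optional[str] = None


@dataclass
class Event:
    """An event (e.g. ingest, migration) linked to its agent and the objects affected."""
    identifier: str
    event_type: str
    agent: str
    linked_objects: List[str]


@dataclass
class IntellectualEntity:
    identifier: str
    title: str
    representations: List[Representation] = field(default_factory=list)
    events: List[Event] = field(default_factory=list)
    # Rights recorded once at this level apply to every Representation below it.
    rights: List[str] = field(default_factory=list)

    def add_representation(self, rep: Representation) -> None:
        # New Representations are linked as the need arises, extending the entity over time.
        self.representations.append(rep)

    def file_ids(self) -> List[str]:
        # Walk down the layers: entity -> representations -> files.
        return [f.identifier for rep in self.representations for f in rep.files]


speech = IntellectualEntity("ie-001", "Speech on rising prison populations",
                            rights=["rights-statement-1"])
word = Representation("rep-1", [File("file-1", "MS Word 11", "sha256:aa")], ["file-1"])
odt = Representation("rep-2", [File("file-2", "OpenDocument Text 1.0", "sha256:bb")],
                     ["file-2"], derived_from="rep-1")
speech.add_representation(word)
speech.add_representation(odt)
speech.events.append(Event("ev-1", "migration", "agent-converter", ["rep-1", "rep-2"]))
print(speech.file_ids())  # ['file-1', 'file-2']
```

Holding links as identifiers rather than nesting copies means an event, agent or rights record can be attached at whichever layer the repository decides is appropriate.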
- Preservation metadata relating to Intellectual Entities: this is a conceptual level, which applies to notions such as collection, accession and series found in hierarchical archival descriptions, as well as to lower-level Intellectual Entities.
| Intellectual Entity level | Title | Example metadata |
|---|---|---|
| Collection level | 'Personal archive of Politician X' | Collection-level metadata needed for administration; metadata captured for re-use as descriptive metadata. |
| Accession level | 'Second accession made to archive of Politician X, 16 May 2008' | Accession-level metadata needed for administration; metadata captured for re-use as descriptive metadata, such as the original order of the accession. |
| Series level | 'Email archive of Politician X, 1999-2008' | Information about the original email environment; relationship metadata for component email folders. |
| Subseries level | 'Email folder relating to prisons, 2007-2008' | Relationship metadata for component emails. |
| Item level | 'Speech on rising prison populations, delivered 30 January 2007' | Basic descriptive metadata (e.g. creation date and author); metadata describing relationships with Representations of the Intellectual Entity (e.g. the item could exist as a Microsoft Word 11 file and as an OpenDocument Text 1.0 file). |
| Item level | 'Picture of Mrs Williams at Liverpool docks, 1 February 2007' | Basic descriptive metadata (e.g. creation date); metadata describing relationships with Representations of the Intellectual Entity (e.g. the item could exist as an X3F file and as a JPEG 2000 file). |
Metadata needed for Intellectual Entities includes descriptive information about content and context, such as the original intellectual arrangement of the digital material in an accession. This information must be captured at ingest so that it can be used for administrative and discovery purposes at the appropriate time. The metadata about Intellectual Entities must also include references to their Representations (the sets of Files, together with their structural metadata, that when rendered together produce an instance of the Intellectual Entity).
- Preservation metadata relating to Representations of Intellectual Entities: all Representations should link to the Intellectual Entity which they represent and should include structural information which details how to construct the Representation of an Intellectual Entity from its constituent files, as shown in the diagram below:

In order to be meaningful, both Representation 1 and Representation 2 must be related to the Intellectual Entity they represent. Each Representation must also have structural metadata about its component Files, so that the Intellectual Entity can be reconstructed from them.
The original Representation of an Intellectual Entity should record its significant properties, so that the repository can judge the success of any preservation actions on the Intellectual Entity. Representations should also link to related Representations; for example, where a migration takes place, the derived Representation should link to the source Representation from which it was produced. The repository must decide whether it will use the Representation level for simple objects consisting of a single digital File.
- Preservation metadata at File level: most technical metadata is associated with the digital Files that compose the Representation of an Intellectual Entity. Such metadata includes information about the file format used and fixity information, such as a checksum or digital signature.
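File-level fixity information is typically a cryptographic digest recorded at ingest and recomputed later to detect change. The following minimal Python sketch uses SHA-256 from the standard library `hashlib`; the choice of algorithm and the record layout (`algorithm`, `digest`, `size`) are assumptions for illustration, not requirements of the text.

```python
import hashlib
import os
import tempfile


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Compute a SHA-256 digest, reading in chunks so large files need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_record(path: str) -> dict:
    # Hypothetical record layout: algorithm name, digest value and size in bytes.
    return {"algorithm": "SHA-256", "digest": sha256_of(path),
            "size": os.path.getsize(path)}


def verify(path: str, record: dict) -> bool:
    # Recompute and compare; a mismatch means the bitstream has changed since recording.
    return sha256_of(path) == record["digest"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "speech.txt")
        with open(path, "wb") as f:
            f.write(b"Speech on rising prison populations")
        record = fixity_record(path)
        print(verify(path, record))  # True
        with open(path, "ab") as f:
            f.write(b"!")
        print(verify(path, record))  # False
```

A checksum only shows that a file has changed; a digital signature, also mentioned above, additionally ties the fixity value to the party who created it, at the cost of key management.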
The object characteristics to be preserved
The characteristics, or 'significant properties', to be preserved could vary depending on the class or content of the Intellectual Entity, as well as on who created it. Preserving some significant properties may be prohibitively expensive, and the decision to preserve them may rest on the potential research value of the archive.
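One way to use recorded significant properties is to compare the values recorded for a source Representation with those measured on a migrated Representation, to judge whether a preservation action succeeded. A minimal Python sketch; the property names, the values and the simple exact-match-or-tolerance rule are hypothetical.

```python
from typing import Optional


def changed_properties(source: dict, derived: dict,
                       tolerances: Optional[dict] = None) -> dict:
    """Return the significant properties whose values were not preserved.

    Numeric properties with a tolerance may differ by up to that amount; all
    others must match exactly. A property absent from `derived` counts as lost.
    """
    tolerances = tolerances or {}
    changed = {}
    for name, expected in source.items():
        actual = derived.get(name)
        tol = tolerances.get(name)
        numeric = isinstance(expected, (int, float)) and isinstance(actual, (int, float))
        ok = abs(expected - actual) <= tol if tol is not None and numeric else expected == actual
        if not ok:
            changed[name] = (expected, actual)
    return changed


# Hypothetical properties of a still image before and after migration.
original = {"width": 3000, "height": 2000, "colour_depth": 24, "dpi": 300}
migrated = {"width": 3000, "height": 2000, "colour_depth": 8, "dpi": 299.5}
print(changed_properties(original, migrated, tolerances={"dpi": 1}))
# {'colour_depth': (24, 8)}
```

Which properties appear in `original` at all is the costly decision the text describes; the comparison itself is cheap once the properties have been recorded.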
Authenticity requirements
Repositories must consider the level of detail required from audit trails. One helpful approach is to consider what questions archivists or researchers might want to ask about Intellectual Entities, and about the actions taken in respect of them, in order to establish that the Intellectual Entities are authentic: for example, who carried out a migration, when, and with what outcome.
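An audit trail can be kept as an append-only list of events, which can then be queried with questions of the kind described above. The sketch below also chains each event to the previous one by hash, a common tamper-evidence technique that is an assumption here rather than something the text prescribes. It is a minimal Python sketch; the event fields, event types and agent names are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class AuditLog:
    events: List[dict] = field(default_factory=list)

    def record(self, entity: str, event_type: str, agent: str,
               date: str, outcome: str) -> None:
        previous = self.events[-1]["hash"] if self.events else ""
        event = {"entity": entity, "type": event_type, "agent": agent,
                 "date": date, "outcome": outcome, "previous": previous}
        # Hash the canonical JSON form of the event, including the previous hash.
        payload = json.dumps(event, sort_keys=True).encode("utf-8")
        event["hash"] = hashlib.sha256(payload).hexdigest()
        self.events.append(event)

    def history(self, entity: str) -> List[Tuple[str, str, str]]:
        """What has been done to this entity, when, and by whom?"""
        return [(e["date"], e["type"], e["agent"])
                for e in self.events if e["entity"] == entity]

    def is_intact(self) -> bool:
        """Recompute every hash; editing or removing an earlier event breaks the chain."""
        previous = ""
        for e in self.events:
            body = {k: v for k, v in e.items() if k != "hash"}
            if body["previous"] != previous:
                return False
            payload = json.dumps(body, sort_keys=True).encode("utf-8")
            if hashlib.sha256(payload).hexdigest() != e["hash"]:
                return False
            previous = e["hash"]
        return True


log = AuditLog()
log.record("ie-001", "ingest", "archivist-A", "2008-05-16", "success")
log.record("ie-001", "migration", "converter-tool", "2009-02-01", "success")
print(log.history("ie-001"))
print(log.is_intact())  # True
log.events[0]["agent"] = "someone-else"
print(log.is_intact())  # False
```

Hash chaining only makes tampering detectable; it does not prevent it, so the log itself would still need to be stored and backed up like any other preserved object.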
Embed or reference
External registries are being developed for some kinds of preservation metadata, such as file format information. Entries in such registries can, if desired, be referenced from the repository's metadata rather than held locally. Repositories must balance the gain in efficiency of metadata generation against the risks of relying on a third party. In deciding how much metadata to hold locally and how much to reference, repositories should assess the accuracy, coverage and sustainability of external sources of information, along with the cost of creating and maintaining the same information locally. One argument in favour of recording metadata locally is that locally determined data structures and content may allow the repository to query its contents more effectively for the purposes of collection profiling and batch preservation actions. An argument against local recording is the cost of creation and future maintenance.
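The trade-off can be illustrated with File records that either embed format information locally or hold only a reference to an entry in an external format registry (PRONOM, maintained by The National Archives in the UK, is one such registry). The sketch also shows the kind of local query mentioned above for collection profiling and batch preservation actions. It is a minimal Python sketch: the registry is simulated by a dictionary rather than a real service, and the registry-style keys such as `reg:fmt-0001` are hypothetical placeholders, not real registry identifiers.

```python
from collections import Counter
from typing import Dict, List

# Simulated external registry (hypothetical keys); in practice a remote service.
EXTERNAL_REGISTRY: Dict[str, dict] = {
    "reg:fmt-0001": {"name": "JPEG"},
    "reg:fmt-0002": {"name": "GIF"},
}

files: List[dict] = [
    {"id": "file-1", "format_ref": "reg:fmt-0001"},   # referenced
    {"id": "file-2", "format_ref": "reg:fmt-0001"},   # referenced
    {"id": "file-3", "format": {"name": "X3F"}},      # embedded locally
    {"id": "file-4", "format_ref": "reg:fmt-0002"},   # referenced
]


def format_of(record: dict, registry: Dict[str, dict]) -> dict:
    """Prefer locally embedded format metadata; otherwise resolve the reference."""
    if "format" in record:
        return record["format"]
    # A missing registry entry is the risk of relying on a third party.
    return registry.get(record["format_ref"], {"name": "unresolved"})


def collection_profile(records: List[dict], registry: Dict[str, dict]) -> Counter:
    # Count files by format name: a simple collection profile.
    return Counter(format_of(r, registry)["name"] for r in records)


def batch_for(records: List[dict], registry: Dict[str, dict], format_name: str) -> List[str]:
    # Select the files in one format for a batch preservation action.
    return [r["id"] for r in records if format_of(r, registry)["name"] == format_name]


print(collection_profile(files, EXTERNAL_REGISTRY))
print(batch_for(files, EXTERNAL_REGISTRY, "JPEG"))  # ['file-1', 'file-2']
```

Every referenced record depends on the registry lookup succeeding, which is why the sustainability of the external source matters as much as its accuracy.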
Information required by future researchers
In capturing preservation metadata, repositories ought to consider the kinds of information that future users of the archive will need. Users are likely to value some of the preservation metadata as historical information in its own right, such as the environment used to create the archive and any passwords that were used to protect certain Intellectual Entities.
Limitations of current tools
The scale of digital archives means that metadata creation must be automated as far as possible. The design of an application profile for preservation metadata must consider how the metadata will be generated and may therefore be limited by the functionality of existing tools.
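A minimal Python sketch of automated extraction over a transferred directory, using only the standard library (`os.walk`, `os.stat`, `hashlib` and `mimetypes`). Note that `mimetypes.guess_type` guesses only from the file name's extension, which illustrates the point above: what a profile can rely on is limited by what the tools can actually determine. The record fields are hypothetical; dedicated format-identification tools inspect file contents rather than names.

```python
import hashlib
import mimetypes
import os
import tempfile
from datetime import datetime, timezone
from typing import Dict, List


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def extract(root: str) -> List[Dict]:
    """Build one basic technical metadata record per file under root."""
    records = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            mime, _encoding = mimetypes.guess_type(name)
            records.append({
                "path": os.path.relpath(path, root),
                "size": info.st_size,
                "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat(),
                "sha256": sha256_of(path),
                # Extension-based guess only; None when the extension is not recognised.
                "mime_guess": mime,
            })
    return records


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "diary.txt"), "w", encoding="utf-8") as f:
            f.write("16 May 2008")
        for record in extract(tmp):
            print(record)
```

Note that the modification time reflects the copy on the repository's storage unless the transfer process preserved the original timestamps, so even automatically extracted values need to be interpreted with care.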
Interoperability
Selecting common metadata standards will allow repositories to leverage expertise and tools from the community of practice working with those standards. This will reduce costs and risk to the repository. However, the repository should maintain good records of any profile it develops from a standard, and should either hold copies of the documentation relating to the standard locally or reference it from a sustainable external resource.
The metadata used by the repository should be independent of particular repository software requirements. It should be as easy as possible to move from one repository platform to another.
Preservation supported by the repository
The depth and breadth of metadata required by a repository may depend on the preservation strategies envisaged (see Chapter 08 Digital preservation strategies). Repositories which offer to preserve the original bitstream, but transfer the burden of rendering to the user, could operate using a simpler metadata profile than a repository which offers to preserve access to materials for its users. Similarly, a repository offering preservation of objects in 20 different formats may require more metadata, and a more sophisticated metadata model, than a repository which migrates all objects to XML on ingest.
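The dependence of metadata depth on preservation strategy can be expressed as a mapping from each strategy to its required fields, against which incoming records are checked. A minimal Python sketch; the strategy names and field names are hypothetical and only illustrate that a bitstream-only service needs fewer fields than one that preserves access.

```python
from typing import Dict, Set

REQUIRED_FIELDS: Dict[str, Set[str]] = {
    # Bitstream preservation: enough to identify the file and show it is unchanged.
    "bitstream": {"identifier", "size", "checksum"},
    # Access preservation: also what is needed to render or migrate the content.
    "access": {"identifier", "size", "checksum", "format", "format_version",
               "significant_properties", "rendering_environment"},
}


def gaps(record: dict, strategy: str) -> Set[str]:
    """Fields the chosen strategy requires but the record lacks or leaves empty."""
    required = REQUIRED_FIELDS[strategy]
    return {name for name in required if record.get(name) in (None, "", [], {})}


record = {"identifier": "file-1", "size": 2048, "checksum": "ab12", "format": "JPEG"}
print(sorted(gaps(record, "bitstream")))  # []
print(sorted(gaps(record, "access")))
# ['format_version', 'rendering_environment', 'significant_properties']
```

A size of 0 is treated as present, since an empty file is a legitimate object; only missing or empty values count as gaps.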
Cost and performance
Metadata can be one of the most costly aspects of digital preservation, and thought must be given to the efficiency of creating metadata conformant to the application profile and the ease of training staff in its use.
Extensibility
Can the application profile be easily evolved and extended to accommodate new circumstances?
Timing
Preservation metadata should be assembled and given structure at ingest to a repository, and may be added to over time in response to preservation actions, the availability of new tools, the transfer of rights, and so on. Information about the creating environment and the intellectual property rights relating to materials is best obtained from the creators of the material, while other metadata will be compiled via repository processes, using validation tools, fixity checkers, virus checkers, forensic software and metadata extraction tools. Structural information may be evident in the material as transferred, but may need to be extracted and formally recorded.
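The point that metadata arrives from different sources at different times can be made explicit by recording, for each value, where it came from and when, and by appending rather than overwriting. A minimal Python sketch; the field names, source labels (`creator`, `repository`, `tool:fixity-checker`) and values are hypothetical stand-ins for the creator-supplied information and repository tools described above.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Assertion:
    value: object
    source: str      # e.g. "creator" or "tool:<name>" (hypothetical labels)
    recorded: date


@dataclass
class MetadataRecord:
    identifier: str
    values: Dict[str, List[Assertion]] = field(default_factory=dict)

    def add(self, name: str, value: object, source: str, recorded: date) -> None:
        # Keep every assertion, so later additions never erase the history.
        self.values.setdefault(name, []).append(Assertion(value, source, recorded))

    def current(self, name: str) -> Optional[object]:
        # The most recently recorded value is treated as current.
        assertions = self.values.get(name, [])
        return max(assertions, key=lambda a: a.recorded).value if assertions else None

    def sources(self, name: str) -> List[str]:
        return [a.source for a in self.values.get(name, [])]


rec = MetadataRecord("file-1")
# At ingest: information best obtained from the creator.
rec.add("creating_environment", "Windows XP, MS Word 11", "creator", date(2008, 5, 16))
rec.add("rights_holder", "Politician X", "creator", date(2008, 5, 16))
# At ingest: information compiled by repository processes (placeholder digest).
rec.add("checksum", "ab12", "tool:fixity-checker", date(2008, 5, 16))
# Later: a transfer of rights is recorded without losing the earlier value.
rec.add("rights_holder", "Estate of Politician X", "repository", date(2015, 3, 1))
print(rec.current("rights_holder"))  # Estate of Politician X
print(rec.sources("rights_holder"))  # ['creator', 'repository']
```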