Introduction to preservation metadata

This section of the chapter introduces some of the thinking behind preservation metadata for digital materials and applies it to a repository designing a preservation metadata application profile for personal digital archives. It begins with a general overview of preservation metadata issues and highlights some notable thinking in the evolution of the subject. It then examines in depth the use of the PREMIS Data Dictionary (version 1.0) as a means of structuring generic preservation metadata, and describes schemes designed to meet the technical needs of distinct categories of digital material, such as still images and audio files.

Archivists must create, manage and use preservation metadata in order to administer and maintain access to authentic digital archives, together with their context and provenance, over the long term. The mechanisms needed to realise these concepts must change in the digital environment, where archives are technology-dependent, interdependent and easily mutable. Paradigm assumes that a repository with a remit to preserve personal digital archives for historical research intends to preserve the integrity of the Intellectual Entities in the digital archive and their inter-relationships, not simply an arbitrary collection of digital files and folders placed at the repository.

Intellectual Entities are the conceptual items that will be described by archivists and accessed by researchers. An example of an Intellectual Entity is 'the personal website of politician X, 26 January 2007'. This website consists of a precisely arranged series of interrelated files and folders, which together produce a Representation of the website. It would be possible to preserve the files that compose the website without preserving their relationships, but doing so would make it nearly impossible to recreate a Representation of the Intellectual Entity (the website) for researcher access. To preserve meaningful access to digital archives, we must therefore do more than preserve files.

In the digital environment, Intellectual Entities may acquire several Representations over time as a result of preservation actions: if a file format migration is adopted as a preservation strategy, then each time a File belonging to a Representation is migrated, a new Representation of the Intellectual Entity to which it belonged is created (see Adding persistent identifiers: when to identify? and Migration).

This illustration shows the relationships between an Intellectual Entity, its Representations and the Files that belong to those Representations:

Figure 9: Intellectual Entity, Representations and Files

What the repository wishes to preserve is an authentic Representation of the personal website (the Intellectual Entity), together with its relationship to other Intellectual Entities in the personal archive. In order to preserve the personal website, its constituent files must be preserved, and metadata about their structure must be created and stored at the Representation level. If one or more of the files that constitute the original Representation must be format-shifted in order to preserve access to the Intellectual Entity, then a new Representation of the same Intellectual Entity is created. In the example shown in Figure 9, the repository does not support the DjVu image format and has elected to migrate the DjVu File to JPEG 2000 format, thus creating a new Representation of the personal website. In this way, the ability to reconstruct the Intellectual Entity (and the relationships between the original and subsequent Representations of the Intellectual Entity) is maintained as technology changes. The relationship between the personal website and the other Intellectual Entities in the personal archive is not illustrated here, but must be captured in a METS structure map detailing the intellectual arrangement of the archive and held with metadata applicable to the collection level.
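
The relationship between an Intellectual Entity, its Representations and their Files can also be sketched in code. The following Python fragment is a minimal illustration only, not a PREMIS or METS implementation: the class names, field names, identifiers, file paths and the migrate_file helper are all hypothetical and chosen for this example. It shows how migrating a single File yields a second Representation of the same Intellectual Entity while the original Representation is left untouched.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class File:
    identifier: str   # hypothetical local identifier
    path: str         # location within the Representation
    format: str       # format name, e.g. "DjVu"


@dataclass(frozen=True)
class Representation:
    identifier: str
    files: Tuple[File, ...]
    derived_from: Optional[str] = None  # identifier of the source Representation


@dataclass
class IntellectualEntity:
    title: str
    representations: List[Representation] = field(default_factory=list)


def migrate_file(entity, source, file_id, new_path, new_format):
    """Add a new Representation in which one File is replaced by a migrated copy.

    The source Representation is not modified; unchanged Files are shared.
    """
    if not any(f.identifier == file_id for f in source.files):
        raise ValueError(f"File {file_id!r} is not part of {source.identifier!r}")
    new_files = tuple(
        File(f"{f.identifier}-migrated", new_path, new_format)
        if f.identifier == file_id else f
        for f in source.files
    )
    new_rep = Representation(
        identifier=f"rep-{len(entity.representations) + 1}",
        files=new_files,
        derived_from=source.identifier,
    )
    entity.representations.append(new_rep)
    return new_rep


# Illustrative data only: the file names are invented for this sketch.
original = Representation(
    "rep-1",
    (
        File("file-1", "index.html", "HTML"),
        File("file-2", "images/portrait.djvu", "DjVu"),
    ),
)
website = IntellectualEntity(
    "Personal website of politician X, 26 January 2007", [original]
)
migrated = migrate_file(
    website, original, "file-2", "images/portrait.jp2", "JPEG 2000"
)
```

Because each Representation records the identifier of the Representation it was derived from, the chain from the original to every later Representation can be followed, which mirrors the requirement above that these relationships be maintained as technology changes.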

This model of Intellectual Entities, Representations and Files derives from the PREMIS Data Dictionary for Preservation Metadata 1.0, which will be described in more detail below.