File formats

Representation Information

Whilst information about a digital object's file format is essential for its preservation, more is needed to ensure that the bitstream can be transformed into something meaningful and understandable over time. Operating system and hardware dependencies, character encodings, algorithms, standards and so on should also be taken into account. The OAIS Model uses the term Representation Information for this kind of information. Representation Information is subdivided into three classes: Structure Information, Semantic Information and Other Representation Information.

In an OAIS information package, the Content Information (i.e. the information which is the primary target of preservation) comprises the Content Data Object (the digital object itself) and the Representation Information needed to interpret it.

A digital repository should retain persistent Representation Information along with the data objects it preserves, or refer to Representation Information held externally in a reliable repository. Representation Information may itself need further Representation Information to make it intelligible: for example, a statement that the digital object conforms to the ASCII standard means that the ASCII standard must in turn be explained. This recursion produces a complex and extensive network of representation objects, which continues expanding until the contents of the original digital object can be displayed in a form the user can understand.

The user in this case is a member of the repository's Designated Community (or primary user base). If this user base is small and specialised, only a minimal amount of Representation Information may be necessary. However, a repository must consider future developments and decide whether to maintain a larger amount of Representation Information, which would render its holdings understandable to a wider community with a less specialised knowledge base. The latter is the more appropriate approach for a collecting institution which takes in personal archives, so an extensive quantity of Representation Information is likely to be necessary.
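The recursion and the role of the Designated Community described above can be pictured as a walk over a dependency network. The following is a minimal sketch only. It assumes a hypothetical in-memory network in which each item of Representation Information lists the further items needed to interpret it, and it models the Designated Community's knowledge base as a set of items that need no further explanation. The names REP_INFO_NETWORK and required_rep_info, and all sample entries, are invented for illustration and are not drawn from OAIS or from any registry.

```python
# Hypothetical network: each key is an object or an item of Representation
# Information; each value lists the further items needed to interpret it.
REP_INFO_NETWORK = {
    "letter.txt": ["ASCII standard"],
    "ASCII standard": ["English language", "PDF format specification"],
    "PDF format specification": ["English language"],
    "English language": [],
}


def required_rep_info(item, knowledge_base, network=REP_INFO_NETWORK):
    """Return the Representation Information a repository must hold for `item`.

    The walk stops at anything already in the Designated Community's
    knowledge base, i.e. at the point where the user needs no further
    explanation.
    """
    needed = set()
    stack = list(network.get(item, []))
    while stack:
        current = stack.pop()
        if current in knowledge_base or current in needed:
            continue  # already understood, or already collected
        needed.add(current)
        stack.extend(network.get(current, []))
    return needed


# Two illustrative communities with different knowledge bases.
specialist = {"ASCII standard", "English language"}
general_public = {"English language"}

print(sorted(required_rep_info("letter.txt", specialist)))
print(sorted(required_rep_info("letter.txt", general_public)))
```

Under these assumptions the specialist community needs nothing beyond the object itself, whereas the wider community needs both the ASCII standard and the specification of the format in which that standard is documented. This is the trade-off a collecting institution faces when it chooses a broader Designated Community.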

The Digital Curation Centre has recognised that a collaborative model for creating, storing, maintaining, accessing and using Representation Information is necessary to assist the development of long-term digital curation strategies. The Centre is therefore developing a distributed Representation Information Registry/Repository to provide an infrastructure for the preservation of Representation Information. The DCC will not fully populate the registry itself, so the community will only derive benefit from the registry if its members invest time and effort in populating the resource. It is intended that the registry will include:

Other Representation Information to support both migration and emulation preservation strategies will also be held, such as details of software with appropriate emulation capabilities. Digital repositories will be able to refer to Representation Information held in the registry by means of a Representation Information label (in the form of an XML Schema) which can be attached to a digital object.
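To illustrate the idea of attaching a label that points to externally held Representation Information, the sketch below builds a small XML fragment for a digital object. It is an assumption-laden example: the element and attribute names (repInfoLabel, repInfoRef, registryId), the function build_rep_info_label and the identifiers used are all hypothetical and do not reproduce the DCC's actual label schema.

```python
import xml.etree.ElementTree as ET


def build_rep_info_label(object_id, registry_refs):
    """Build an XML label referring to Representation Information held elsewhere.

    Element and attribute names are invented for illustration; they do not
    follow any published registry schema.
    """
    label = ET.Element("repInfoLabel", {"object": object_id})
    for ref in registry_refs:
        # One child element per registry entry the object depends on.
        ET.SubElement(label, "repInfoRef", {"registryId": ref})
    return ET.tostring(label, encoding="unicode")


# Hypothetical object identifier and registry identifiers.
xml_label = build_rep_info_label("letter-001", ["example:ascii", "example:english"])
print(xml_label)
```

A repository following this pattern would store the label alongside the object and resolve each reference against the registry when the object needs to be rendered or migrated.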