Selecting the right preservation strategy

Migration

Migration is the preservation approach which has been most widely practised to date. At its simplest it is defined as the copying or conversion of digital objects from one technology to another, whilst preserving their significant properties. Migration focuses on the digital object itself, rather than its environment; it aims to change the object in such a way that hardware and software developments will not affect its accessibility. It therefore applies to:

Whilst the OAIS Model defines refreshing as a form of migration, refreshing is essentially a means of mitigating media degradation and obsolescence, whereas full migration is also intended to overcome obsolescence of the encoding and format of the data. Migration (as it is generally known) is called 'Transformation' by OAIS, which defines it as changing the Content Information (the Content Data Object and its Representation Information) while attempting to preserve the full information content. When an Archival Information Package (AIP) is migrated in this way, a new 'version' of the AIP is said to have been created as a replacement for the original. Both AIPs can be preserved, but the migrated version will be considered the primary package for preservation, although future developments may still result in a new migration from the original AIP. Within OAIS, the Preservation Planning function is responsible for developing migration plans, whilst performing the migration is the responsibility of the Administration function.
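
The relationship between an original AIP and its migrated versions can be illustrated with a small data structure. This is only a sketch under stated assumptions: the class names (AIPVersion, AIPHistory), the format labels and the rule that the most recent version is the primary one are hypothetical illustrations of the paragraph above, not part of the OAIS specification.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class AIPVersion:
    number: int                  # 1 is the original package
    file_format: str             # illustrative label, not a registry identifier
    derived_from: Optional[int]  # version this one was migrated from (None = original)


@dataclass
class AIPHistory:
    """Hypothetical record of an AIP and the versions created by migration."""
    versions: List[AIPVersion] = field(default_factory=list)

    def add_original(self, file_format: str) -> AIPVersion:
        if self.versions:
            raise ValueError("original version already recorded")
        original = AIPVersion(1, file_format, None)
        self.versions.append(original)
        return original

    def migrate(self, new_format: str, source: int = 1) -> AIPVersion:
        # Earlier versions are kept, so a later migration can start
        # from the original again rather than from a derived version.
        if not any(v.number == source for v in self.versions):
            raise ValueError(f"unknown source version: {source}")
        migrated = AIPVersion(len(self.versions) + 1, new_format, source)
        self.versions.append(migrated)
        return migrated

    @property
    def primary(self) -> AIPVersion:
        # Assumption: the most recent version is treated as the primary package.
        return self.versions[-1]


history = AIPHistory()
history.add_original("wordperfect")
history.migrate("odt")               # version 2, migrated from the original
history.migrate("pdf-a", source=1)   # version 3, a later migration from the original
```

Recording which version each migration started from keeps open the option, mentioned above, of returning to the original AIP when better migration tools become available.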

It is important that migration is fully documented by metadata, and ideally it should also be reversible: the only way to guarantee that no information will be lost on migration is to carry out a backwards migration, which should result in an exact recreation of the original object. In reality, however, some degree of 'acceptable loss' may be an inevitable result of migration. Digital curators must strike a balance between achieving a perfect reversible migration and maintaining an accessible version of the digital object which is as close to the original as possible in all essential respects, but which may have undergone some subtle changes during migration. A repository might choose to define what constitutes acceptable loss for each object type as part of its content model. There are two reasons why the preservation of bitstreams is so important. First, if migration is unsuccessful, the repository will always have backup copies of the original bitstreams to fall back on. Second, if some degree of acceptable loss has occurred to a digital object during migration, the migrated version can be maintained as the principal access version, while the preserved bitstream allows for a more successful migration in the future if preservation techniques have advanced.
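
The reversibility test described above (migrate forwards, migrate backwards, compare with the original) can be sketched as follows. This is a minimal illustration, assuming the repository compares SHA-256 checksums of the bitstreams; the function names and the two toy converters are hypothetical stand-ins for real migration tools.

```python
import hashlib
from typing import Callable

Converter = Callable[[bytes], bytes]


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def is_reversible(original: bytes, forward: Converter, backward: Converter) -> bool:
    """Return True if a backwards migration recreates the original bitstream exactly."""
    migrated = forward(original)
    restored = backward(migrated)
    return sha256_hex(restored) == sha256_hex(original)


# Toy converters (assumptions for illustration only).
def latin1_to_utf8(data: bytes) -> bytes:
    return data.decode("latin-1").encode("utf-8")


def utf8_to_latin1(data: bytes) -> bytes:
    return data.decode("utf-8").encode("latin-1")


def to_ascii_lossy(data: bytes) -> bytes:
    # Replaces every non-ASCII character with "?", so information is lost.
    return data.decode("latin-1").encode("ascii", errors="replace")


def unchanged(data: bytes) -> bytes:
    return data


sample = b"caf\xe9"  # "café" encoded as Latin-1
lossless = is_reversible(sample, latin1_to_utf8, utf8_to_latin1)
lossy = is_reversible(sample, to_ascii_lossy, unchanged)
```

A failed round trip does not by itself mean the migration is unusable; it means the difference has to be judged against whatever the repository has defined as acceptable loss for that object type.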

Migration is a very diverse field and many variations on the general approach have been considered in the digital preservation community. Four of these are discussed below:

Backwards compatibility

Simple version migration is common in the world of commercial software and has been in use for years. Successive versions of particular proprietary file formats will define linear migration paths for files stored in those formats. Software vendors usually supply conversion routines that enable newer versions of their product to read documents created in older versions and then to save them in the current version. Where a digital archive includes material stored in recently created proprietary formats which are well supported and well documented, a repository may decide to leave the material in this format until it is at risk of obsolescence; at that stage, if an upgraded and backwards-compatible version of the format has been released, the repository may decide to migrate to it. However, reliance on proprietary formats in this way does not provide a long-term solution, and there are major drawbacks from a digital preservation perspective:

It may also be practical to consider the onward migration of open formats to newer versions.

Migrate to standard format(s) on ingest (Normalisation)

In order to control complexity and cost, an institution may decide to support only a limited number of standardised file formats, and migrate all digital objects to an appropriate supported format on ingest. All digital objects of a particular type will be converted into a single chosen file format that is thought to embody the best overall compromise among characteristics like functionality, longevity and preservability, e.g. all raster images might be converted from their original format (such as JPEG or GIF) to Uncompressed Baseline TIFF, and all word-processed documents might be converted to OpenDocument Text (ODT). The institution then undertakes to support this format (or formats) indefinitely. This approach to migration is known as normalisation. Where it is impossible to normalise a digital object (e.g. if it was created in an obscure format), the bitstream should be preserved; if and when a tool to normalise this format is developed, the object can then be normalised in turn. Normalisation is seen as a more cost-effective option than migrating to a wider range of formats; where a single format such as XML is used, repeated cyclical migration into different formats (with its accompanying risk of data loss or corruption) is avoided.
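
The decision made at ingest can be sketched as a simple lookup from source format to supported preservation format. This is an illustrative sketch only: the table name, the IngestDecision class and the "doc" and "rtf" entries are assumptions, and a real repository would identify formats with a format identification tool rather than a short text label.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical policy table: source format -> supported preservation format.
# The image entries follow the examples in the text; "doc" and "rtf" are
# assumed examples of word-processing formats.
NORMALISATION_TARGETS = {
    "jpeg": "tiff",  # uncompressed baseline TIFF
    "gif": "tiff",
    "doc": "odt",
    "rtf": "odt",
}


@dataclass(frozen=True)
class IngestDecision:
    source_format: str
    target_format: Optional[str]  # None means no normalisation route exists yet

    @property
    def bitstream_only(self) -> bool:
        # Objects that cannot be normalised are kept as bitstreams
        # until a suitable tool becomes available.
        return self.target_format is None


def plan_ingest(source_format: str) -> IngestDecision:
    fmt = source_format.strip().lower()
    return IngestDecision(fmt, NORMALISATION_TARGETS.get(fmt))


image_plan = plan_ingest("JPEG")
obscure_plan = plan_ingest("obscure-format")
```

Keeping the policy in one table makes it clear which formats the institution has undertaken to support, and an object marked as bitstream-only can be planned again once a new entry is added.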

An example of normalisation which supports a limited number of preferred formats is the practice of the Public Record Office of the State of Victoria in Australia, where:

Alternatively, a repository may limit this approach to just one preservation-friendly format, converting all digital objects to this format on ingest. The National Archives of Australia takes this approach: it accepts digital records in any format and converts them all into an XML-based archival format, using the normalising tool Xena. The XML object is the preservation master, and this is then transformed into an accessible format for users.

When normalising all digital objects to one or more standard formats, a repository should bear in mind that even open standards evolve, which means that subsequent migrations may be necessary. Even if a single format such as XML is used, a different XML schema will be needed for each object type; if migration becomes necessary, each schema will therefore require its own migration pathway.

Migrate to newer or standard file formats on obsolescence

An institution may leave digital objects in their original formats and rely on its technology watch facility to identify when each format is at risk of obsolescence. For example, when a new software version is released that cannot read files created in earlier versions, all affected files are migrated to a different format. A number of migration options exist, including:

Migration on request

An institution may choose to leave digital objects in their original formats, or preserve only the bitstream, until a user (e.g. a cataloguing archivist or reader) requests access to them, at which point they will be migrated to a preferred format.

This approach was advocated by the CURL Exemplars in Digital Archives (Cedars) and the Creative Archiving at Michigan and Leeds: Emulating the Old on the New (CAMiLEON) projects. It involves preserving the bitstream of a digital object (Cedars uses the term 'byte stream', a byte consisting of eight bits) and developing a Migration on Request Tool which is able to reproduce the intellectual content of the digital object in a different format. The tool should be developed while the format is still usable, and its performance should be compared against a rendering of the original object; any future modifications which become necessary as technology evolves would require a similar validation process. This means that original software, screenshots and written documentation about the original environment may also need to be preserved alongside the bitstream and the Migration on Request Tool. Migration only occurs when a user requests access to a digital object, rather than taking place on a regular cyclical basis.
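
The principle can be sketched in a few lines: the stored bitstream is never altered, and a conversion tool runs only when access is requested. This is a hypothetical illustration, not the tool developed by Cedars or CAMiLEON; the class name, the format labels and the toy converter are all assumptions.

```python
from typing import Callable, Dict, Tuple

Converter = Callable[[bytes], bytes]


class MigrationOnRequestStore:
    """Hypothetical store that keeps original bitstreams and migrates on access."""

    def __init__(self) -> None:
        self._bitstreams: Dict[str, Tuple[str, bytes]] = {}
        self._tools: Dict[Tuple[str, str], Converter] = {}
        self.migrations_run = 0

    def ingest(self, object_id: str, source_format: str, data: bytes) -> None:
        # The original bitstream is stored unchanged.
        self._bitstreams[object_id] = (source_format, data)

    def register_tool(self, source_format: str, target_format: str, tool: Converter) -> None:
        self._tools[(source_format, target_format)] = tool

    def request(self, object_id: str, target_format: str) -> bytes:
        source_format, data = self._bitstreams[object_id]
        if source_format == target_format:
            return data
        tool = self._tools.get((source_format, target_format))
        if tool is None:
            raise LookupError(f"no tool for {source_format} -> {target_format}")
        self.migrations_run += 1
        return tool(data)  # a converted copy; the stored bitstream is untouched


store = MigrationOnRequestStore()
store.ingest("letter-1", "latin1-text", b"caf\xe9")
store.ingest("letter-2", "latin1-text", b"na\xefve")
store.register_tool(
    "latin1-text", "utf8-text",
    lambda data: data.decode("latin-1").encode("utf-8"),
)

access_copy = store.request("letter-1", "utf8-text")
```

In this sketch only the requested object is migrated; the second object remains an untouched bitstream until someone asks for it, which is where the saving in effort comes from.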

This approach has a number of advantages:

Deferring migration may have some disadvantages too:

Some general advantages and disadvantages of migration

Advantages

  • It is a widely used strategy and procedures for simple migration are well established.
  • It is generally a reliable way to preserve the intellectual content of digital objects and is particularly suited to page-based documents.
  • Conversion software for some formats is readily available.

Disadvantages

  • It requires a large commitment of resources, both initially and over time. Because formats evolve so rapidly, migration at the point of obsolescence is labour-intensive unless it can be automated, and the work involved increases as collections grow. The migration on request approach may mitigate this to some extent, in that migration is not carried out on digital objects which may never be used; standardisation of formats also makes batch migration easier.
  • Some of the data or attributes (e.g. formatting) of the digital object may be lost during migration. In particular, there is likely to be a significant loss of functionality in the case of complex digital objects. Migration is based on the assumption that content is more important than functionality or look and feel.
  • The potential loss of data and attributes may compromise the integrity and authenticity of a digital object, which is a major issue for digital archivists.
  • There may be potential intellectual property rights (IPR) problems if either the source or the new format is proprietary, although these are unlikely to be as prohibitive as they might be in the case of emulation. It is not yet clear whether the Gowers Review, published in December 2006, will mitigate the problem of IPR: Recommendation 10b of the report states that by 2008 libraries in the UK should be enabled to format shift archival copies to ensure that records do not become obsolete.
  • Specialised conversion tools are needed to convert digital objects from one format to another, and if no appropriate tool is available for a specific file format, developing a customised migration system can be complex and expensive, although costs could be shared with institutions wishing to perform the same migration.