Selecting the right preservation strategy

Emulation

In contrast to migration, which focuses on the digital object itself, emulation focuses on the technological environment in which the object was created. It involves preserving the bitstream of the object and creating an access version by using current technology to mimic some or all of the environment in which the original was rendered. This involves emulating any of the following:

As with migration, there are various approaches to emulation, although many of these have not yet been widely tested in a preservation context. Emulation itself is not new, however: it is a well-established technique in computer science. Manufacturers, for instance, often develop emulators to try out a design before it is produced. In that case one computer is emulated on another made by the same vendor, but a number of emulators have also been produced to emulate one type of computer on another, e.g. emulators which run the Macintosh Operating System under Microsoft Windows on Intel-based machines and vice versa. Emulation is also widely used for computer games, and many emulators can be found on the Internet.

Emulation has great potential for long-term digital preservation. Where significant properties include things like functionality or look and feel, emulation may be a better approach than migration, which cannot guarantee that functionality survives changes of format. This will be most relevant to complex digital objects like websites, but in some cases the user may wish to experience the look and feel of the environment in which relatively simple objects were created, such as literary works written in a word processor. To give a truly authentic experience of the creating environment, emulation must extend to specifics like execution speed, display resolution, colour and any input devices like a keyboard or mouse, although it is debatable whether users would want the usability of an archive to be limited in this way.

Working with emulators might involve a considerable learning curve for the researcher, who may have to master the conventions of a completely unfamiliar technological environment in order to experience the object in its 'original' form. The Dutch Digital Preservation Testbed Project argued that this has its equivalent in traditional archives, where a researcher may have to travel to an archive and learn to decipher unfamiliar scripts and dialects in order to work from the 'original' authentic document. It also suggested that access for less specialist researchers could be made easier by the provision of comprehensive 'help' systems, or by creating 'vernacular renditions' of the original; these are the equivalent of traditional surrogates like photocopies or transcripts and would enable the user to view the original record in a modern, understandable form.

The same project advocates the first of the three emulation approaches considered here.

Software emulating hardware

This approach is aimed at enabling the technology of the future to emulate the original computer on which the creating software ran. In order to achieve this, three elements are preserved:

The first two of these are preserved in the form of bitstreams. An emulator program is written to preserve the third as another bitstream; this should be written while the original computer is still extant, so that the emulated hardware platform can be validated against the computer it emulates.

There are various options to ensure that future computers can run the emulator program. One is 'chaining': once an emulator of a given computer has been implemented on one successor computer, it can keep running indefinitely, provided that the successor is itself emulated when it in turn becomes obsolete. Over time this results in a chain of emulators, each running on top of the next.
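The idea of a chain can be illustrated with a minimal Python sketch. The platform names and the EMULATORS catalogue are hypothetical and stand in for real machines; the sketch only shows how access to the original platform depends on every link in the chain being present.

```python
from typing import Dict, List, Optional

# Hypothetical catalogue: each entry says "an emulator of <key> runs on <value>".
EMULATORS: Dict[str, str] = {
    "platform_1985": "platform_2000",
    "platform_2000": "platform_2015",
    "platform_2015": "platform_2030",
}


def emulator_chain(original: str, current: str,
                   emulators: Dict[str, str]) -> Optional[List[str]]:
    """Return the platforms stacked between `original` and `current`.

    The list starts with the original platform and ends with the current
    one. None is returned if a link is missing, i.e. access is lost.
    """
    chain = [original]
    seen = {original}
    while chain[-1] != current:
        host = emulators.get(chain[-1])
        if host is None or host in seen:
            return None  # broken (or circular) chain
        chain.append(host)
        seen.add(host)
    return chain


if __name__ == "__main__":
    print(emulator_chain("platform_1985", "platform_2030", EMULATORS))
```

Each additional link is another emulator that has to be preserved and kept working, which is why long chains may add to the cost and risk of this approach.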

Virtual machine approach

Another variation of emulation which has been widely discussed in the digital preservation community is based on the concept of a virtual machine. Whereas a traditional emulator mimics an earlier machine which actually existed, a virtual machine emulates a computer that has never actually existed as hardware. Programs are written to run on the virtual machine rather than on a specific computer; if the virtual machine is then implemented on many different computers, all of the programs written for the virtual machine can run on any of those computers.
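As a rough illustration of this principle, the Python sketch below defines a toy stack-based virtual machine. The instruction names (PUSH, ADD, MUL) and the run function are invented for this example and do not correspond to any real preservation tool; the point is only that a program is written once against the virtual machine and will run wherever the small interpreter has been implemented.

```python
from typing import List, Sequence, Tuple

# A program for the toy virtual machine is a list of instructions.
# The instruction set is an assumption made for this illustration.
Program = Sequence[Tuple]


def run(program: Program) -> List[int]:
    """Execute a program on the toy virtual machine and return its stack."""
    stack: List[int] = []
    for op, *args in program:
        if op == "PUSH":
            stack.append(args[0])
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown instruction: {op}")
    return stack


if __name__ == "__main__":
    # (2 + 3) * 4, expressed only in terms of the virtual machine
    program = [("PUSH", 2), ("PUSH", 3), ("ADD",), ("PUSH", 4), ("MUL",)]
    print(run(program))
```

Only the interpreter (here, run) would need to be rewritten for a new host computer; the programs themselves stay unchanged.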

The Java Platform is a widely used example of a virtual machine: Java programs can be written to run on the Java Platform, which can be hosted on many different real computers and run identically on all of them. While the Java virtual machine is unsuitable for long-term preservation (it evolves rapidly, meaning that it is relatively unstable, and its language is specific to Java), it would be possible to produce an 'Emulation Virtual Machine' for digital preservation purposes. If such a machine could be defined, it would then become the virtual platform on which all emulators are written to run. As each computer in a given generation reaches obsolescence, an emulator of that computer is written to run on the virtual machine.

Universal Virtual Computer (UVC)

This is a variant of the virtual machine approach, which was developed in 2004 by R.A. Lorie of the IBM Research Centre in Almaden, USA, in conjunction with the Koninklijke Bibliotheek in the Netherlands. It involves preserving the bitstream of a digital object along with a specially written emulation program. This program is designed to run on future computers and to emulate the computer on which the digital object was created; it is written in the simple machine language of a platform-independent UVC, and a future computer would need a UVC Interpreter to read and execute the program. Once this is done, the original bitstream could be accessed, by means of the UVC, on any future computer.

In theory, UVC programs can be written for each file format. The UVC Interpreter runs the program for a given format, which decodes the digital object into a Logical Data View, possibly an XML-like structure, that describes in detail how the object is structured, e.g. raster-based images are described pixel by pixel. This is then translated into an understandable representation for the user. This latter stage involves an element of migration, so the UVC essentially combines both emulation and migration approaches.
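The kind of pixel-by-pixel description mentioned above can be sketched in Python as follows. This is not the UVC itself: the input format (one byte each for width and height, then one grey value per pixel) and the element and attribute names are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET


def decode_to_logical_view(bitstream: bytes) -> ET.Element:
    """Decode a toy raster format into an XML-like, pixel-by-pixel view.

    Assumed (hypothetical) layout: byte 0 = width, byte 1 = height,
    followed by width * height grey values, one byte per pixel.
    """
    if len(bitstream) < 2:
        raise ValueError("bitstream too short to contain a header")
    width, height = bitstream[0], bitstream[1]
    pixels = bitstream[2:]
    if len(pixels) != width * height:
        raise ValueError("pixel data does not match the declared size")

    image = ET.Element("image", width=str(width), height=str(height))
    for i, value in enumerate(pixels):
        ET.SubElement(image, "pixel",
                      x=str(i % width), y=str(i // width), grey=str(value))
    return image


if __name__ == "__main__":
    view = decode_to_logical_view(bytes([2, 1, 0, 255]))
    print(ET.tostring(view, encoding="unicode"))
```

A separate viewer would then turn this description into something a user can see, which is the step that resembles migration.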

The UVC method has been successfully applied to images stored in GIF, and the Koninklijke Bibliotheek hopes to extend this to TIFF and PDF. The UVC must support a much wider range of formats if it is to become a central plank of a preservation strategy.

Some general advantages and disadvantages of emulation

Advantages

  • In theory full emulation enables us to recreate the full functionality and exact look and feel of a digital object's performance. It is therefore an attractive approach for preserving complex digital objects and those where appearance or functionality are identified as significant properties.
  • In contrast to migration, the focus of emulation is on changing the environment rather than the digital object itself, thus lessening the risk of data loss through repeated migration cycles.
  • Oltmans and Kol have concluded that emulation is more cost-effective for preserving large collections, despite the relatively high initial costs for developing an emulation device; in contrast, migration applies to all the objects in a collection repetitively, creating high ongoing costs. However, the need for chaining emulators in the future may detract from this.
  • The emulation approach can be implemented at a higher level than the migration approach, so rather than developing conversion solutions per format institutions can develop emulation solutions per environment.
  • It means that records in obscure formats do not have to be abandoned; in theory if the creating hardware/software can be emulated, all the records created in that environment can be recreated.
  • Regardless of the principal preservation approach adopted by a digital repository, emulation could be useful as a backup mechanism that would provide access to the 'digital original' form of each record and may be necessary for the extraction of digital objects from older technological environments.

Disadvantages

  • As yet, emulation has not been widely tested as a long-term digital preservation strategy, and further practical tests are essential before more definitive conclusions about its reliability can be drawn.
  • An emulation system may require the user to master completely unfamiliar technology in order to understand an archival digital record, and technology changes very rapidly; for instance, many people have already forgotten how to use relatively recent word processing programs like WordStar. This problem could potentially be addressed by developing different means or levels of access.
  • Selecting an emulation strategy also involves buying into a migration strategy because emulators themselves become obsolete, so it becomes necessary to replace the old emulator with a new one, or to create a new emulator that allows the old emulator to work on new platforms.
  • Most emulation approaches will involve preserving or emulating proprietary software which is covered by patent, licence or other intellectual property rights (IPR). This is a major issue and must be addressed by any institution introducing an emulation strategy; it is not yet clear whether the Gowers Review will alter this situation.
  • The concept of 'exact original look and feel' is itself debatable, which raises the question of whether emulation can preserve it. The appearance and behaviour of a digital object depend heavily on the environment used to render it; for instance, a user's experience of a website can differ according to the software and hardware they are using.
  • Emulation may require a large commitment in resources, and highly skilled computer programmers would be needed to write the emulator code.
  • If the UVC approach is used, large numbers of decoder programs will be necessary to cope with the variety of file formats that are available, and it may be that new UVC emulators need to be written for each new generation of hardware.