Preservation strategies for personal digital archives
The digital archive material acquired from working politicians and used as Paradigm's primary testbed is comparatively small in quantity, yet contains a very wide range of file formats. During the first 10 months of the project alone, the material accessioned included 20 different formats, and this is only material which was created during the last five years. Forms encountered include email, word-processed documents, spreadsheets, digital images, presentations, personal web pages and blogs. Paradigm also worked with the digital component of the archive of Barbara Castle, which represented three generations of computing technology - hardware, software and formats.
This is likely to be typical of personal digital archives generally. Institutions that collect personal archives can have little control over the formats, software and hardware used by record creators. Curators of personal digital archives have recognised the need to work more closely with their donors and depositors over long periods, and to offer guidance on managing digital material, such as the guidelines created by Paradigm (see Appendix B: Guidelines for creators of personal archives). Even so, collecting institutions are likely always to have to deal with a wide variety of digital material, at least some of which may be obsolete. This means that some level of digital archaeology will be necessary for many years to come.
In such a diverse environment, it is necessary to select very broadly applicable preservation strategies, or to consider implementing a combination of different approaches:
- An institution's preservation strategy should define the range of formats to be supported and include detailed information about how each format will be treated; this will involve identifying significant properties and creating profiles (content models) for different object types, as well as taking into account the different categories of record creator represented in the collections.
- The enormous variety of formats to be dealt with, and the range of creators involved, may mean that normalisation is inappropriate as an overall approach because significant properties are likely to be widely varied and very specific. However, normalisation may be the most suitable way of dealing with digital objects in old or obscure formats; alternatively these might be preserved at bit level only.
- Migration to standard formats on obsolescence may be an appropriate strategy for dealing with objects in well-supported current formats. A comprehensive technology watch programme should always be in place if taking this approach.
- Migration on request might be a useful approach for personal digital archives, which will often be closed for extended periods under the Data Protection Act and copyright restrictions; avoiding unnecessary migrations during the closure period would reduce the risk of data loss.
- Although not widely tested, emulation offers a promising solution for preserving complex objects and those where maintaining look and feel is a priority; the latter may vary according to the type of record creator concerned.
- Repositories seeking to engage seriously in the preservation of personal digital archives should start collecting hardware, software and manuals for data extraction and digital archaeology.
- Bitstreams should always be preserved intact, and be subject to preservation measures like refreshment, backup and fixity checks. They should be packaged together with appropriate metadata, including Representation Information. This can be achieved by creating a METS document for each digital object which contains or refers to representation information and other types of preservation metadata. Representation information is likely to be extensive for collecting institutions which are used by a wide range of researchers.
- Curators should test available migration and emulation tools on digital material they already hold in order to develop informed preferences and to assist in the development of such tools.
- All preservation actions should be fully documented. Paradigm explored and recommends PREMIS (also stored in a METS document) as a means to record information about any migrations or other preservation activities.
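As an illustration of the fixity checks and documentation of preservation actions recommended above, the following is a minimal sketch in Python rather than a method used by Paradigm. It assumes SHA-256 as the checksum algorithm and a simple dictionary as the stored manifest. The function names and the fields of the event record are hypothetical; the record is only loosely modelled on the idea of a PREMIS event and does not follow the PREMIS schema.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to its current digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_fixity(root: Path, manifest: dict[str, str]) -> dict:
    """Compare current digests with a stored manifest.

    Returns a record of the check (hypothetical field names) that
    could be logged alongside other preservation metadata.
    """
    current = build_manifest(root)
    changed = sorted(
        name for name in manifest
        if name in current and current[name] != manifest[name]
    )
    missing = sorted(name for name in manifest if name not in current)
    return {
        "event_type": "fixity check",
        "event_date": datetime.now(timezone.utc).isoformat(),
        "outcome": "pass" if not (changed or missing) else "fail",
        "changed": changed,
        "missing": missing,
    }
```

In use, `build_manifest` would be run once at accession and its result stored outside the directory being checked; `check_fixity` would then be run periodically, and each returned record kept as part of the documentation of preservation actions.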
Whatever combination of approaches is selected, the result must be affordable. The strategy developed must embody the best that can be accomplished with the available resources. Collaboration and knowledge-sharing will be vital to establishing a successful long-term preservation strategy for personal digital archives; developing strategies in line with the digital preservation community will allow repositories to make use of tools and techniques developed by others. The first wave of digital archivists might also wish to exercise some caution by adopting multiple parallel strategies while the field is still at an early stage of development and tools are quite limited. Information technology continues to evolve, as does the digital preservation community: new projects and testbeds are being created on a regular basis, and shared registries and tools are constantly developing and growing. This environment offers the digital archivist many challenges but also a huge information resource on which to draw.