
Appraisal-related issues encountered by Paradigm

Challenges associated with appraising digital archives

These are a few of the scenarios that Paradigm has faced in appraising its exemplar hybrid personal archives:

Managing large quantities of digital material

The key issue when appraising digital records, whether personal or organisational, is coping with the sheer quantity of information. It would be impossible to read and appraise the digital records of a twenty-first-century politician document by document. Because digital material can be copied, edited and circulated rapidly, records inevitably proliferate.
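Because item-by-item reading is not feasible, a first step might be to profile an accession as a whole. The following is a minimal sketch, assuming the accession has already been copied to a local directory; the function name `profile_accession` is hypothetical. It counts files and total bytes per file extension, giving an overview of where the bulk of the material lies.

```python
from collections import Counter
from pathlib import Path


def profile_accession(root):
    """Count files and total bytes per lower-cased file extension under root.

    Hypothetical helper: extensions are only a rough guide to format.
    """
    counts = Counter()
    sizes = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Files without an extension are grouped under "(none)".
            extension = path.suffix.lower() or "(none)"
            counts[extension] += 1
            sizes[extension] += path.stat().st_size
    return counts, sizes
```

Calling `counts.most_common(10)` on the result would list the ten most frequent extensions. Since files can be misnamed, such counts are a starting point for sampling rather than an appraisal decision in themselves.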

Multiple copies of the same document stored by different people

Paradigm archivists often found that the same digital document was held by more than one of the politicians' staff members. In this scenario, the archivist must decide which version should be retained.
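One way of locating such shared copies is sketched below, assuming a hypothetical layout in which each staff member's files sit in their own top-level folder of the accession (for example `accession/agent/` and `accession/secretary/`). Files are grouped by the SHA-256 checksum of their content, so copies are matched even if they have been renamed; the helper names are illustrative only.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Checksum a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def shared_documents(root):
    """Return {checksum: [custodians]} for content held by more than one custodian.

    Assumes (hypothetically) that each top-level folder under root holds
    one staff member's files; files sitting directly in root are skipped.
    """
    root = Path(root)
    holders = defaultdict(set)
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        relative = path.relative_to(root)
        if len(relative.parts) < 2:
            continue
        holders[sha256_of(path)].add(relative.parts[0])
    return {
        checksum: sorted(custodians)
        for checksum, custodians in holders.items()
        if len(custodians) > 1
    }
```

Byte-identical copies are the easy case. Copies that differ slightly, such as a changed date or a reformatted paragraph, will not match and still need comparing by the archivist.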

Large quantities of circulars

Mailboxes often contained a large number of circular-style emails, frequently with attachments, from the central office of a political party. In a paper archive such circulars are unlikely to have been kept, but in a digital world there is an argument for keeping such working papers, which illustrate how the constituency MP fits into the broader political picture.
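If a mailbox has been exported to mbox format, the circulars could be identified in bulk by sender rather than read one by one. This sketch uses Python's standard `mailbox` and `email.utils` modules; the central office domain is a hypothetical parameter, and matching on the From address alone is a simplifying assumption, since circulars may also arrive via mailing-list addresses.

```python
import mailbox
from email.utils import parseaddr


def find_circulars(mbox_path, central_domain):
    """List (subject, has_attachment) for messages sent from central_domain.

    central_domain is a hypothetical example such as "party.example.org".
    """
    suffix = "@" + central_domain.lower()
    circulars = []
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            _, address = parseaddr(message.get("From", ""))
            if address.lower().endswith(suffix):
                # Any MIME part carrying a filename is treated as an attachment.
                has_attachment = any(part.get_filename() for part in message.walk())
                circulars.append((message.get("Subject", ""), has_attachment))
    finally:
        box.close()
    return circulars
```

The resulting list gives the archivist a single view of the circulars, which can then be kept or sampled as a series rather than judged individually.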

Multiple copies of the same document created during different snapshots

One of the key issues facing digital curators is how to deal with duplicate files, especially if digital snapshots are to be taken several times a year throughout a creator's working life. If such patterns of acquisition were adopted, there would be vast numbers of duplicate files: an overwhelming digital abundance. Checksum or synchronisation tools will be needed to determine whether files with identical names are in fact identical in content.
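The checksum comparison can be sketched as follows, assuming two snapshots of the same creator's files have been copied to separate directories. Files are paired by relative path and compared by SHA-256 digest; the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Checksum a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_snapshots(earlier, later):
    """Classify files in the later snapshot against the earlier one.

    Returns three lists of relative paths: identical (same name, same
    checksum), changed (same name, different checksum) and new (no file of
    that name in the earlier snapshot).
    """
    earlier, later = Path(earlier), Path(later)
    identical, changed, new = [], [], []
    for path in sorted(later.rglob("*")):
        if not path.is_file():
            continue
        relative = path.relative_to(later)
        counterpart = earlier / relative
        if not counterpart.is_file():
            new.append(relative.as_posix())
        elif sha256_of(counterpart) == sha256_of(path):
            identical.append(relative.as_posix())
        else:
            changed.append(relative.as_posix())
    return identical, changed, new
```

Under this approach only the "changed" and "new" files might need to be taken from the later snapshot, while the "identical" ones could be recorded as duplicates of the earlier accession.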

System files present in acquisitions of digital records

The archivist will also want to identify and remove operating system, application and other software files (unless using the encapsulation approach to preservation). The National Software Reference Library (NSRL), maintained by the US National Institute of Standards and Technology, can assist with this: it provides a repository of known software, file profiles and file signatures for use by those engaged in computer forensics.
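A minimal sketch of known-file filtering follows. It assumes the relevant NSRL hashes have already been extracted into a plain text file with one SHA-1 value per line; this file layout is an assumption for illustration, not the format in which the NSRL distributes its data. Any file whose SHA-1 matches a known software file is set aside, and the remainder is returned for appraisal. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def load_known_hashes(hash_list_path):
    """Read a hypothetical text file holding one hex SHA-1 value per line."""
    with open(hash_list_path, encoding="ascii") as handle:
        return {line.strip().upper() for line in handle if line.strip()}


def sha1_of(path, chunk_size=1 << 20):
    """Upper-case hex SHA-1 of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def unknown_files(root, known_hashes):
    """Return files under root whose SHA-1 is not in the known-software set."""
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file() and sha1_of(path) not in known_hashes
    )
```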

Processing required prior to appraising digital records

Records accessioned on older media and/or in older formats will need to be extracted in order to undertake an assessment of their content, context, structure and technical viability. This can be challenging and may add significant costs to the processing of archives before appraisal even begins.

Authenticity of author metadata in digital records

Dates

During both the appraisal and cataloguing process, decisions will be made based on the date on which a document was created. For example, an urgent memo penned on the eve of a General Election is likely to have greater significance than a circular written during parliamentary recess. Paradigm found that the concept of 'date' in a digital world is riddled with complications. The capture process itself can alter the apparent creation date. In practice, the modified date is often the better indicator, as it shows when the file's content was last changed, whereas the creation date may only record when the file was saved or copied to a new location. Further complications arise in office files where the creator has ticked the 'Update automatically' box when inserting a date and time field into the body of the text: whenever the document is opened, the field displays the current date rather than the date on which the document was written. Paradigm encountered this issue in the press release documents generated by participants.
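Two checks relevant to these date problems are sketched here. The first reads the file system's last-modified time; creation times are less portable (on Linux, for example, `st_ctime` records the last metadata change, not creation). The second looks for auto-updating `DATE` or `TIME` fields inside a Word document. It assumes the newer `.docx` (Office Open XML) format, so older binary `.doc` files would need a different tool. The function names are hypothetical, and field instructions split across several text runs are not reassembled, so this is a simplification.

```python
import xml.etree.ElementTree as ET
import zipfile
from datetime import datetime, timezone
from pathlib import Path

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
# Field types whose displayed value is refreshed to the current date/time.
VOLATILE_FIELDS = {"DATE", "TIME"}


def modified_time(path):
    """Last-modified time recorded by the file system, as a UTC datetime."""
    return datetime.fromtimestamp(Path(path).stat().st_mtime, tz=timezone.utc)


def volatile_date_fields(docx_path):
    """Return DATE/TIME field instructions found in a .docx main document."""
    with zipfile.ZipFile(docx_path) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    # Simple fields keep the instruction in a w:instr attribute; complex
    # fields keep it in w:instrText runs (not reassembled here if split).
    instructions = [el.get(W + "instr", "") for el in root.iter(W + "fldSimple")]
    instructions += [el.text or "" for el in root.iter(W + "instrText")]
    return [
        text.strip()
        for text in instructions
        if text.split() and text.split()[0].upper() in VOLATILE_FIELDS
    ]
```

A press release that returns a non-empty list from `volatile_date_fields` would warrant caution: any date displayed in its body may reflect when it was last opened, not when it was issued.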

Authorship

Similarly, Paradigm found that the author metadata associated with many kinds of files is often inaccurate. This metadata can be particularly misleading in records acquired from under-resourced offices where staff routinely share computers. The poor quality of metadata also frustrates efforts to identify copyright holders in the digital archive, and the process of circulating work can decouple author from work. Establishing intellectual property rights will be a key concern for the digital curator, who will need to determine who took a photograph or authored an article, whether they are still alive, whether they still hold copyright and how long that copyright will last.
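For `.docx` files, author metadata is stored in the `docProps/core.xml` part of the package, which records a creator and a last-modified-by name alongside created and modified timestamps. This sketch, assuming Office Open XML input and using a hypothetical function name, extracts those four properties so that they can be compared across a collection, for example to spot a generic account name used on a shared office computer.

```python
import xml.etree.ElementTree as ET
import zipfile

NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}
# Result key -> element in docProps/core.xml.
FIELDS = {
    "creator": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def core_properties(docx_path):
    """Return author/date core properties, or {} if the part is absent."""
    with zipfile.ZipFile(docx_path) as package:
        try:
            data = package.read("docProps/core.xml")
        except KeyError:
            return {}
    root = ET.fromstring(data)
    result = {}
    for key, tag in FIELDS.items():
        element = root.find(tag, NS)
        # Missing or empty elements are reported as None.
        result[key] = element.text.strip() if element is not None and element.text else None
    return result
```

The values are only as reliable as the software that wrote them; a mismatch between `creator` and `last_modified_by` is a prompt for further investigation, not evidence of authorship.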

Authenticity

Authenticity is another key consideration. When appraising records, the digital curator must establish that they are indeed what they purport to be. If the authorship of records is in doubt, their value diminishes and they may be unsuitable for long-term preservation as archives. Most creators of office documents do not take pains to ensure that their software package automatically records them as the author, nor do they add their name to the text of a document or incorporate a digital signature. Authors often rely on the means of circulating the work, such as an email, to assert their authorship, but if the means of circulation and the document become disconnected, the provenance is undermined. Microsoft Word does keep a 'revision log' which provides some additional metadata, but accessing this is not as straightforward as accessing the file properties dialogue. The native metadata associated with emails is much more reliable and accessible, as the email header provides a great deal of information which is accurate and complete in most cases. Establishing the authenticity of archives that repositories wish to purchase will become increasingly important and may be complex if the material includes examples of older or obscure technology.
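The header fields that make email provenance comparatively reliable can be read with Python's standard `email` package. This sketch, with hypothetical function names and assuming a single message saved as raw bytes (for example an `.eml` file), extracts the sender, recipient, date, message identifier and the number of `Received` hops recorded by mail servers.

```python
from email import policy
from email.parser import BytesParser


def _text(message, name):
    """Header value as a string, or None if the header is missing."""
    value = message[name]
    return str(value) if value is not None else None


def provenance_headers(raw_message):
    """Extract provenance-related header fields from a raw RFC 5322 message."""
    message = BytesParser(policy=policy.default).parsebytes(raw_message)
    date_header = message["Date"]
    return {
        "from": _text(message, "From"),
        "to": _text(message, "To"),
        # With policy.default the Date header exposes a parsed .datetime.
        "date": date_header.datetime if date_header is not None else None,
        "message_id": _text(message, "Message-ID"),
        # Each relaying mail server normally prepends a Received header.
        "received_hops": len(message.get_all("Received", [])),
    }
```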

Relevance of format to appraising digital records

The format of a document may help the digital curator to assess the purpose of a record and thus its value. For example, a document saved as PDF suggests that it is the final version of a report, which has been passed as fit for public consumption. By choosing to save as PDF, the author is consciously discouraging further editing of the document and in this respect helping to maintain its authenticity. Similarly, a snapshot of a politician's personal website captures a final and authoritative version of the site as it stood at the moment of capture. Arguably, researchers may be more interested in the less sanitised picture: the content rejected for the public arena or the drafts that led to the final versions. Formats used in the drafting process, such as word-processing formats, may therefore contain high-value archival material.
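Because file extensions can be changed or lost, format is more reliably established from a file's opening bytes. This sketch checks a handful of well-known signatures (PDF, the OLE2 container used by legacy Word `.doc` files, the ZIP container used by `.docx`, and RTF); it is a simplification of what dedicated identification tools such as DROID, which draws on the PRONOM registry, do far more thoroughly. The function name is hypothetical, and a ZIP signature alone does not prove that a file is a Word document.

```python
# Opening-byte signatures for a few common document formats.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),  # legacy .doc, .xls, .ppt
    (b"PK\x03\x04", "zip"),  # OOXML (.docx etc.), but also any ZIP archive
    (b"{\\rtf", "rtf"),
]


def identify_format(path):
    """Guess a file's format from its first bytes, ignoring its extension."""
    with open(path, "rb") as handle:
        head = handle.read(8)
    for signature, label in SIGNATURES:
        if head.startswith(signature):
            return label
    return "unknown"
```

Separating "pdf" from "ole2" or "zip" files in this way could help direct attention towards the editable drafting formats that may hold the less sanitised material.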

Paradigm's Academic Advisory Board identified email as one of the most interesting types of historical record being created in our times. It contains records of business transactions (which might previously have been conducted through correspondence on headed notepaper) as well as informal exchanges (previously the stuff of telephone conversations). The variety of functions served by email suggests that making appraisal decisions on the basis of format alone is insufficient.