Arranging and cataloguing emails

At what level should emails be catalogued in EAD?

As the above examples indicate, there are many possibilities for the intellectual arrangement of emails within a personal archive. Similarly, there are many options for cataloguing and access. When making these decisions, archivists should bear in mind the wide range of uses to which researchers might put such material in future. To date, very few email archives are publicly available, so there is no tradition of past practice on which to base decisions.

One notable exception is the Enron dataset, which comprises some 0.5 million email messages generated by around 150 users and was obtained by investigators during the Enron accounting fraud scandal in 2001. The Federal Energy Regulatory Commission was charged with investigating the company, which involved reviewing emails along with other data; the email dataset was subsequently made available on the Web. Unfortunately, its archival integrity has been lost as the result of a ‘cleaning’ process (which included the removal of attachments, the deletion of some messages and the conversion of email addresses). However, it has already been used as a research resource, and the work undertaken offers an insight into how researchers might use archival email directories for purposes beyond the more obvious biographical or historical research. For instance, a social-network analysis of the data has been carried out, and the dataset has served as source material for a number of email visualisation experiments and natural language processing investigations, and as the subject of research into methods for the automatic categorisation of email into folders (something which might be useful for the management of email in future).

As this research indicates, there is a good argument for giving researchers unmediated access to an archival email directory, which they would encounter in the same way as the creator would have entered and viewed it. Experiencing email in the same way as it was used is, as Maureen Pennock points out, an important historical and social experience, although changes in tacit knowledge may make this difficult for us to provide and for users to navigate in 100 years' time. Enabling researchers to display or reorder the data in multiple ways (using the METS structural map) could also facilitate different types of research.
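The idea of offering several orderings over one unchanged set of messages can be illustrated with a minimal Python sketch. It does not use METS: it simply sorts parsed messages by different header fields, as a stand-in for the alternative views a structural map might describe. The sample messages, addresses and function names are hypothetical and are not drawn from any real archive.

```python
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime

# Hypothetical sample messages; a real directory would be read from disk.
RAW_MESSAGES = [
    "From: Alice <alice@example.org>\nTo: bob@example.org\n"
    "Date: Tue, 02 Jan 2001 10:00:00 +0000\nSubject: Budget\n\nDraft attached.\n",
    "From: Carol <carol@example.org>\nTo: alice@example.org\n"
    "Date: Mon, 01 Jan 2001 09:00:00 +0000\nSubject: Agenda\n\nSee below.\n",
    "From: Bob <bob@example.org>\nTo: alice@example.org\n"
    "Date: Wed, 03 Jan 2001 08:30:00 +0000\nSubject: Re: Budget\n\nLooks fine.\n",
]


def load(raw_messages):
    """Parse raw RFC 5322 text into message objects without altering it."""
    return [message_from_string(raw) for raw in raw_messages]


def by_date(messages):
    """Chronological view, based on the Date header."""
    return sorted(messages, key=lambda m: parsedate_to_datetime(m["Date"]))


def by_sender(messages):
    """Correspondent view, based on the address in the From header."""
    return sorted(messages, key=lambda m: parseaddr(m["From"])[1].lower())


def subjects(messages):
    """Return the Subject headers, to make each ordering easy to inspect."""
    return [m["Subject"] for m in messages]


messages = load(RAW_MESSAGES)
chronological = subjects(by_date(messages))
by_correspondent = subjects(by_sender(messages))
```

Each function returns a new list and leaves the parsed messages untouched, so the original order of the directory is preserved alongside any number of derived views.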

Nevertheless, Paradigm believes that some degree of mediation between researcher and archival email directory is necessary for a full understanding of the material. The EAD catalogue and the metadata included in the DIP will also contain additional information which would not necessarily be obvious simply from viewing the emails directly. Paradigm recommends creating a high-level description (at series or sub-series level, depending on the overall arrangement of the archive) for the entire directory.
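As a rough illustration of what a series-level description of a whole email directory might look like, the following Python sketch builds a single EAD 2002 component using the standard library. The choice of elements is one plausible minimal set rather than a Paradigm template, and the reference code, title, dates, extent and scope note are all invented placeholder values.

```python
import xml.etree.ElementTree as ET


def series_component(unitid, title, dates, extent, scope_note):
    """Build an EAD 2002 <c01 level="series"> element for an email directory."""
    component = ET.Element("c01", {"level": "series"})

    # <did> carries the core identifying information for the series.
    did = ET.SubElement(component, "did")
    ET.SubElement(did, "unitid").text = unitid
    ET.SubElement(did, "unittitle").text = title
    ET.SubElement(did, "unitdate").text = dates
    physdesc = ET.SubElement(did, "physdesc")
    ET.SubElement(physdesc, "extent").text = extent

    # <scopecontent> holds the mediating context a researcher would not
    # necessarily see from the messages themselves.
    scope = ET.SubElement(component, "scopecontent")
    ET.SubElement(scope, "p").text = scope_note
    return component


# Hypothetical values, for illustration only.
component = series_component(
    unitid="EXAMPLE/1/3",
    title="Email directory",
    dates="2001-2005",
    extent="1 email directory",
    scope_note="Email correspondence, retained in the creator's folder structure.",
)
ead_xml = ET.tostring(component, encoding="unicode")
```

Setting the `level` attribute to `subseries` instead would suit an archive in which the email directory sits beneath a broader correspondence series.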