Guidelines for creators of personal archives

Practical tips

The maintenance of your personal digital archive is an ongoing task. Fortunately, by adopting good practice at the outset, it does not take too much effort to increase the longevity of your digital material! Individuals have a variety of record keeping behaviours, ranging from those who purge to those who hoard, and from those who organise to those who live in a state of semi-chaos. Most of us fall somewhere between these extremes and you will know best what fits your style.

Below is a series of practical tips to help you maintain your personal digital archive. It is important to choose solutions which suit you, so adopt - and adapt - these according to your needs. The tips range from making conscious decisions about naming and format when you create a file, to the deletion of low value material in order to free up storage space and make it easier to find what you are looking for. There is also advice about backing up, and administering and caring for your computers; and for safeguarding both your own privacy and that of others. Simple measures like these can have a dramatic impact on the survival and utility of your digital archive.

These tips are not intended to be prescriptive. Although some of them may seem time-consuming, in reality most involve small changes to the way that you work that should reap benefits in the years to come by ensuring both that your important digital materials are still accessible for as long as they are useful to you and that you can find them when you need them!

1. Organise and name files appropriately

Most of us accumulate material in a haphazard fashion and do our best to impose some basic order that will help us rediscover items for future needs. Remember that search and discovery tools can only do so much; they cannot tell you why you created a document, or explain unfamiliar abbreviations and acronyms. Making your documents transparent will help you to understand them in the longer-term. Here are a few tips you could try to make it easier to find important materials quickly.

Naming files and folders:

Make your data self-documenting

Delete what's not important.

2. Manage your emails

Email is an integral part of our personal and professional lives and many of us maintain multiple email accounts. An email client is essentially a program that accesses email stored on a server and is used to send and receive email messages. Choosing which email client to opt for is a matter of personal preference; well known examples include Eudora, Mozilla Thunderbird and Outlook Express. There are two main protocols used by email clients accessing mail over the Internet: POP3 (Post Office Protocol) or IMAP (Internet Message Access Protocol).

A POP3 client usually downloads email from the server to your PC and then deletes the email from the server. This means that email messages are more vulnerable to loss, so all the advice on creating backup copies given in Tip 4 should be followed.

An IMAP client accesses emails on the email server and normally leaves them there rather than deleting them; this means that you usually stay online whilst reading, composing and sending mail. IMAP also allows you to work offline: you can download copies of the messages while the real messages are left on the server. Most modern email clients and servers support both of these standard protocols.

There is a single standard (RFC 2822) for the format of email messages; some proprietary email clients convert the standard file into a proprietary database format for storage, which can make it difficult to access the messages if the client becomes unavailable. Email clients that save in the open standard format MBOX include Mozilla Thunderbird, Mulberry and Sylpheed. For a comparison of email clients, see Wikipedia.
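One practical benefit of an openly documented format such as MBOX is that messages remain readable without the original client. The following is a minimal sketch, assuming Python and its standard `mailbox` module are available; the function name `list_mbox_messages` is hypothetical and chosen only for illustration.

```python
import mailbox
from email.header import decode_header, make_header


def list_mbox_messages(mbox_path):
    """Return a (date, sender, subject) tuple for each message in an MBOX file."""
    summaries = []
    # create=False: fail rather than silently create an empty mailbox
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            # Subjects may be MIME-encoded; decode them to readable text
            subject = str(make_header(decode_header(message.get("Subject", ""))))
            summaries.append(
                (message.get("Date", ""), message.get("From", ""), subject)
            )
    finally:
        box.close()
    return summaries
```

Calling `list_mbox_messages` on a mailbox file exported from your client returns one tuple per message, which is enough to check that an export is complete before you rely on it.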

In recent years there has been a move towards web-based email services; commercial examples include Gmail and Yahoo! Mail. Webmail can be accessed using only a web browser; no specific email client is required. On the whole, the functionality and flexibility of email clients leave current webmail services a way behind. Not all webmail services provide POP3/IMAP access for downloading email (and some that do charge), meaning that you require a permanent Internet connection to read your email. Choose a service which provides free POP3/IMAP so that you can read email offline and extract it from the service when you need to.

Regardless of the type of email you use, you should manage your email efficiently; often email clients provide inbuilt facilities to help with mailbox organisation. The following list provides some tips on managing your email which should enable you to find the information you need more easily and facilitate future appraisal by a digital archivist:

3. Select suitable formats and software

A key factor in increasing digital longevity is selecting the simplest format available for the purpose. The more complex a file is, the more dependencies it has and the greater the risk of file corruption. It is also important to choose formats which are supported by multiple applications. Avoid using confidential proprietary formats where you can; instead, try to use open standard formats. If the file format specification is openly published, the probability of your documents surviving to be read in future years increases significantly. The ubiquity of some software packages (like Microsoft's Office 2003 suite) makes them all too easy to use, but their specifications are opaque, making it impossible to gain a thorough understanding of how they work. Whilst some developers of proprietary formats publish their specifications (such as Adobe PDF), confidential proprietary formats which can only be read in conjunction with specific software are less dependable than publicly available formats which can be read by multiple applications.

Where your work is of a type that currently has no associated standard format, consider using open source software (OSS), where the program (source) code is available. The source code then effectively serves as a format specification. OSS is also often linked to licence agreements which make it easy to take preservation measures (such as saving copies of the software or migrating to new formats) without violating the intellectual property claims of the manufacturers.

Other advantages associated with OSS include:

More information is available from OSS Watch, a body which offers guidance on OSS to UK higher and further education institutions.

Some suggested formats that will facilitate long term preservation and access can be found below.

a) Textual documents

Office suite:
For word-processing, try using the OASIS Open Document Format (ODF), which is a published ISO standard (ISO/IEC 26300) and therefore more 'preservable' once archived. You can use many applications, including OpenOffice.org, to create and save documents in ODF, and a plug-in is available which allows Microsoft Word to open and save documents in ODF.

Databases:
MySQL, PostgreSQL and Firebird are examples of open source databases, which run on a range of platforms, including GNU/Linux, Mac OS X and Windows, and compare favourably to their proprietary equivalents, particularly in terms of speed and stability. Many desktop applications, including OpenOffice.org, can be used to access these database engines.

PDF/A:
PDF (Portable Document Format) is a file format widely used for presentation copies of office documents which are not meant to be edited by those viewing the file. It aims to provide a mechanism for representing electronic documents in a manner that maintains their visual appearance, independent of the tools and systems originally used for creating, storing and rendering the files.

The 'A' in PDF/A stands for 'Archive' and signifies that the format has been confined to basic PDF features to simplify its long term preservation. PDF/A is not a magic bullet for preserving digital records, although its adoption will assist the preservation of PDF files by preventing encryption, digital rights mechanisms and other features which impede preservation. PDF/A was ISO-approved in 2005 (as ISO 19005-1).

b) Raster images

Raster images are made up of thousands or millions of single dots of colour (pixels) arranged in a grid. They are the kinds of images most often created by digital cameras or scanners.

Some cameras use a proprietary 'raw' format; such formats can be manipulated by software provided by the camera company, but may fall foul of software obsolescence over time. You should therefore save high quality master images (at least 300 dpi) in TIFF format, which is a well-supported open standard. If sending pictures via email, or adding them to a website, create lesser quality, but more easily transportable 'throwaway' versions in JPEG format.

c) Email

Think carefully about the service your web-based email account provides; make sure that it is easy to download your email (in bulk and preferably retaining any filing structure you put in place) if you need to. If you intend to place your correspondence in a research institution, you will need to be able to extract it from the email client you download to. Use a client which can save your email in an openly documented format such as MBOX.

d) Websites and weblogs

Comply with W3C recommendations and make sure your (X)HTML, CSS etc. is valid. Select open standard formats for images, audio and video, etc.

e) Operating systems

Your operating system is the central software program within your computer system; it manages all the other programs (known as applications). Apple OS, GNU/Linux and Microsoft Windows are mature, stable and usable platforms.

4. Back up your files

Hard disks fail. It is not a case of 'if' but 'when'. Your software and files could also be lost as a result of flood, fire or power surge; theft is another risk, especially if you use a laptop. This means that regular backing up of records is an essential task. Consider:

a) Making copies on portable media

An external solid state drive is a good option; these are easy to use, inexpensive and provide sufficient capacity for many people. You could use CD-Rs or DVD-Rs, but this will be more time consuming and may result in splitting data over several disks. You should update your removable media as needed; old media can degrade or become inaccessible as technology evolves.

b) Storing a copy off-site

For key files you could make additional copies and store them elsewhere, perhaps with a friend or relative or in a deposit box. If you have a broadband connection, there are many online services which allow you to upload your files, a form of off-site storage. Always read the terms and conditions carefully and check the following:

c) Using data synchronisation services or software

If you use several different computers, you probably encounter the problem of not having the right file in the right place from time to time. Data synchronisation services can solve this problem by replicating changes to data on one of your computers to all the others you use. This means that you always have access to your data, and you always have one or more backup copies of your data. The difference between this type of service and online backup services is that your data is not permanently held on third party servers, but is encrypted and sent to your other devices for storage. Again, you will need to check the terms and conditions of services carefully.

The first time you synchronise your data, it is likely to take a great deal of time. After this first sync, only the differences in the data are synchronised, which makes synchronising much quicker.
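The idea of transferring only the differences after the first sync can be sketched as follows. This is an illustrative outline, not how any particular service works: it assumes files are compared by SHA-256 checksum, and the function names `snapshot` and `differences` are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 checksum of a file, read in chunks to save memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(folder):
    """Map each file's path (relative to folder) to its checksum."""
    root = Path(folder)
    return {
        path.relative_to(root).as_posix(): file_digest(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def differences(previous, current):
    """Compare two snapshots and return (added, changed, removed) file names."""
    added = sorted(set(current) - set(previous))
    removed = sorted(set(previous) - set(current))
    changed = sorted(
        name for name in current
        if name in previous and current[name] != previous[name]
    )
    return added, changed, removed
```

Taking a snapshot after each sync and comparing it with the next one shows which files need to be sent again; everything else can be skipped, which is why later syncs are much quicker than the first.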

Read the small print and check the following in relation to services:

Deciding what to back up

Files which would be difficult or impossible to recreate are the most important to back up. For example:

Make a list of what you intend to back up and decide on a backup routine (how often you will back up which files and to what). If you keep a record of your backups and label the media to which you back up, then restoring your system from your backed up data will be much easier. If you delete material that you don't want, and organise files logically, this will also simplify backup.
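Keeping a record of each backup can be automated. The sketch below is one possible approach, assuming the backup folder lies outside the folder being copied; the function name `back_up` and the log file name `backup-log.csv` are hypothetical choices made for illustration.

```python
import csv
import shutil
from datetime import datetime
from pathlib import Path


def back_up(source_folder, backup_folder, log_name="backup-log.csv"):
    """Copy every file under source_folder to backup_folder and log what was copied.

    Assumes backup_folder is not inside source_folder.
    """
    source = Path(source_folder)
    target = Path(backup_folder)
    target.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")

    copied = []
    for path in sorted(source.rglob("*")):
        if path.is_file():
            relative = path.relative_to(source)
            destination = target / relative
            destination.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, destination)  # copy2 also keeps modification times
            copied.append(relative.as_posix())

    # Append to the log so that earlier backup runs stay on record
    log_path = target / log_name
    is_new_log = not log_path.exists()
    with open(log_path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        if is_new_log:
            writer.writerow(["backed_up_at", "file"])
        for name in copied:
            writer.writerow([stamp, name])
    return copied
```

The log can be opened in any spreadsheet program, so when you come to restore your files you can see which ones were backed up and when.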

For added security, you may wish to consider encrypting your backup data. See Tip 7 for more information on encryption.

Backing up your website

Ideally copies of personal websites should be made prior to major changes and updates. If you are the webmaster of your own site and have access to the files which make up the website, then simply create archived versions of your website by compressing these files into a tar or zip file, using a naming convention such as website20060819.zip; store these in a folder with other archives of your website.
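The dated naming convention described above can be applied automatically. This is a minimal sketch, assuming the archive folder lies outside the website folder; the function names `archive_name` and `archive_site` are hypothetical.

```python
import zipfile
from datetime import date
from pathlib import Path


def archive_name(prefix="website", on=None):
    """Build a name such as website20060819.zip from a prefix and a date."""
    on = on or date.today()
    return f"{prefix}{on:%Y%m%d}.zip"


def archive_site(site_folder, archive_folder, on=None):
    """Compress all files under site_folder into a dated zip in archive_folder."""
    site = Path(site_folder)
    out_dir = Path(archive_folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / archive_name(on=on)

    # List the files first so the new zip file can never include itself
    files = [path for path in sorted(site.rglob("*")) if path.is_file()]
    with zipfile.ZipFile(out_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in files:
            archive.write(path, path.relative_to(site).as_posix())
    return out_path
```

Putting the date in year-month-day order means that the archives sort chronologically when listed by name.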

If you create your website or weblog using a service, perhaps the easiest method of archiving it would be to capture a copy of the website using Adobe Acrobat Professional, or HTTrack. Alternatively, you could suggest that the UK Web Archiving Consortium archive your website by completing their submission form at http://info.webarchive.org.uk/cgi-bin/submission.cgi, or submit your website to the Internet Archive at http://www.archive.org/web/web.php (you must register to do this).

If your website is database driven, preserving it is more complex, as a copy of the database needs to be captured with the website backup. Speak to an archivist, who will be able to advise you.

5. Look after your hardware and media

Accept that your PC and storage media will fail. Organisations replace PCs, servers and storage media on a cyclical basis; you should too. Replace them before they break and avoid losing your data; five years is currently a reasonable life expectancy for hardware and media.

Minimise the failure of computer components and storage media by keeping them clean and preventing them from overheating. Vacuum dust that collects on your computer equipment and keep air vents clear. If your computer's fan is struggling to cope with the heat, then shut down the computer and postpone unnecessary work. Make sure that hardware and media are stored in stable environmental conditions, and not precariously perched on unsuitable desks or shelving. To avoid data corruption, USB keys and other mass storage devices such as MP3 players and digital cameras should be correctly removed from hardware according to the needs of the operating system. CDs and DVDs should be treated with care to prevent damage.

Invest in a small Uninterruptible Power Supply (UPS). A UPS protects from power surges. The better ones also smooth voltage peaks and troughs, and in the event of a power outage will continue to supply power to your PC allowing it to be shut down gracefully, and avoiding data corruption.

6. Administer your system

Before undertaking major updates of hardware and software take some time to think and plan your actions. It is common for older files to get lost as a result of updates. Files should be backed up elsewhere prior to hardware and software changes.

Be security aware. Unless you are certain your operating system does not need them, install anti-virus software and a firewall and keep them regularly updated. It is important that you do not open suspicious emails or attachments. You should also be aware of the various kinds of 'badware' (like spyware, malware and deceptive adware) which can affect your system. If you are plagued by pop-up ads when online, it is likely that you have badware on your computer; badware can ultimately cause your system to crash and can also lead to the abuse of your personal information. You may be unaware that you have downloaded badware, which can be loaded onto your system when you visit certain websites; sometimes it is even included in proprietary software packages, with no acknowledgment of this on the part of the manufacturer.

Useful information is supplied by the Stop Badware Coalition, a 'neighbourhood watch' campaign aimed at fighting badware.

7. Consider using passwords and encryption devices

If you keep valuable and personal data on portable media, PDAs, or laptops it might be appropriate to consider encryption as a means of keeping data safe; imagine what kind of valuable, private or even embarrassing information your laptop might contain if it were stolen.

There are many different kinds of software that can encrypt files, folders, emails, attachments, portable storage devices and more. It is best to select open-source encryption software to ensure continued availability. One example of open source (free to use) disk encryption software for Windows XP/2000/2003 and GNU/Linux is TrueCrypt.

If you habitually use encryption, it is important that you remember your passwords; a lost password can render your data inaccessible. One solution is offered by password managers. Software like KeePass enables you to manage your passwords in a secure way. You can put all your passwords in one database which is locked with a single master key or key-disk, so you only have to remember one master password or insert the key-disk to unlock the whole database.

It will become increasingly important for those with valuable password protected or encrypted digital assets to make provision for their access in the event of the creator's death. Relevant details should be stored offsite in a secure location, perhaps lodged with a solicitor or kept in a deposit box known only to likely executors.

8. Be aware of intellectual property rights and privacy

Copyright and licences

In addition to the documents you have created yourself, your own digital archive will inevitably include material (reports, articles, images, music, etc.) created by others. This material (whether officially 'published' or not) will usually be subject to copyright legislation, so you should be careful about how you use it; copying, editing or forwarding copyright material could be a breach of copyright law. Where you do hold digital material which was created by other people, it is useful to be aware of who authored particular documents and when they were created. This information will help the future curators of your archive in determining the copyright status of the material. When you donate or deposit your archive with an institution, the digital archivist is also likely to ask you for permission to make multiple copies of the digital material in which you hold the copyright, for preservation purposes.

Copyright applies to your unpublished written works for 70 years after your death. If you want to encourage greater usage of your digital archive, you could consider applying less restrictive licences to your material, using licences such as those developed by Creative Commons.

Digital Rights Management

It may be that documents which have come into your possession are covered by proprietary Digital Rights Management (DRM). Windows DRM, for example, can be applied to word processors, email clients and other applications; it enables you to choose from a variety of usage rights to define who can open, modify, print, forward, or take other actions with the information. These usage rights are locked within the document itself, controlling how information is used even after it has been opened by intended recipients. Similar digital rights management functions can be attached to Adobe Acrobat products - allowing publishers to control the opening, display and use of files. Other packages also come with such rights management facilities.

One problem with DRM systems is that they are not time limited in the way that copyright law is, meaning that even when the copyright in a particular digital work has expired, there is currently no easy mechanism to remove the copy control systems embedded in the work. This is a major problem for archivists of digital material because undertaking proper digital preservation measures requires them to copy or migrate it into different formats. It is also a barrier to future access by researchers; once material is in an archive, even if still in copyright, it can ordinarily be copied for researchers under the 'fair dealing' provisions of copyright law, but digital rights management systems may prevent this. Avoid using DRM unless it is absolutely necessary.

Privacy

Your digital archive is likely to contain personal data about hundreds of other individuals. Many countries have legislation like the UK's Data Protection Act (DPA) which restricts what can be done with personal data about living people. Although the DPA does not apply to personal archives while they remain in the creator's possession, it applies once custody is transferred to a public repository. You might give some thought to the kind of personal data in your archive, and try to ensure that you act fairly and responsibly in your treatment of other people's data. It is helpful for digital archivists to know which sections of your archive are most likely to contain sensitive information, so they can determine how this material is managed in the repository, and how and when it might be made accessible to researchers in the future.

9. Keep up to date

Technological changes are rapid and new technologies are constantly appearing. Interoperability with others and the threat of hardware and software obsolescence mean that you must constantly evolve your digital environment, but do think critically about the impact of these new developments before signing up. What effect will they have on your ability to use your personal archive now and in the future? Will you need to change some of your working practices to safeguard your digital materials from new threats? Keep in touch with your digital archivist who will be able to help you assess new technologies, and offer advice about file format, media, software and hardware migration.

10. Handling legacy digital files

You might find yourself dealing with legacy digital files at one time or another - perhaps those of a family member, a predecessor at work, or your own materials, long-abandoned after a software upgrade. These can be in older, unfamiliar, formats, or it may simply be difficult to evaluate their content.

Dealing with older hardware/media

It is best to avoid this problem by maintaining your archive! Older hardware and media can be fragile; you may have a limited number of attempts in which you can read and copy data to new media because the process of reading the disk could cause further deterioration. It can be difficult to obtain the necessary drives (in working order) or cables to transfer data. If you do have digital files stuck on older hardware and media, talk to your digital archivist who may be able to help.

Dealing with unfamiliar formats

You need to be able to read and retrieve information from any legacy records you inherit. Your first task should be to identify what formats you are dealing with. There are online registries you can search to identify formats and the applications that can read them. Some formats may be so outdated that alternative operating systems and hardware are also required. Talk to your digital archivist who can provide information on what is needed to access the files and how they may be migrated to more appropriate formats.
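Many formats can be recognised from the first few bytes of a file (its 'signature'), even when the file extension is missing or misleading. The sketch below checks a small, hand-picked set of well-known signatures; it is only an illustration and no substitute for a format registry, and the names `SIGNATURES`, `identify_format` and `identify_file` are hypothetical.

```python
# A few well-known file signatures; real registries list many more.
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"II*\x00", "TIFF image (little-endian)"),
    (b"MM\x00*", "TIFF image (big-endian)"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"PK\x03\x04", "ZIP container (also used by ODF documents)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (older Microsoft Office)"),
]


def identify_format(header):
    """Return a label for the format whose signature starts the given bytes."""
    for signature, label in SIGNATURES:
        if header.startswith(signature):
            return label
    return "unknown"


def identify_file(path):
    """Read the first bytes of a file and try to identify its format."""
    with open(path, "rb") as handle:
        return identify_format(handle.read(16))
```

A result of "unknown" only means that the file is not among the handful of signatures listed here; plain text files, for instance, have no signature at all.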

Appraising legacy files

When revisiting files you worked with a long time ago, or browsing those of another, it can be tricky to tell what is worth keeping at first glance. You should conduct an initial survey of the material to form an overall impression. This doesn't necessarily entail opening and reading every record; an initial assessment might involve opening a few files in each folder to assess whether the folder title accurately reflects its contents, and an assessment of the likely significance of the material. File names, dates, author and correspondent names can be useful clues. This survey should help you to identify low value material which can definitely be deleted, material which may have short- or medium-term use, and material with long-term personal or research value.
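An initial survey can be supported by a simple count of what each folder contains. The following is a rough sketch, assuming that file extensions are a reasonable first clue to content; the function names `survey_folders` and `print_survey` are hypothetical.

```python
from collections import Counter
from pathlib import Path


def survey_folders(top_folder):
    """Map each folder (relative to top_folder) to a count of its file extensions."""
    root = Path(top_folder)
    report = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            folder = path.parent.relative_to(root).as_posix()
            extension = path.suffix.lower() or "(none)"
            report.setdefault(folder, Counter())[extension] += 1
    return report


def print_survey(report):
    """Print one line per folder: its total file count and extension breakdown."""
    for folder, extensions in sorted(report.items()):
        total = sum(extensions.values())
        breakdown = ", ".join(
            f"{extension}: {count}" for extension, count in extensions.most_common()
        )
        print(f"{folder} ({total} files) {breakdown}")
```

Files directly inside the top folder are reported under "." and files without an extension under "(none)". The counts show where the bulk of the material lies, so you can decide which folders to sample first.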

11. Ask digital archivists for advice

Don't just rely on generic guidelines; seek advice from digital archivists and think about where you might deposit your archive in the future!