Workbook on Digital Private Papers > Appendics > Guidelines for creators of personal archives
Guidelines for creators of personal archives
Practical tips
The maintenance of your personal digital archive is an ongoing task. Fortunately, by adopting good practice at the outset, it does not take too much effort to increase the longevity of your digital material! Individuals have a variety of record keeping behaviours, ranging from those who purge to those who hoard, and from those who organise to those who live in a state of semi-chaos. Most of us fall somewhere between these extremes and you will know best what fits your style. Below is a series of practical tips to help you maintain your personal digital archive. It is important to choose solutions which suit you, so adopt - and adapt - these according to your needs. The tips range from making conscious decisions about naming and format when you create a file, to the deletion of low value material in order to free up storage space and make it easier to find what you are looking for. There is also advice about backing up, and administering and caring for your computers; and for safeguarding both your own privacy and that of others. Simple measures like these can have a dramatic impact on the survival and utility of your digital archive. These tips are not intended to be prescriptive. Although some of them may seem time-consuming, in reality most involve small changes to the way that you work that should reap benefits in the years to come by ensuring both that your important digital materials are still accessible for as long as they are useful to you and that you can find them when you need them!
1. Organise and name files appropriately
Most of us accumulate material in a haphazard fashion and do our best to impose some basic order that will help us rediscover items for future needs. Remember that search and discovery tools can only do so much; they cannot tell you why you created a document, or explain unfamiliar abbreviations and acronyms. Making your documents transparent will help you to understand them in the longer-term. Here are a few tips you could try to make it easier to find important materials quickly.
Naming files and folders:
- Be concise: avoid long and complex file paths.
- Select meaningful names: this facilitates searching and browsing.
- Develop standard naming conventions for the file names of record types you create or save on a regular basis.
- Avoid capitals or spaces: this can cause problems when moving files between different computing environments.
- Use the format yyyymmdd (e.g. 10 June 2005 = 20050610) for recording dates: that way your files will be presented chronologically in file management tools.
- Adopt a version control system for drafts (e.g. yyyymmdddocumentname-2.pdf, where -2 denotes that this is 'version 2' of the document). This prevents the embarrassment of sending the wrong version of a document to others. Drafts are also valuable for researchers tracing the creative thought process.
Make your data self-documenting
- File related information together in well-named folders, which give an indication of the subject, project or activity on which they are based.
- Add information to the body of digital documents which explains them to a general audience. This can help you when you re-discover a document too. Simple information, such as a log of authors, a document history and a note about the purpose of a document can be very helpful. You could add a simple table like this to any textual document:
Document Title Caring for your personal archive Author(s) A.N. Other and A. Colleague Purpose Provide guidance to individuals wishing to preserve their digital materials for their own use and for placement at an archival repository. Date 01/02/2007- Filename 20070201sampledoc-2.odt Access Internal General dissemination 
Document History Version Date Comments 1 01/02/2007 Document created by A. N. Other 2 31/03/2007 Revisions to language by A. Colleague - Most desktop applications include a 'properties' option containing file metadata (e.g. title, location, size, dates last accessed and modified). Some of this is generated automatically, but you can record extra metadata yourself, e.g. keywords or a free-text description.
- For images: if your camera or related software allows you to add metadata to your images (i.e. information about when and where a photograph was taken, and what or who it depicts) at point of capture or export, then use this function. This will help you to find photos of particular times, people, places and events. Save photos in purposely created folders rather than randomly in the default 'Pictures' folder and consider saving a simple text file describing the background context of the images (date, place, people, etc.) alongside them.
- When saving documents from the Web for reference purposes, make a note of their source: web addresses can change, and if you want to cite a document in future it is useful to have a record of where you obtained it from and when you last accessed it at that address.
- File email attachments you wish to keep separately rather than leaving them in your email directory. It is useful to identify their source as email attachments in their file name, or even to save a copy of the email alongside the attachment.
Delete what's not important.
- Some materials you'll want to keep forever as part of your archive. Some you might need for a few years. These are the materials that should be maintained. Don't waste time and space on material that has outlived its utility: it will make backing up a chore and make it difficult for you and future users of your archive to extract meaningful information from the resulting mass of data.
2. Manage your emails
Email is an integral part of our personal and professional lives and many of us maintain multiple email accounts. An email client is essentially a program that accesses email stored on a server and is used to send and receive email messages. Choosing which email client to opt for is a matter of personal preference; well known examples include Eudora, Mozilla Thunderbird and Outlook Express. There are two main protocols used by email clients accessing mail over the Internet: POP3 (Post Office Protocol) or IMAP (Internet Message Access Protocol).
A POP3 client usually downloads email from the server to your PC and then deletes the email from the server. This means that email messages are more vulnerable to loss, so all the advice on creating backup copies given in Tip: 4 should be followed.
An IMAP client accesses emails on the email server and normally leaves them there rather than deleting them; this means that you usually stay online whilst reading, composing and sending mail. IMAP also allows you to work offline: you can download copies of the messages while the real messages are left on the server. Most modern email clients and servers support both of these standard protocols.
There is a single standard (RFC 2822) for the transmission of email messages; some proprietary email clients convert the standard file into a proprietary database format for storage, which can make it difficult to access the messages if the client becomes unavailable. Email clients that save in the open standard format MBOX include Mozilla Thunderbird, Mulberry and Sylpheed. For a comparison of email clients, see Wikipedia.
In recent years there has been a move towards web-based email services; commercial examples include Gmail and Yahoo! Mail. Webmail can be accessed using only a web browser; no specific email client is required. On the whole, the functionality and flexibility of email clients leave current webmail services a way behind. Not all webmail services provide POP3/IMAP access for downloading email (and some that do charge), meaning that you require a permanent Internet connection to read your email. Choose a service which provides free POP3/IMAP so that you can read email offline and extract it from the service when you need to.
Regardless of the type of email you use, you should manage your email efficiently; often email clients provide inbuilt facilities to help with mailbox organisation. The following list provides some tips on managing your email which should enable you to find the information you need more easily and facilitate future appraisal by a digital archivist:
- Delete email that has no long-term value as soon as possible.
- Don't ignore your sent mail folder: delete irrelevant mail and move sent messages to relevant folders on a regular basis.
- Retain in your inbox only those messages which you haven't yet responded to, or which relate to ongoing business. File or delete the rest as soon as possible.
- Organise your email into subject folders with concise and relevant titles.
- Give email messages meaningful and concise subject-lines that are relevant to the message content. If the subject of an email exchange changes, change the subject-line.
- Respond to emails in a solid block of text rather than inserting text at various points in the original message. The message might be clear to you, but subsequent changes in the technical format of the email (e.g. as a result of preservation activities) can obliterate the clear distinction between original message and annotations.
- Separate personal and professional email if you can. If you use a work-based email account for personal email correspondence, keep non work-related mail in a 'Personal' folder. Protect your privacy by making it clear what is private.
- If you receive an encrypted email, decrypt it before saving it; this will facilitate access to it in the future when the email directory is archived. Encrypted e-mails might become inaccessible over time as the method of encryption becomes obsolete.
- Include enough information in your email directory to identify the individuals represented. Your email address book or contacts list is a useful tool for recording information about frequent correspondents.
- Use a spam filter and do not open any email or attachment if you suspect it is spam; attachments in particular are often a source of computer viruses.
- Save covering e-mails relating to attachments you wish to keep; this will provide contextual information about the relationship between message and attachment.
3. Select suitable formats and software
A key factor in increasing digital longevity is selecting the simplest format available for the purpose. The more complexity in a file, the more dependencies it has and the risk of file corruption increases. It is also important to choose formats which are supported by multiple applications. Avoid using confidential proprietary formats where you can; instead, try and use open standard formats. If the file format specification is openly published, the probability of your documents surviving to be read in future years increases significantly. The ubiquity of some software packages (like Microsoft's Office 2003 suite) makes them all too easy to use, but their specifications are opaque, making it impossible to gain a thorough understanding of how they work. Whilst some developers of proprietary formats publish their specifications (such as Adobe PDF), confidential proprietary formats which can only be read in conjunction with specific software are less dependable than publicly available formats which can be read by multiple applications.
Where your work is of a type that currently has no associated standard format, consider using open source software (OSS), where the program (source) code is available. This provides an effective format specification. OSS is also often linked to licence agreements which make it easy to take preservation measures (such as saving copies of the software or migrating to new formats) without violating the intellectual property claims of the manufacturers.
Other advantages associated with OSS include:
- Lower acquisition costs.
- Reduced hardware costs: OS operating systems can run effectively on lower specification machines than proprietary equivalents.
- Reduced risk of hardware obsolescence: the transparency of OS operating systems like GNU/Linux means that they can be used in limitless alternative hardware environments.
- Long-term comprehension and re-use of data is made easier.
- Help for data creators is widely available within the open source community.
- Less risk of malicious attacks than proprietary software.
- Opportunity to provide feedback to improve the software.
- Open source technologies often compare favourably with their proprietary peers in terms of performance and reliability.
More information is available from OSS Watch, a body which offers guidance on OSS to UK higher and further education institutions.
Some suggested formats that will facilitate long term preservation and access can be found below.
a) Textual documents
Office suite:
For word-processing try using the OASIS Open Document Format (ODF) which is a published ISO standard (ISO/IEC26300) and therefore more 'preservable' once archived. You can use many applications, including OpenOffice.org, to create and save documents in ODF and a plug-in for Microsoft Word which allows the program to open and save documents in ODF format is available.
Databases:
MySQL, PostgreSQL and Firebird are examples of open source databases, which run on a range of platforms, including GNU/Linux, Mac OSX and Windows, and compare favourably to their proprietary equivalents, particularly in terms of speed and stability. Many desktop applications, including OpenOffice.org can be used to access these database engines.
PDF/A:
PDF is a file format widely used for presentation copies of office documents which cannot be edited by those viewing the file. PDF stands for Portable Document Format (PDF) which aims to provide a mechanism for representing electronic documents in a manner that maintains their visual appearance, independent of the tools and systems originally used for creating, storing and rendering the files.
The 'A' in PDF/A stands for 'Archive' and signifies that the format has been confined to basic PDF features to simplify its long term preservation. PDF/A is not a magic bullet for preserving digital records, although its adoption will assist the preservation of PDF files by preventing encryption, digital rights mechanisms and other features which impede preservation. PDF/A was ISO-approved in 2005 (as ISO 19005-1).
b) Raster images
Raster images are made up of thousands of millions of single dots of colour (pixels) arranged in a grid. They are the kinds of images most often created by digital cameras or scanners.
Some cameras use a proprietary 'raw' format; such formats can be manipulated by software provided by the camera company, but may fall foul of software obsolescence over time. You should therefore save high quality master images (min. 300dpi) in TIFF format, which is a well-supported open standard. If sending pictures via email, or adding them to a website, create lesser quality, but more easily transportable 'throwaway' versions in JPEG format.
c) Email
Think carefully about the service your web-based email account provides; make sure that it is easy to download your email (in bulk and preferably retaining any filing structure you put in place) if you need to. If you intend to place your correspondence in a research institution, you will need to be able to extract it from the email client you download to. Use a client which can save your email in an openly documented format such as MBOX.
d) Websites and weblogs
Comply with W3C recommendations and make sure your (X)HTML, CSS etc. is valid. Select open standard formats for images, audio and video, etc.
e) Operating systems
Your operating system is the central software program within your computer system; it manages all the other programs (known as applications). Apple OS, GNU/Linux and Microsoft Windows are mature, stable and usable platforms.
4. Backup your files
Hard disks fail. It is not a case of 'if' but 'when'. Your software and files could also be lost as a result of flood, fire or power surge; theft is another risk, especially if you use a laptop. This means that regular backing up of records is an essential task. Consider:
a) Making copies on portable media
An external solid state drive is a good option; these are easy to use, inexpensive and provide sufficient capacity for many people. You could use CD-Rs or DVD-Rs, but this will be more time consuming and may result in splitting data over several disks. You should update your removable media as needed; old media can degrade or become inaccessible as technology evolves.
b) Storing a copy off-site
For key files you could make additional copies and store them elsewhere, perhaps with a friend or relative or in a deposit box. If you have a broadband connection, there are many online services which allow you to upload your files, a form of off-site storage. Always read the terms and conditions carefully and check the following:
- Charges: are there any? What happens if you miss a payment?
- Storage limits: is there a limit to the storage you can use? Is this an absolute limit or a limit to what you can upload on a monthly basis?
- File types: does the service accept all kinds of file type?
- Structure and searching: does the service allow you to retain your file structure? If not, what facilities does it have to ensure that you can find your documents and do they work outside of the service?
- Sustainability: has the service got adequate financial backing? If it fails what would happen to your data?
- Scalability: many of these services are currently small; will they be able to keep up with demand?
- Ease of use: How easily and quickly can you upload files?
- Security: does the service offer secure storage and transfer? How secure?
- How long will your data be stored? If a service claims 'forever', ask what is meant by this claim. How exactly do they plan to achieve this? Never forget that keeping a file forever won't guarantee that it will be usable in ten years time let alone a hundred!
- Reclaiming what's yours: how easy is it to get your data back? The services need users to be viable; it's not in their interests for you to extract your data - it may be more straightforward to transfer your bank account than to remove your data.
- Support: is there support available if you need it? Are there FAQs on the website?
c) Using data synchronisation services or software
If you use several different computers, you probably encounter the problem of not having the right file in the right place from time to time. Data synchronisation services can solve this problem by replicating changes to data on one of your computers to all the others you use. This means that you always have access to your data, and you always have one or more backup copies of your data. The difference between this type of service and online backup services is that your data is not permanently held on third party servers, but is encrypted and sent to your other devices for storage. Again, you will need to check the terms and conditions of services carefully.
The first time you synchronise your data, it is likely to take a great deal of time. After this first sync, only the differences in the data are synchronised, which makes synchronising much quicker.
Read the small print and check the following in relation to services:
- Charging: is it free, or is there a charge? Typically the free services only allow you to process a small number of files a day, so a paid-for service may be a better option.
- Transfer limits: is there a limit to the transfers you can make? Is it limited by quantity of files or file size; is the limit generous enough? Is there a limit to the number of computers you can synchronise to?
- Remote access: will the synchronisation service allow you to access the data on your computer remotely?
- Sustainability: has the service got adequate financial backing? If it fails what would happen to your data?
- Flexibility: does the service or software lock you into an operating system or ISP?
- Scalability: many of these services are currently small; will they be able to keep up with demand?
- Ease of use: does the service allow you to perform batch operations? How quickly can you transfer files to your other computers?
- Security: does the service encrypt your data in transit to keep it secure?
- Network drives: does the service support synchronising of network drives?
- Online: must the computers which are to receive the synchronised data be switched on and online at the time of transfer from the sending computer or can they update when they are switched on and online? Where is the data stored in the intervening period?
- Support: is there support available if you need it? Are there FAQs on the website?
Deciding what to backup
Files which would be difficult or impossible to recreate are the most important to backup. For example:
- Digital photographs.
- Email and email address book.
- Data relating to personal finances.
- Professional or business data.
- Licence keys for software (and software, where allowed).
- Digital music or video that you have purchased.
- Your diary or Personal Digital Assistant (PDA).
- Your website.
- Anything else that is personally important to you or your family.
Make a list of what you intend to backup and decide on a backup routine (how often you will backup which files and to what). If you keep a record of your backup and label the media to which you backup, then restoring your system from your backed up data will be much easier. If you delete material that you don't want, and organise files logically, this will also simplify backup.
For added security, you may wish to consider encrypting your backup data. See Tip 7 for more information on encryption.
Backing up your website
Ideally copies of personal websites should be made prior to major changes and updates. If you are the webmaster of your own site and have access to the files which make up the website, then simply create archived versions of your website by compressing these files into a tar or zip file, using a naming convention such as website20060819.zip; store these in a folder with other archives of your website.
If you create your website or weblog using a service, perhaps the easiest method of archiving it would be to capture a copy of the website using Adobe Acrobat Professional, or HTTrack. Alternatively, you could suggest that the UK Web Archiving Consortium archive your website by completing their submission form at http://info.webarchive.org.uk/cgi-bin/submission.cgi, or submit your website to the Internet Archive at http://www.archive.org/web/web.php (you must register to do this).
If your website is database driven, preserving it is more complex, as a copy of the database needs to be captured with the website backup. Speak to an archivist, who will be able to advise you.
5. Look after your hardware and media
Accept that your PC and storage media will fail. Organisations replace PCs, servers and storage media on a cyclical basis; you should too. Replace them before they break and avoid losing your data; five years is currently a reasonable life expectancy for hardware and media.
Minimise the failure of computer components and storage media by keeping them clean and preventing them from overheating. Vacuum dust that collects on your computer equipment and keep air vents clear. If your computer's fan is struggling to cope with the heat, then shut down the computer and postpone unnecessary work. Make sure that hardware and media are stored in stable environmental conditions, and not precariously perched on unsuitable desks or shelving. To avoid data corruption, USB keys and other mass storage devices such as MP3 players and digital cameras should be correctly removed from hardware according to the needs of the operating system. CDs and DVDs should be treated with care to prevent damage.
Invest in a small Uninterruptible Power Supply (UPS). A UPS protects from power surges. The better ones also smooth voltage peaks and troughs, and in the event of a power outage will continue to supply power to your PC allowing it to be shut down gracefully, and avoiding data corruption.
6. Administer your system
Before undertaking major updates of hardware and software take some time to think and plan your actions. It is common for older files to get lost as a result of updates. Files should be backed up elsewhere prior to hardware and software changes.
Be security aware. Unless you are certain your operating system doesn't need it, then anti-virus software and a firewall should be installed and regularly updated. It is important that you do not open suspicious emails or attachments. You should also be aware of the various kinds of 'badware' (like spyware, malware and deceptive adware) which can affect your system. If you are plagued by pop-up ads when online, it is likely that you have badware on your computer; badware can ultimately cause your system to crash and can also lead to the abuse of your personal information. You may be unaware that you have downloaded badware, which can be loaded onto your system when you visit certain websites; sometimes it is even included in proprietary software packages, with no acknowledgment of this on the part of the manufacturer.
Useful information is supplied by the Stop Badware Coalition, a 'neighbourhood watch' campaign aimed at fighting badware.
7. Consider using passwords and encryption devices
If you keep valuable and personal data on portable media, PDAs, or laptops it might be appropriate to consider encryption as a means of keeping data safe; imagine if your laptop were stolen, what kind of valuable, private or even embarrassing information it might contain.
There are many different kinds of software that can encrypt files, folders, emails, attachments, portable storage devices and more. It is best to select open-source encryption software to ensure continued availability. One example of open source (free to use) disk encryption software for Windows XP/2000/2003 and GNU/Linux is TrueCrypt.
If you habitually use encryption, it is important that you remember your passwords; a lost password can render your data inaccessible. One solution is offered by password managers. Software like KeePass enables you to manage your passwords in a secure way. You can put all your passwords in one database which is locked with a single master key or key-disk, so you only have to remember one master password or insert the key-disk to unlock the whole database.
It will become increasingly important for those with valuable password protected or encrypted digital assets to make provision for their access in the event of the creator's death. Relevant details should be stored offsite in a secure location, perhaps lodged with a solicitor or kept in a deposit box known only to likely executors.
8. Be aware of intellectual property rights and privacy
Copyright and licences
In addition to the documents you have created yourself, your own digital archive will inevitably include material (reports, articles, images, music, etc.) created by others. This material (whether officially 'published' or not) will usually be subject to copyright legislation, so you should be careful about how you use it; copying, editing or forwarding copyright material could be a breach of copyright law. Where you do hold digital material which was created by other people, it is useful to be aware of who authored particular documents and when they were created. This information will help the future curators of your archive in determining the copyright status of the material. When you donate or deposit your archive with an institution, the digital archivist is also likely to ask you for permission to make multiple copies of the digital material in which you hold the copyright, for preservation purposes.
Copyright applies to your unpublished written works for 70 years after your death. If you want to encourage greater usage of your digital archive, you could consider applying less restrictive licences to your material, using licences such as those developed by Creative Commons.
Digital Rights Management
It may be that documents which have come into your possession are covered by proprietary Digital Rights Management (DRM). Windows DRM, for example, can be applied to word processors, email clients and other applications; it enables you to choose from a variety of usage rights to define who can open, modify, print, forward, or take other actions with the information. These usage rights are locked within the document itself, controlling how information is used even after it has been opened by intended recipients. Similar digital rights management functions can be attached to Adobe Acrobat products - allowing publishers to control the opening, display and use of files. Other packages also come with such rights management facilities.
One problem with DRM systems is that they are not time limited in the way that copyright law is, meaning that even when the copyright in a particular digital work has expired, there is currently no easy mechanism to remove the copy control systems embedded in the work. This is a major problem for archivists of digital material because undertaking proper digital preservation measures requires them to copy or migrate it into different formats. It is also a barrier to future access by researchers; once material is in an archive, even if still in copyright, it can ordinarily be copied for researchers under the 'fair dealing' provisions of copyright law, but digital rights management systems may prevent this. Avoid using DRM unless it is absolutely necessary.
Privacy
Your digital archive is likely to contain personal data about hundreds of other individuals. Many countries have legislation like the UK's Data Protection Act (DPA) which restricts what can be done with personal data about living people. Although the DPA does not apply to personal archives while they remain in the creator's possession, it comes into force once custody is transferred to a public repository. You might give some thought to the kind of personal data in your archive, and try and ensure that you act fairly and responsibly in your treatment of other people's data. It is helpful for digital archivists to know which sections of your archive are most likely to contain sensitive information, so they can determine how this material is managed in the repository, and how and when it might be made accessible to researchers in the future.
9. Keep up to date
Technological changes are rapid and new technologies are constantly appearing. Interoperability with others and the threat of hardware and software obsolescence mean that you must constantly evolve your digital environment, but do think critically about the impact of these new developments before signing up. What effect will they have on your ability to use your personal archive now and in the future? Will you need to change some of your working practices to safeguard your digital materials from new threats? Keep in touch with your digital archivist who will be able to help you assess new technologies, and offer advice about file format, media, software and hardware migration.
10. Handling legacy digital files
You might find yourself dealing with legacy digital files at one time or another - perhaps those of a family member, a predecessor at work, or your own materials, long-abandoned after a software upgrade. These can be in older, unfamiliar, formats, or it may simply be difficult to evaluate their content.
Dealing with older hardware/media
Best to avoid this problem by maintaining your archive! Older hardware and media can be fragile; you may have a limited number of attempts in which you can read and copy data to new media because the process of reading the disk could cause further deterioration. It can be difficult to obtain the necessary drives (in working order) or cables to transfer data. If you do have digital files stuck on older hardware and media, talk to your digital archivist who may be able to help.
Dealing with unfamiliar formats
You need to be able to read and retrieve information from any legacy records you inherit. Your first task should be to identify what formats you are dealing with. There are online registries you can search, to identify formats and applications that can read them. Some formats may be so outdated that alternative operating systems and hardware are also required. Talk to your digital archivist who can provide information on what is needed to access the files and how they may be migrated to more appropriate formats.
Appraising legacy files
When revisiting files you worked with a long time ago, or browsing those of another, it can be tricky to tell what is worth keeping at first glance. You should conduct an initial survey of the material to form an overall impression. This doesn't necessarily entail opening and reading every record; an initial assessment might involve opening a few files in each folder to assess whether the folder title accurately reflects its contents, and an assessment of the likely significance of the material. File names, dates, author and correspondent names can be useful clues. This survey should help you to identify low value material which can definitely be deleted, material which may have short- or medium-term use, and material with long-term personal or research value.
11. Ask digital archivists for advice
Don't just rely on generic guidelines; seek advice from digital archivists and think about where you might deposit your archive in the future!