Arranging and cataloguing websites

Between 5 April and 9 May 2005 (the General Election period), Paradigm took regular snapshots of the websites maintained by selected politicians, whose permission was sought beforehand. Both HTTrack and Adobe Acrobat 7.0 Professional were used to capture the sites.

Many politicians (and individuals in other fields) maintain their own websites, and these sites therefore form an integral part of an individual's personal archive. The approach to collection development (see Chapter 02 Collection development) taken in any one case will determine how many snapshots or versions of the website are held by the digital repository, and the period these cover.

To date, much of the work on archiving websites has been undertaken by libraries. As a result, whilst Dublin Core is employed in some cases, catalogues of archived websites often take the form of MARC records stored in bibliographic databases and library catalogues. Websites have therefore largely been viewed as publications, and the principal creator as the publisher. There are also moves in a number of countries towards making websites subject to legal deposit. In the UK, the Legal Deposit Libraries Act 2003 extends legal deposit to non-print forms, but it is currently only enabling legislation. The British Library is encouraging the deposit of online material under the Voluntary Code of Practice 2000; an independent Legal Deposit Advisory Panel has also been established by Government to oversee the various stages of secondary legislation, which are likely to lead to Regulations by format (one of these formats being websites).

Should websites become subject to legal deposit, archivists may take the decision to treat an individual's website as one of their publications and to exclude it from their digital archive unless there is good reason for its inclusion. Until then, it is still important for archivists to deal with the personal websites of their donors and depositors, and to capture different versions of these mutable records over time.

Rather than taking these website snapshots themselves, archivists may decide to submit details of the relevant sites to an organisation like the UK Web Archiving Consortium (UKWAC) which would undertake web harvesting and archiving on their behalf. The EAD catalogue could then link to these snapshots, whilst making it clear that the snapshots were held in a different repository to the rest of an individual's digital archive.

If this work is undertaken in-house, a succession of snapshots of an individual's website should probably be placed at series level in an EAD catalogue. The subfonds above it will reflect either the office where the site is maintained, or the individual (usually a member of the politician's staff) responsible for maintaining the site. Each snapshot (whether of a homepage or an entire site) should be treated as an item. Each item will therefore be the equivalent of what the Australian web archiving project PANDORA refers to as a date-stamped 'edition'.

Cataloguing at series level should pull together all the essential information about a single website. Item level cataloguing will be minimal and will focus on controlled access index terms to facilitate searching and browsing. The functionality of websites, and the fact that they are or have been in the public domain, may mean that users are permitted to navigate sites freely. Before doing so, they should be made aware (via the EAD catalogue and DIP metadata) of IPR issues, of the fact that they are viewing an archived website, and of metadata regarding creator, dates and technical issues (such as loss of functionality).
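
The arrangement described above (a subfonds for the office or person maintaining the site, a series for the website, and one item per snapshot) can be sketched in code. The following is a minimal illustration only, not a method used by Paradigm: it assumes EAD 2002 element names (`c`, `did`, `unittitle`, `unitdate`, `phystech`, `controlaccess`, `subject`), and the function names, titles, dates and index terms are hypothetical. A real catalogue would carry considerably more series level description.

```python
import xml.etree.ElementTree as ET


def add_component(parent, level, title, normal_date=None):
    """Create an EAD <c> component with a minimal <did> (hypothetical helper)."""
    attrib = {"level": level}
    if parent is None:
        c = ET.Element("c", attrib)
    else:
        c = ET.SubElement(parent, "c", attrib)
    did = ET.SubElement(c, "did")
    ET.SubElement(did, "unittitle").text = title
    if normal_date:
        # The normal attribute holds an ISO 8601 date or date range.
        ET.SubElement(did, "unitdate", {"normal": normal_date}).text = normal_date
    return c


def build_website_series(maintainer, site_title, snapshot_dates, index_terms):
    """Return a subfonds containing one series (the site) and one item per snapshot."""
    subfonds = add_component(None, "subfonds", maintainer)
    dates = sorted(snapshot_dates)  # ISO dates sort correctly as strings
    series = add_component(
        subfonds, "series", site_title, f"{dates[0]}/{dates[-1]}"
    )
    # Series level note warning users about technical issues.
    note = ET.SubElement(series, "phystech")
    ET.SubElement(note, "p").text = (
        "Archived website snapshots; some functionality may be lost."
    )
    for date in dates:
        # Each snapshot is an item, comparable to a date-stamped 'edition'.
        item = add_component(series, "item", f"Snapshot of {site_title}", date)
        access = ET.SubElement(item, "controlaccess")
        for term in index_terms:
            ET.SubElement(access, "subject").text = term
    return subfonds


if __name__ == "__main__":
    tree = build_website_series(
        "Constituency office",          # hypothetical subfonds title
        "Example MP website",           # hypothetical site title
        ["2005-04-05", "2005-04-19", "2005-05-09"],
        ["General elections"],          # hypothetical index term
    )
    print(ET.tostring(tree, encoding="unicode"))
```

Keeping the item level description to a title, a date and index terms mirrors the minimal item cataloguing suggested above, while notes that apply to every snapshot (such as loss of functionality) sit once at series level.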