Persistent identifiers

Persistent identifiers, often referred to as PIDs, provide a means of connecting and distinguishing between an identifier for an object (which should be permanent) and an object's location (which may change). Researchers use a form of persistent identifier (usually a reference code or shelfmark) when citing archives or manuscripts in a publication, or when requesting access to them. The manuscript's identifier must be permanent and independent of the manuscript's location so that the source of the researcher's statement can always be accessed, even if the storage location of the manuscript changes. For the identifier scheme to work, a resolver, which knows the location of the manuscript referred to by the identifier, is required:

Figure 5: Resolver

Example scenario: When a researcher uses an identifier to request a manuscript in a special collections reading room, the location of the manuscript may be resolved by a staff member who (after consulting a location guide) will retrieve the manuscript from its location and present it to the reader. By maintaining a system to resolve locations from identifiers, special collections staff are able to satisfy reader requests for manuscripts even when their location changes.
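The resolver described above can be sketched in a few lines of code. This is a minimal illustration only, not a description of any library's actual system: the class name, the identifier "MS. Example 1" and the storage locations are all hypothetical. The point it shows is that the identifier stays fixed while the location recorded against it is updated.

```python
class Resolver:
    """Maps permanent identifiers to current (changeable) locations."""

    def __init__(self):
        # identifier -> current location; both values are plain strings here
        self._locations = {}

    def register(self, identifier, location):
        """Record the first known location for an identifier."""
        self._locations[identifier] = location

    def relocate(self, identifier, new_location):
        """Update the location; the identifier itself never changes."""
        if identifier not in self._locations:
            raise KeyError(f"unknown identifier: {identifier}")
        self._locations[identifier] = new_location

    def resolve(self, identifier):
        """Return the current location for an identifier."""
        return self._locations[identifier]


# Hypothetical identifier and locations, for illustration only.
resolver = Resolver()
resolver.register("MS. Example 1", "Stack A, bay 3")
resolver.relocate("MS. Example 1", "Offsite store, shelf 12")
print(resolver.resolve("MS. Example 1"))
```

A citation that quotes "MS. Example 1" remains valid after the move, because only the resolver's record has changed.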

Digital objects also require persistent identifiers that connect and distinguish between identity and location. Locations are likely to change more frequently in the case of digital manuscripts, owing to the need for regular refreshment of storage media to guard against media failure. It is also likely that an intellectual entity acquired in digital form will need to be associated with multiple representations of itself over time, as technological obsolescence requires the repository to migrate away from the formats of the original representation to those accessible using contemporary computing environments.

Figure 6: Distinguishing between identity and location
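The relationship between one intellectual entity, its successive representations and their changing locations might be modelled as follows. This is a sketch under stated assumptions: the class names, identifiers, formats and paths are hypothetical and chosen only to illustrate the distinction between identity and location.

```python
from dataclasses import dataclass, field


@dataclass
class Representation:
    rep_id: str        # persistent: never changes
    file_format: str   # format of this particular representation
    location: str      # may change when storage media are refreshed


@dataclass
class IntellectualEntity:
    entity_id: str     # persistent identifier for the letter itself
    representations: list = field(default_factory=list)

    def add_representation(self, rep):
        self.representations.append(rep)

    def locations(self):
        """Current location of every representation, keyed by its identifier."""
        return {r.rep_id: r.location for r in self.representations}


# Hypothetical example: a letter deposited in one format, later migrated.
letter = IntellectualEntity("entity-0001")
original = Representation("rep-0001", "word-processor document", "/store-a/0001")
migrated = Representation("rep-0002", "PDF/A", "/store-a/0002")
letter.add_representation(original)
letter.add_representation(migrated)

# Media refreshment: the location changes, the identifiers do not.
original.location = "/store-b/0001"
print(letter.locations())
```

Both representations remain attached to the same entity, so a request for the letter can be satisfied from whichever representation is accessible in the current computing environment.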

A repository could, in theory, use the same string construction as employed for identifiers of traditional manuscripts in its identifier scheme, though these structures are not usually suited to digital environments. The identifier systems of the Bodleian and John Rylands libraries illustrate this point:

The Bodleian's shelfmarks

The Bodleian uses shelfmarks (which are independent of location) and folio numbers to compile a reference code that identifies its archival materials.

MS. Berlin 102 fol. 230 is the identifier for a letter from C.F. Hardie to Sir Isaiah Berlin [1932].

MS. Berlin = Papers of Sir Isaiah Berlin

102 = the 102nd shelfmark assigned in the Papers of Sir Isaiah Berlin (in this case a box from the series of general correspondence, 1927-97)

fol. 230 = the 230th folio in the box
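The structure of this reference code can be made explicit by splitting it into its three parts. The pattern below is an assumption derived solely from the single example given here; real shelfmarks take other forms that it would not handle, and the function name is hypothetical.

```python
import re

# Assumed shape: "MS. <collection> <number>" with an optional " fol. <folio>".
SHELFMARK = re.compile(
    r"^MS\.\s+(?P<collection>.+?)\s+(?P<number>\d+)"
    r"(?:\s+fol\.\s+(?P<folio>\d+))?$"
)


def parse_shelfmark(code):
    """Split a reference code of the assumed shape into its components."""
    match = SHELFMARK.match(code.strip())
    if match is None:
        raise ValueError(f"not in the assumed shelfmark form: {code!r}")
    folio = match.group("folio")
    return {
        "collection": match.group("collection"),
        "number": int(match.group("number")),
        "folio": int(folio) if folio is not None else None,
    }


parsed = parse_shelfmark("MS. Berlin 102 fol. 230")
print(parsed)
```

Note how much of the pattern is concerned with spaces and full stops, which is part of what makes identifiers of this kind awkward to use in digital systems.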

The John Rylands' reference code

The Rylands assigns a three-letter mnemonic to an archive; lower levels of archival description are identified by numbers separated by slashes, reflecting the hierarchy.

RMD/1/2/5 is the identifier for a letter from Bruce Glasier to Ramsay MacDonald, 17 Feb. 1907

RMD = Papers of Ramsay MacDonald

/1 = the first series of material in the Papers (in this case, entitled ‘Correspondence and related papers’)

/2 = the second subseries of the above series (in this case, representing ‘letters from 1907 and the ILP’)

/5 = the fifth item in the above subseries
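Because each slash marks a step down the hierarchy, a reference code of this kind can be decomposed mechanically. The sketch below assumes the form shown in the example (three capital letters followed by zero or more numeric levels); the function names are hypothetical and the pattern is not taken from any Rylands specification.

```python
import re

# Assumed shape: three capital letters, then any number of "/<digits>" levels.
REF_CODE = re.compile(r"^(?P<archive>[A-Z]{3})(?P<levels>(?:/\d+)*)$")


def parse_reference_code(code):
    """Return the archive mnemonic and the list of numeric levels."""
    match = REF_CODE.match(code.strip())
    if match is None:
        raise ValueError(f"not in the assumed reference code form: {code!r}")
    levels = [int(part) for part in match.group("levels").split("/") if part]
    return match.group("archive"), levels


def parent_code(code):
    """Return the code one level up the hierarchy, or None at the top."""
    archive, levels = parse_reference_code(code)
    if not levels:
        return None
    return "/".join([archive] + [str(n) for n in levels[:-1]])


archive, levels = parse_reference_code("RMD/1/2/5")
print(archive, levels, parent_code("RMD/1/2/5"))
```

The hierarchy is convenient for navigation, but a single mis-keyed digit yields another syntactically valid code, which the pattern alone cannot detect.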

These traditional identifiers are not easily implemented in a digital context. The Bodleian's identifiers contain spaces, mixed case and punctuation, and those conforming to the Rylands' system are easily mis-keyed. Neither system accommodates the fact that a digital letter could be deposited in one format and subsequently migrated to another, producing two representations of the same intellectual entity; nor could either identify the constituent files of complex objects, such as websites. Paradigm has therefore concluded that it is preferable to allocate each intellectual entity (digital or otherwise) a traditional identifier at the time of cataloguing, so that identifiers within the catalogue are uniform and comprehensible to researchers, but that more granular identifiers, designed to persistently identify representations of the intellectual entity, will also be required. A system of persistent identification better suited to the digital world, covering both original and successor representations of the digital manuscripts in the digital archive repository, will facilitate administration and preservation because it enables the repository to assign identifiers on ingest, or when preservation actions create new representations. The assignment of these identifiers cannot wait until the archive is subject to archival description, which is likely to take place a considerable period after accession.
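The two-tier approach can be illustrated with a short sketch in which a granular identifier is minted at ingest and the traditional reference code is attached later, at cataloguing. The use of UUIDs here is purely an assumption for illustration, since no particular scheme has been chosen at this point; the class, method and field names are likewise hypothetical.

```python
import uuid


class Repository:
    """Assigns an identifier to each representation as soon as it arrives."""

    def __init__(self):
        self.representations = {}  # persistent identifier -> record

    def ingest(self, file_format, derived_from=None):
        """Mint an identifier on ingest or when a preservation action runs."""
        pid = f"urn:uuid:{uuid.uuid4()}"  # assumed scheme, illustration only
        self.representations[pid] = {
            "format": file_format,
            "derived_from": derived_from,  # identifier of the source, if any
            "catalogue_ref": None,         # unknown until cataloguing
        }
        return pid

    def catalogue(self, pid, reference_code):
        """Attach the traditional identifier once description takes place."""
        self.representations[pid]["catalogue_ref"] = reference_code


repo = Repository()
original_pid = repo.ingest("word-processor document")
migrated_pid = repo.ingest("PDF/A", derived_from=original_pid)

# Cataloguing may happen long after accession; the reference code is hypothetical.
for pid in (original_pid, migrated_pid):
    repo.catalogue(pid, "XYZ/1/1/1")
```

Both representations end up sharing one catalogue reference while keeping distinct identifiers of their own, so the repository can manage them from the moment of ingest.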

The topic of persistent identifiers for digital material has been subject to much debate, and multiple schemes that fulfil the same, or similar, objectives have been created. There is currently little agreement as to which scheme offers the best solution, and each has its own proponents with vested interests in its proliferation. The problem statement offered at the March 2006 meeting of the NISO Identifiers Roundtable sums up some of the difficulties surrounding the topic:

  1. There is no shared view of the nature of an identifier, its properties, and the requirements for its creation and use.
  2. There is considerable duplicative effort across disciplines and sectors; although each discipline considers its efforts unique because its underlying data is unique, at an information science level they are often pursuing the same ends by similar means.
  3. Identifiers can only be fully considered in conjunction with their supporting services, including systems for creating identifiers, binding them to information or objects, and resolving an identifier to obtain the associated object or information (metadata) about it.
  4. Although much of this work is being conducted outside of the traditional library community, it is inescapable that much of it will eventually impinge upon libraries, due to their traditional role in gathering, archiving and disseminating information across all domains of human activity. The experience of NISO and its member bodies could helpfully inform a broad interdisciplinary discussion of identifiers and their requirements.

NISO Roundtable

What follows includes an exploration of the issues surrounding persistent identifiers, an articulation of some of the envisaged uses of persistent identifiers in the context of preserving digital archives, and an overview of some of the persistent identifier schemes available.