
Persistent identifiers

Important general characteristics of PIDs

In order to succeed in identifying one digital document as distinct from another, in a world full of easily movable and reproducible digital matter, repositories must employ naming conventions which make names independent of addresses. These must ensure that a name is only used for one 'thing' in a given namespace so that any ambiguity about the identity of individual manuscripts is impossible.

Furthermore, the name must persist in such a way that it unambiguously identifies the manuscript indefinitely, so that when a manuscript is ordered by its name in 500 years' time the researcher can be sure of obtaining it. Although schemes have been devised to resolve some of the issues around persistent identifiers, the reality is that much of the complication is social rather than technical. The key is organisational commitment to a method and effective administration of the selected scheme: identifiers can only be persistent if they are managed.

Some of the aspects important to persistent identification have been categorised as:

Removal of potential ambiguities

To provide unambiguous names, it must be known that the name is not already used in a given namespace.

Example:
In this first image, there are three local namespaces, belonging to three separate institutions, and one global namespace. Local namespaces 1 and 2 each contain an object:1. This causes no problems until both organisations decide to put their object:1 into the global namespace, at which point a conflict arises because two items cannot have the same identifier in one namespace.

Figure 7: Namespaces

To ensure that conflicts do not happen in a namespace, there must be rules for the apportioning and allocation of names in each namespace. If the identifier is unique within the local namespace, and there is some means of indicating within the identifier which namespace the object belongs to, then uniqueness in the global namespace is ensured.
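The rule described above can be illustrated with a minimal Python sketch. It is not taken from any particular PID scheme: the Namespace class, the prefixes "inst1" and "inst2" and the "/" separator are all hypothetical, and the sketch assumes that the prefixes themselves have been allocated uniquely by whoever manages the global namespace.

```python
class Namespace:
    """A naming authority that refuses to allocate the same local name twice."""

    def __init__(self, prefix):
        self.prefix = prefix       # hypothetical label identifying this namespace
        self._allocated = set()    # local names already in use

    def allocate(self, local_name):
        # Rule 1: a name may be used only once within its own namespace.
        if local_name in self._allocated:
            raise ValueError(f"{local_name!r} is already used in {self.prefix!r}")
        self._allocated.add(local_name)
        # Rule 2: the identifier itself indicates which namespace it belongs to.
        return f"{self.prefix}/{local_name}"


institution_1 = Namespace("inst1")
institution_2 = Namespace("inst2")

# Both institutions hold an "object:1", but the qualified names differ,
# so both can be placed in the global namespace without conflict.
global_names = {
    institution_1.allocate("object:1"),
    institution_2.allocate("object:1"),
}
```

Here global_names ends up holding two distinct identifiers, whereas a second attempt to allocate "object:1" within the same institution would be rejected.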

People-friendly

Although computers can easily create unique character strings to represent names (e.g. liduf000alq7t), people prefer names which have some kind of meaning, or which are readable, writable or memorable. However, there are problems associated with using natural language in identifiers: meanings and nuances can change, both over time and between different cultures and languages. In an archival context, PIDs should be simple enough for reading room staff to convey over the phone, or for readers to key into a search facility. Ideally, though, they should not convey any obvious meaning, and might best be composed of a simple combination of digits and consonants.
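As a rough illustration of an opaque name built from digits and consonants, the following Python sketch draws random characters from a vowel-free alphabet. The alphabet, the default length of 8 and the function names are assumptions made for the example, not features of an established scheme. The letter y is left out as well as the five vowels, since it can act as a vowel.

```python
import secrets

# Digits plus lowercase consonants. Vowels (and "y") are omitted so that
# identifiers cannot accidentally spell out words.
ALPHABET = "0123456789bcdfghjklmnpqrstvwxz"


def mint_identifier(length=8, already_used=()):
    """Return a random opaque name that does not appear in `already_used`."""
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if candidate not in already_used:
            return candidate


def is_well_formed(name):
    """Check that a name uses only characters from the vowel-free alphabet."""
    return bool(name) and all(ch in ALPHABET for ch in name)


new_name = mint_identifier()
```

A real scheme would also need to record each minted name so that it is never allocated again; here that record is reduced to the already_used argument.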

Persistent

Persistence is maintained so long as names continue to be apportioned and allocated according to the rules, and are therefore not used ambiguously, and so long as the current location of an object is known to the resolver of the identifier. The infrastructures responsible for these activities must therefore be evolved and sustained indefinitely. Some of the factors which might affect the longevity of PID systems are:

Retrievable

Once a name is allocated, there is a social expectation that the name should always refer to the item and that the item, or at least information about the item, should be retrievable on production of its name to the correct service. This means that names must be distinct from addresses, so that when the name of the object is given to a service, the service can resolve the current location of the object in order to present it to the user.
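The separation of names from addresses can be sketched in Python as a simple lookup service. The Resolver class, the name "inst1/object:1" and the example.org addresses are hypothetical and stand in for whatever resolution service and storage locations a repository actually uses.

```python
class Resolver:
    """Maps persistent names to the current location of each object."""

    def __init__(self):
        self._locations = {}   # name -> current address

    def register(self, name, location):
        # A name is registered once and never reassigned to another object.
        if name in self._locations:
            raise ValueError(f"{name!r} is already registered")
        self._locations[name] = location

    def relocate(self, name, new_location):
        # The object has moved: the address changes, the name does not.
        if name not in self._locations:
            raise KeyError(name)
        self._locations[name] = new_location

    def resolve(self, name):
        # Given a name, return wherever the object currently lives.
        return self._locations[name]


resolver = Resolver()
resolver.register("inst1/object:1", "https://example.org/store-a/0001")
before_move = resolver.resolve("inst1/object:1")

# The object is migrated to new storage; users keep citing the same name.
resolver.relocate("inst1/object:1", "https://example.org/store-b/0001")
after_move = resolver.resolve("inst1/object:1")
```

The name given to the service is the same before and after the move; only the address returned by resolve changes, which is why the resolver's records must be kept up to date for the name to remain retrievable.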