Persistent identifiers

Persistent Uniform Resource Locator (PURL)

Background

The PURL system was developed by the Online Computer Library Center (OCLC) in the USA. Its origins lie in library cataloguing: PURLs were first implemented in 1996 as part of the Internet Cataloguing Project, which aimed to advance practice and standards for cataloguing internet resources and addressed the problem of including URLs in catalogue records. PURLs were developed as an interim measure, for use until URN technology is fully developed and web browsers are able to recognise URN syntax. They are designed to satisfy as many of the URN requirements as possible using current technology, and to be automatically translatable to the URN architecture once it is established. PURLs provide a means both of identifying a ‘general internet resource’ and of locating and resolving that resource.

See OCLC’s PURL site at http://purl.org/ for more information.

How do PURLs work?

A PURL is essentially a URL. The difference is that it does not take the user directly to the location of the resource, but to an intermediate PURL Resolution Service. The resolution service associates the PURL with the relevant URL and returns that URL to the user, who can then access the server directly and retrieve the resource.

Creating a PURL is straightforward and involves a simple web-based application procedure. Once created, PURLs exist permanently, although they can be disabled. To ensure persistence, any change in the associated URL must be recorded by the creator or owner of the PURL; the PURL itself always remains the same. If the link from a PURL to its associated URL is broken because the resource has moved, the PURL and its full history will still be available as long as the PURL Service itself is maintained.
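The relationship between a fixed PURL and its changing target can be pictured with a minimal sketch. It is not the OCLC software: the class name `PurlRecord`, its fields and the example.org URLs are all hypothetical, and it assumes only what is described above, namely that the PURL never changes, that it can be disabled, and that earlier targets remain on record.

```python
from datetime import datetime, timezone


class PurlRecord:
    """Hypothetical record: one fixed PURL plus the URLs it has pointed to."""

    def __init__(self, purl, url):
        self.purl = purl      # fixed at creation and never altered afterwards
        self.enabled = True   # a PURL can be disabled, but it is not deleted
        self.history = []     # (timestamp, url) pairs, oldest first
        self.update(url)

    def update(self, new_url):
        """Record a new location; earlier locations stay in the history."""
        self.history.append((datetime.now(timezone.utc), new_url))

    @property
    def current_url(self):
        """The most recently recorded location."""
        return self.history[-1][1]


# Illustrative URLs only.
record = PurlRecord("http://purl.abcd.org/ABC/DEF/200",
                    "http://www.example.org/old/report.html")
record.update("http://www.example.org/new/report.html")
```

After the update, `record.purl` is unchanged, `record.current_url` gives the new location, and the old location is still present in `record.history`.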

PURL syntax

A PURL identifier takes the following form:
[Protocol]://[Resolver Address]/[Name]

Hypothetical example:
http://purl.abcd.org/ABC/DEF/200

Protocol: PURL uses the HTTP protocol.

Resolver Address: PURL uses the Domain Name System (DNS) to obtain the IP address assigned to the resolver, e.g. the OCLC resolver address is purl.oclc.org and the National Library of Australia’s is url.nla.gov.au.

Name: The Name is assigned by the organisation or individual creating the PURL. Either upper or lower case can be used, although some characters are not permitted. A full list of the characters allowed in the Name component is given in the FAQ page of the PURL site.

Names are organised as a hierarchy of domains (like directory paths), with a top-level domain name separated by a slash from further sub-domains (each separated by a further slash). In the above example ABC reflects the top-level domain, DEF a subdomain of ABC, and 200 the specific document being identified.
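As an illustration of this syntax, the following sketch splits the hypothetical PURL from the example into its three components using Python's standard `urllib.parse` module. The function name `split_purl` and the dictionary keys are assumptions made for this example and are not part of any PURL software.

```python
from urllib.parse import urlsplit


def split_purl(purl):
    """Split a PURL into protocol, resolver address and name."""
    parts = urlsplit(purl)
    # The name is the path without its leading slash; each slash-separated
    # segment is one level in the hierarchy of domains.
    name = parts.path.lstrip("/")
    return {
        "protocol": parts.scheme,
        "resolver": parts.netloc,
        "name": name,
        "hierarchy": name.split("/") if name else [],
    }


purl_parts = split_purl("http://purl.abcd.org/ABC/DEF/200")
```

Here `purl_parts["hierarchy"]` holds the top-level domain, the subdomain and the document name as separate items, in that order.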

In theory, the same name could be given to two different documents held under different resolvers. It is the resolver address that makes the full identifier unique, since it is not possible to create two identical names under the same resolver. The resolver’s database contains details of all assigned names, which can be checked before creating a name. This system ensures that the name is unique within its namespace.

Resolving PURLs

Resolution of PURLs is carried out using a standard HTTP redirect, by means of the OCLC (or another) PURL resolver. A PURL resolves to a single URL only.
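The redirect can be sketched as a simple table lookup. This is an illustration rather than the resolver's actual code: the function `redirect_response`, the table contents and the example.org URL are hypothetical, and the use of status code 302 for a successful match (and 404 for an unknown name) is an assumption about how a resolver might respond.

```python
def redirect_response(purl_name, purl_table):
    """Return (status, headers) for a requested PURL name.

    purl_table maps each PURL name to exactly one URL.
    """
    target = purl_table.get(purl_name)
    if target is None:
        return 404, {}                    # assumed response for an unknown name
    return 302, {"Location": target}      # assumed redirect status code


# Illustrative table with a single entry.
purl_table = {"/ABC/DEF/200": "http://www.example.org/reports/200.html"}
status, headers = redirect_response("/ABC/DEF/200", purl_table)
```

The client's browser follows the `Location` header and retrieves the resource from the target server, so the resolver never serves the resource itself.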

Partial redirections can also be set up, using a domain as the prefix for a localised hierarchy of URLs. If the resolver finds no direct match for a particular PURL, it tries to match it right to left based on the hierarchy of domains represented in the syntax; it then resolves as much as it can find and appends the remaining unresolved portion to the end of the resolved URL.

Hypothetical example:
A partial redirect is set up for the URL <http://personaldigitalarchive.ac.uk/>. The PURL associated with this is <http://purl.organisation.org/ABC>.

Suppose the site includes a lower-level document with the URL <http://personaldigitalarchive.ac.uk/a/very/long/document>, and a user requests the PURL <http://purl.organisation.org/ABC/a/very/long/document>. No match for the full name is found, so the resolver looks for a match by first taking away /document, then /long, and so on. It eventually finds the match for /ABC, which is associated with <http://personaldigitalarchive.ac.uk/>, and automatically appends /a/very/long/document, resulting in the URL <http://personaldigitalarchive.ac.uk/a/very/long/document>.

This means that an organisation might create a partial redirect as the permanent prefix for an entire website and its components. Users would use the partial redirect as the prefix for all the documents forming part of that site; if the site is moved, only the single partial redirect location needs to be changed, rather than the full URL for each document.
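The right-to-left matching described above can be sketched as follows. This is a simplified illustration, not the resolver's actual algorithm: the function `resolve_partial` and the dictionary of partial redirects are hypothetical, and the sketch assumes that names are matched one whole path segment at a time.

```python
def resolve_partial(name, partial_redirects):
    """Resolve a PURL name against partial redirects, longest prefix first.

    partial_redirects maps a name prefix (e.g. "/ABC") to a base URL.
    Returns the resolved URL, or None if no prefix matches.
    """
    segments = name.strip("/").split("/")
    # Remove segments from the right, one at a time, until a prefix matches.
    for cut in range(len(segments), 0, -1):
        prefix = "/" + "/".join(segments[:cut])
        if prefix in partial_redirects:
            base = partial_redirects[prefix]
            remainder = "/".join(segments[cut:])
            if not remainder:
                return base
            # Append the unresolved portion to the end of the resolved URL.
            return base.rstrip("/") + "/" + remainder
    return None


redirects = {"/ABC": "http://personaldigitalarchive.ac.uk/"}
target = resolve_partial("/ABC/a/very/long/document", redirects)
```

If the site moves, only the single entry for `/ABC` needs to change; every name beneath it then resolves to the new location.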

Access and use

In order to become a registered user of a PURL resolver, an individual has to create a user ID and password on that resolver by following the given instructions. A registered user can create a PURL by using an online creation form, as long as the top-level domain of the name exists; if it does not, a request to create a domain name must be made to the resolver’s administrator.
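The creation rules can be summarised in a short sketch. The class `PurlResolver` and its methods are hypothetical and do not reflect the interface of any real resolver; the sketch assumes only the checks described above (the user is registered, the top-level domain exists, and the name is not already taken), and it compares names exactly as typed, which is a simplification.

```python
class PurlResolver:
    """Hypothetical resolver holding users, top-level domains and PURLs."""

    def __init__(self, domains):
        self.domains = set(domains)  # top-level domains set up by the administrator
        self.users = set()
        self.purls = {}              # name -> URL

    def register_user(self, user_id):
        self.users.add(user_id)

    def create_purl(self, user_id, name, url):
        if user_id not in self.users:
            raise PermissionError("only registered users can create PURLs")
        top_level = name.strip("/").split("/")[0]
        if top_level not in self.domains:
            # A new domain has to be requested from the administrator first.
            raise LookupError("top-level domain does not exist: " + top_level)
        if name in self.purls:
            raise ValueError("name already assigned under this resolver")
        self.purls[name] = url


# Illustrative user, domain and URL.
resolver = PurlResolver(domains=["ABC"])
resolver.register_user("archivist")
resolver.create_purl("archivist", "/ABC/DEF/200", "http://www.example.org/200.html")
```

A second attempt to create `/ABC/DEF/200` on the same resolver raises an error, which reflects the rule that a name must be unique within its namespace.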

It is intended that some degree of access control will ultimately be possible, although this has not yet been implemented. Currently all PURLs are universally resolvable, which means that they can be searched and resolved by any unregistered user. In future, it should be possible to create privately resolvable PURLs, domains and partial redirects which can only be resolved by designated registered users of the particular PURL resolvers where they reside; different levels of access (e.g. read, write, maintenance) could be set for different types of user.

Maintenance and adoption

OCLC maintains the PURL server software, and it is made freely available to anyone who wishes to establish their own sub-domain and maintain their own PURLs. To date over 600,000 PURLs have been created.

OCLC has made the source code for PURL available, which means that institutions can install their own PURL resolvers. There is no current list of all the institutions which have set up their own PURL servers, although those which have include the National Library of Australia, the Danish Bibliographical Centre and the U.S. Government Printing Office.

Advantages and disadvantages of the PURL scheme

Advantages

  • It is cheap and easy to create and resolve PURLs; making use of existing services means that no new protocols or modifications to client software are necessary, and the software is freely available.
  • The system is standards based and compatible with both URI and URN schemes.
  • PURLs grew from a library cataloguing context and they could provide an effective means of linking from an EAD catalogue entry to the associated DIP.
  • The scheme is now well-established and widely used.
  • It is scalable: by using the existing distributed technology of DNS/HTTP, many different PURL servers can be established locally, thus avoiding the overloading of servers and enabling greater local control over PURL creation.

Disadvantages

  • PURLs were designed primarily as identifiers for open, web-based resources (essentially ‘published’ material), but digital archives have different requirements. Repositories for personal digital archives must identify closed or restricted access material and various metadata. They would therefore need to implement PURLs locally in a manner that prevents access by unauthorised parties.
  • PURLs are incapable of dealing with the complexities of a single personal digital archive, which may require different levels of access (e.g. some items may be closed, others subject to access restrictions and others open).
  • In a personal digital archive each individual object must be unambiguously identifiable, so a facility like partial resolution is inappropriate.