Persistent identifiers

Digital Object Identifier System (DOI)

Background

The DOI System uses the Handle System to resolve identifiers, but Handles are only one component of the DOI System, which provides a complete framework for managing digital objects, including a structured means of identification, description and resolution, along with policies, procedures, business models and application tools. It is designed to be independent of the DNS and the HTTP protocol, although it can be used with them via the DOI proxy server at <http://dx.doi.org>.

The DOI System was developed as part of a project run by the Association of American Publishers and was launched in 1997 at the Frankfurt Book Fair. It grew out of publishers' concern about control of intellectual property in the digital environment. Its focus was initially on content identification (i.e. a unique identifier would be assigned to a work at the point of creation); however, it was recognised that persistent identification has value beyond the world of electronic publishing, and so the DOI System was developed as a cross-industry, cross-sectoral, non-profit-making initiative, managed by the International DOI Foundation (IDF, founded in 1998). The system is intended to provide a generic framework applicable to any digital (or other) logical entity; a DOI name may be assigned to any item of intellectual property, or to the parties, events or agreements involved in an intellectual property transaction.

See the DOI System website for further information; the site includes links to numerous overview documents, frequently asked questions and the complete DOI Handbook which contains policies, procedures and guidelines for participating organisations.

How does the DOI System work?

The DOI system consists of four principal components:

DOI syntax

DOI syntax is specified by a NISO standard (ANSI/NISO Z39.84) and is similar to that of the Handle. The syntax takes the form of a prefix and a suffix divided by a slash, as follows:

[Directory Code].[Registry Code]/[Local Name]

Hypothetical example:
10.7890/object786

Directory Code: The International DOI Foundation (IDF) is a Naming Authority under the Handle System; it has been allocated the number 10 as its unique identifier and this forms the Directory Code in the DOI namespace. All DOI names therefore begin with the number 10.

Registry Code: The Registry Code (preceded in the syntax by a dot) is a unique number assigned by the IDF to an organisation that has been authorised to register DOI names, known as a Registration Agency (RA). Anyone wishing to assign DOI names must work through an RA. An RA usually serves a particular 'community of interest', and any organisation representing such a community can apply to become one (e.g. the CrossRef RA provides citation-linking services for the scientific publishing sector). The role of an RA includes providing services and day-to-day support to registrants, e.g. allocating prefixes, registering DOIs and quality assurance. If there is no suitable RA for an organisation's needs, the IDF itself can act as the 'default' RA.

Local Name: The local name suffix can be any alphanumeric string chosen by the registering organisation, which allows existing identification schemes to be incorporated into the DOI namespace. Any characters included in UTF-8 can be used, and the local identifier can go to a very granular level, e.g. identifying a paragraph within a larger document.
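The three-part syntax can be illustrated with a short Python sketch. It assumes only the structure described above (a prefix beginning with 10, a dot, a registry code, a slash, then a local name). The names `DoiName` and `parse_doi` are hypothetical and are not part of any DOI software.

```python
from typing import NamedTuple


class DoiName(NamedTuple):
    """Hypothetical container for the three parts of a DOI name."""
    directory_code: str  # always "10" for DOI names
    registry_code: str   # the number that follows the dot in the prefix
    local_name: str      # the suffix chosen by the registering organisation


def parse_doi(name: str) -> DoiName:
    """Split a DOI name into directory code, registry code and local name."""
    # Split on the first slash only: the local name may itself contain slashes.
    prefix, slash, local_name = name.partition("/")
    if not slash or not local_name:
        raise ValueError(f"no local name after the prefix: {name!r}")
    directory_code, dot, registry_code = prefix.partition(".")
    if directory_code != "10" or not dot or not registry_code:
        raise ValueError(f"prefix is not of the form 10.<registry code>: {prefix!r}")
    return DoiName(directory_code, registry_code, local_name)


print(parse_doi("10.7890/object786"))
# DoiName(directory_code='10', registry_code='7890', local_name='object786')
```

Splitting on the first slash only means that a registrant's existing identifier scheme can be carried over unchanged as the local name, even if it contains slashes of its own.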

Resolving DOIs

Resolution of DOIs is carried out by means of the Handle System. DOIs do not have to resolve directly to the resource identified by the DOI, although they can do this.

In the past DOIs have generally been used to resolve to a single location (a URL, which might be a publisher's website, or a digital repository's website), thus providing a basic tool for persistence. However, the DOI System can now resolve to multiple pieces of associated data (e.g. a number of digital objects, metadata or repository information), which means that resolution can be much more granular. It is also possible to indicate relationships between digital resources (e.g. the same document in different formats, or earlier and later versions of the same document), by declaring related entities in the metadata for a DOI, or resolving from one entity to another. Whilst this is possible using the Handle System alone, the DOI System provides a framework whereby relationships are defined through metadata using a semantically interoperable data dictionary.

The current location of each resource identified by a DOI is stored in the DOI system server, and any changes to this location must be registered there.
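The indirection described above (a stable identifier, with one or more values recorded against it centrally) can be sketched with a toy in-memory lookup table. This is an illustration only, not the DOI or Handle System API: the class `ToyResolver`, the value-type labels "URL" and "METADATA", and the example.org addresses are all hypothetical.

```python
class ToyResolver:
    """In-memory stand-in for a resolution service (hypothetical)."""

    def __init__(self):
        # identifier -> {value type -> current value}
        self._records = {}

    def register(self, doi, value_type, value):
        """Add or overwrite one typed value recorded against an identifier."""
        self._records.setdefault(doi, {})[value_type] = value

    def resolve(self, doi, value_type="URL"):
        """Return the value of the requested type currently on record."""
        return self._records[doi][value_type]


resolver = ToyResolver()
doi = "10.7890/object786"

# One identifier can carry several associated values (multiple resolution).
resolver.register(doi, "URL", "http://repository.example.org/old/object786")
resolver.register(doi, "METADATA", "http://repository.example.org/meta/object786.xml")

# When the resource moves, the record is updated; the identifier is not.
resolver.register(doi, "URL", "http://repository.example.org/new/object786")

print(resolver.resolve(doi))
print(resolver.resolve(doi, "METADATA"))
```

Anything that cites the identifier keeps working after the move, provided the new location has been registered; if the record is not updated, the identifier resolves to a stale location.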

In the HTTP world DOIs can be resolved through the DOI resolver at <http://dx.doi.org> and through the global Handle resolver at <http://hdl.handle.net>. The DOI has also applied for a "doi:" URI scheme to allow a DOI to be expressed as a URI without the need to reference specific HTTP servers.
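As a minimal sketch of HTTP resolution, the following Python function turns a DOI name into a URL on either of the two resolvers named above. The function name `resolver_url` is hypothetical, and stripping a leading "doi:" is an assumption made for convenience. Because a local name may contain characters that are not safe in a URL path, the sketch percent-encodes them; it only builds the URL and makes no network request.

```python
from urllib.parse import quote

DOI_PROXY = "http://dx.doi.org"
HANDLE_PROXY = "http://hdl.handle.net"


def resolver_url(doi: str, proxy: str = DOI_PROXY) -> str:
    """Build the URL at which a resolver can be asked about a DOI name."""
    # Accept names written with a leading "doi:" label (an assumption).
    if doi.lower().startswith("doi:"):
        doi = doi[len("doi:"):]
    # Percent-encode unsafe characters, but keep "/" so that the
    # prefix/suffix separator survives.
    return f"{proxy}/{quote(doi, safe='/')}"


print(resolver_url("10.7890/object786"))
# http://dx.doi.org/10.7890/object786
print(resolver_url("10.7890/object 786#v2", HANDLE_PROXY))
# http://hdl.handle.net/10.7890/object%20786%23v2
```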

Metadata

The DOI System has a Data Model to ensure that every identified object is unambiguously described in a standardised way which facilitates semantic interoperability and consistency. It is not mandatory for DOI names to make use of this Data Model, although the scheme envisages that many will.

At the most basic level, the DOI Data Model allows a 'kernel' of basic metadata to be attached to a piece of intellectual property (on which optional extended metadata schemes can be built). This kernel declaration takes the form of an XML schema and contains eight elements (drawn from the indecs Data Dictionary, or iDD) which include information about: what the object is; whether it has any other identifiers; what it is usually called; the identity of its creator or publisher; whether its location is digital, physical, etc; and what type of resource it is (e.g. audio file, PDF document). This kernel metadata at present only relates to 'creations' and different kernels would have to be defined for different types of resource or entity, e.g. people or events.
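To make the idea of a kernel record more concrete, here is a rough Python sketch of the kinds of information listed above, serialised as XML. It is an assumption-laden illustration: the class `KernelSketch`, its field names and the XML element names are all hypothetical and do not reproduce the element names of the actual kernel schema, and the sample values are invented.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class KernelSketch:
    """Illustrative record only; NOT the real kernel element names."""
    doi: str                   # the identified object
    name: str                  # what the object is usually called
    creator_or_publisher: str  # identity of its creator or publisher
    mode: str                  # e.g. "digital" or "physical"
    resource_type: str         # e.g. "audio file", "PDF document"
    other_identifiers: list = field(default_factory=list)

    def to_xml(self) -> str:
        """Serialise the record as a small XML fragment."""
        root = ET.Element("kernelSketch", {"doi": self.doi})
        for tag in ("name", "creator_or_publisher", "mode", "resource_type"):
            ET.SubElement(root, tag).text = getattr(self, tag)
        for identifier in self.other_identifiers:
            ET.SubElement(root, "other_identifier").text = identifier
        return ET.tostring(root, encoding="unicode")


record = KernelSketch(
    doi="10.7890/object786",
    name="Draft speech",
    creator_or_publisher="Example Repository",
    mode="digital",
    resource_type="PDF document",
    other_identifiers=["MS-0001/12"],
)
print(record.to_xml())
```

A repository that already records this information in another metadata standard would map its existing fields onto the kernel rather than re-key them.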

To provide more granular metadata which is common to a particular community, the DOI allows the establishment of Application Profiles (APs). These are a means of grouping together DOI names with common properties (e.g. they describe entities of the same format, share the same metadata schema, or the same rules for access and use); they ensure that a particular type of DOI name behaves predictably in an application through association with specified services. An AP comprises at minimum a set of structured metadata elements, as well as some rules about policy and procedure. Any existing metadata standard can be used in an AP, but the DOI requires that for full interoperability across the DOI system this should be mapped to the iDD. XML is recommended by the DOI both for kernel metadata and AP metadata extended from the kernel.

The DOI system may be used in a restricted or non-public environment; a 'Restricted' AP is used for this purpose. This ensures local good practice and also means that the private identifiers can easily be moved into the public realm (e.g. as archive material moves from a dark to a light archive at the expiration of copyright protection) without having to be altered or reassigned.

Maintenance and adoption

The central authority and maintenance agency for the DOI System is the IDF, which provides standards and a technical and social infrastructure for DOI users. The IDF is controlled by an executive board elected by members of the Foundation. Membership of the IDF is open to any organisation with a stake or interest in managing information in the digital environment. Current members include publishers, software companies and organisations which represent the interests of publishers or other IPR holders, e.g. the International Publishers Association, the Joint Information Systems Committee (JISC), the Online Computer Library Center, The Open University, and the national libraries of the UK, Germany and the Netherlands. Organisations pay an annual subscription, which varies according to category of membership; the fee for general membership is $35,000. The fee system was introduced so that the IDF can establish itself as a self-funding body in order to ensure long-term sustainability.

The IDF delegates and licenses authority to use the DOI through Registration Agencies, each of which must be a member of the IDF. Each RA can determine its own local policies and make use of DOIs in appropriate ways for its own environment. While the IDF charges RAs an annual fee, it does not stipulate how that sum should be raised (e.g. by charging lower-level organisations for assigning a DOI).

The DOI System has had widespread take-up. Tens of millions of DOI names have already been assigned by several hundred different registered organisations. Many of these are operating in the commercial scientific publishing environment, but some publicly funded projects also participate in the scheme.

Advantages and disadvantages of the DOI System

Advantages

  • The scheme is run by an established and robust organisation which is likely to be sustainable in the long term.
  • It has been adopted by libraries as well as commercial organisations.
  • It provides an infrastructure for implementing a comprehensive digital identifier system, whilst leaving each RA with a considerable degree of autonomy to implement its own system, e.g. there might be scope for establishing an RA for those working with personal digital archives.
  • The possibility of establishing a 'Restricted' Application Profile means that the scheme could be used in a non-public digital repository environment or dark archive as well as an open environment.
  • It is standards-based and DOI metadata is created using XML, both of which maximise interoperability.

Disadvantages

  • There is a strong emphasis in the DOI member list on the commercial sector (e.g. publishing and software), or very large information institutions like national libraries. The annual subscription would be prohibitive for smaller libraries and archives. The alternative approach of working with a larger institution that has RA status may mean that the specific requirements of personal digital archives held in smaller institutions are overlooked.
  • Whilst the DOI system offers a sophisticated data model which allows the creation of standardised metadata about digital resources and the grouping of resource-types into Application Profiles, these functions are probably superfluous to the needs of many curators looking after digital archives: extensive metadata is already produced for each digital object (e.g. using METS and PREMIS as metadata standards; both provide granular information and deal adequately with issues like relationships between digital objects), and services associated with the digital objects are likely to be managed by the repository. Given the costs involved in subscribing to the DOI system, an institution should probably only sign up if it wishes to take advantage of the full range of functions DOI offers; for the more basic needs of a digital archive, simple identifier systems are probably a more appropriate and cost-effective option.
  • DOI currently recommends using the scheme to identify only resources, parties and events associated with intellectual property transactions, whereas Paradigm has identified the need for a wider range of identifiers - e.g. for preservation actions, or agents (repository staff or software) who have carried out preservation actions.