Metadata for authenticity: hash functions and digital signatures

Storing fixity information as XML metadata

If digital repositories are to use fixity information, they need some provision for storing it and for associating it with the digital objects it refers to. Here the W3C Recommendation XML-Signature Syntax and Processing (for digital signatures), the PREMIS Data Dictionary (for hash values and digital signatures) and the Fedora digital repository (for hash values) are examined as potential means of recording fixity information.

W3C Recommendation XML-Signature Syntax and Processing

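The W3C Recommendation XML-Signature Syntax and Processing provides a detailed XML markup for digital signatures. Signed objects (of any type) may be located within the XML document or held externally and referenced, and users may also sign parts of documents. The location of the signed content relative to the XML signature may be one, or a combination of, the following:

  • Enveloped: the signature is contained within the XML content that it signs.
  • Enveloping: the signed content is contained within the signature, inside an <Object> element.
  • Detached: the signed content is external to the signature, either in a separate resource or as a sibling element in the same XML document.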

As part of the process of creating the XML-encoded signature(s) it is necessary to produce a canonical form of the XML. Canonicalisation standardises aspects of the XML document that do not bear on its meaning (such as line breaks or superfluous whitespace) but would otherwise give rise to different hash values and thus different digital signatures. XML editors usually support XML canonicalisation of two kinds: Inclusive XML Canonicalization (XMLC14N) and Exclusive XML Canonicalization (EXCC14N).
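The effect of canonicalisation on hash values can be illustrated with a minimal Python sketch. It assumes Python 3.8 or later, whose standard library function xml.etree.ElementTree.canonicalize implements Canonical XML 2.0 rather than the XMLC14N or EXCC14N variants named above, so it stands in for them here only to show the principle. The two sample documents are invented for the illustration.

```python
import hashlib
from xml.etree.ElementTree import canonicalize

# Two serialisations of the same XML content (hypothetical sample data):
# attribute order, quoting, spacing inside the start tag and the
# empty-element form all differ, but the meaning does not.
doc_a = '<doc b="2" a="1"><item/></doc>'
doc_b = "<doc   a='1'   b='2'><item></item></doc>"


def sha1_hex(text: str) -> str:
    """Return the SHA-1 digest of the UTF-8 encoded text as a hex string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


# Hashing the raw serialisations gives two different digests ...
raw_digests_match = sha1_hex(doc_a) == sha1_hex(doc_b)

# ... whereas hashing the canonical forms gives the same digest.
canonical_digests_match = (
    sha1_hex(canonicalize(doc_a)) == sha1_hex(canonicalize(doc_b))
)

print("raw digests match:", raw_digests_match)
print("canonical digests match:", canonical_digests_match)
```

A signature computed over the raw text of doc_a would therefore fail to verify against doc_b, even though nothing of substance has changed; computing it over the canonical form avoids this.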

Creating and verifying XML signatures using the W3C recommendation

XML signature creation

In this example two digital objects are to be signed with a single digital signature:

  • The Paradigm logo (http://www.paradigm.ac.uk/images/paradigm.gif).
  • The Paradigm home page (http://www.paradigm.ac.uk/index.html).
  1. For each object a <Reference> element is created containing:
    • The location (URI) of the object.
    • An ordered list of the transforms (or processing steps) that were applied to the content of the referenced resource before its digest was calculated.
    • The algorithm used to calculate the digest value (such as SHA-1), recorded in the <DigestMethod> element.
    • The digest value (base64 encoded) for the identified object in the <DigestValue> element.
  2. These <Reference> elements are collected within the <SignedInfo> element along with:
    • The canonicalisation method (e.g. XMLC14N as used in the example below) to be applied to the <SignedInfo> element.
    • The signature algorithm to be applied to the <SignedInfo> element.
  3. The <SignedInfo> element does not include explicit signature or digest properties (such as date or calculation time). If these are required they can be associated via a <SignatureProperties> element attached to an <Object> element.
  4. The populated <SignedInfo> element is then canonicalised using the specified <CanonicalizationMethod>.
  5. Finally, the <SignatureMethod>, which is a combination of a digest algorithm and a key-dependent algorithm (e.g. DSA-SHA1), is applied to the canonicalised <SignedInfo> element and the resulting signature value is placed in the <SignatureValue> element.
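The creation steps above can be sketched in Python using only the standard library. This is an illustration under several assumptions, not a conformant implementation: HMAC-SHA1 (a signature method the Recommendation also defines) is swapped in for DSA-SHA1 because the standard library has no DSA support; Canonical XML 2.0 is used because that is what xml.etree.ElementTree.canonicalize provides; no transforms are applied; and the object contents are placeholder bytes rather than files fetched from the URIs. The helper names (q, make_reference, build_signature) and the key are hypothetical.

```python
import base64
import hashlib
import hmac
import xml.etree.ElementTree as ET

DS = "http://www.w3.org/2000/09/xmldsig#"
ET.register_namespace("ds", DS)


def q(tag: str) -> str:
    """Qualify a tag name with the XML Signature namespace."""
    return f"{{{DS}}}{tag}"


def make_reference(uri: str, content: bytes) -> ET.Element:
    """Step 1: build a <Reference> holding the URI, digest method and digest."""
    ref = ET.Element(q("Reference"), URI=uri)
    ET.SubElement(ref, q("DigestMethod"), Algorithm=DS + "sha1")
    digest = hashlib.sha1(content).digest()
    ET.SubElement(ref, q("DigestValue")).text = base64.b64encode(digest).decode("ascii")
    return ref


def build_signature(objects: dict, key: bytes) -> ET.Element:
    """Steps 2 to 5: collect references, canonicalise <SignedInfo>, sign it."""
    signature = ET.Element(q("Signature"))
    signed_info = ET.SubElement(signature, q("SignedInfo"))
    # Identifier for Canonical XML 2.0, matching what canonicalize() does.
    ET.SubElement(signed_info, q("CanonicalizationMethod"),
                  Algorithm="http://www.w3.org/2010/xml-c14n2")
    # Assumption: HMAC-SHA1 in place of the DSA-SHA1 named in the text.
    ET.SubElement(signed_info, q("SignatureMethod"), Algorithm=DS + "hmac-sha1")
    for uri, content in objects.items():
        signed_info.append(make_reference(uri, content))
    canonical = ET.canonicalize(ET.tostring(signed_info, encoding="unicode"))
    mac = hmac.new(key, canonical.encode("utf-8"), hashlib.sha1).digest()
    ET.SubElement(signature, q("SignatureValue")).text = base64.b64encode(mac).decode("ascii")
    return signature


# Placeholder bytes stand in for the real logo and home page.
objects = {
    "http://www.paradigm.ac.uk/images/paradigm.gif": b"placeholder logo bytes",
    "http://www.paradigm.ac.uk/index.html": b"<html>placeholder page</html>",
}
signature = build_signature(objects, key=b"hypothetical shared secret")
print(ET.tostring(signature, encoding="unicode"))
```

Because the signature covers <SignedInfo>, which in turn contains the digest of each object, a single <SignatureValue> protects both referenced objects.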

XML signature verification

The verification of an XML signature consists of two phases:

  1. Signature validation
    This comprises the verification of the <SignatureValue> calculated over the <SignedInfo> element:
    • The digest of the canonicalised <SignedInfo> element is recalculated using the digest algorithm specified in the <SignatureMethod> element.
    • The public key from <KeyInfo>, or from an external source, is used to verify that the <SignatureValue> matches the recalculated <SignedInfo> digest.
  2. Reference validation
    This comprises the verification of the <DigestValue> of each <Reference> element:
    • The <SignedInfo> element is canonicalised using the algorithm specified in <CanonicalizationMethod>.
    • For each referenced object in the canonicalised <SignedInfo> the recipient must:
      • Obtain a copy of the object.
      • Apply any transforms specified to the object.
      • Regenerate the digest for the transformed object using the <DigestMethod> specified in its <Reference> element.
      • Compare the generated digest value with the <DigestValue> in the <Reference>; validation fails if they do not match.
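The two verification phases can be sketched in Python as follows. The sketch rests on stated assumptions rather than on the full Recommendation: the signature is taken to use HMAC-SHA1 with a shared key (so no <KeyInfo> or public key handling is shown), canonicalisation uses the Canonical XML 2.0 implementation in the standard library, transforms are not applied, and fetch is a hypothetical callable that returns the bytes for a URI. All function names are illustrative.

```python
import base64
import hashlib
import hmac
import xml.etree.ElementTree as ET

DS = "http://www.w3.org/2000/09/xmldsig#"
ET.register_namespace("ds", DS)

# Digest algorithm identifiers mapped to hashlib names.
DIGESTS = {
    DS + "sha1": "sha1",
    "http://www.w3.org/2001/04/xmlenc#sha256": "sha256",
}


def q(tag: str) -> str:
    """Qualify a tag name with the XML Signature namespace."""
    return f"{{{DS}}}{tag}"


def canonical_signed_info(signature: ET.Element) -> bytes:
    """Canonicalise the <SignedInfo> child of a <Signature> element."""
    signed_info = signature.find(q("SignedInfo"))
    text = ET.canonicalize(ET.tostring(signed_info, encoding="unicode"))
    return text.encode("utf-8")


def validate_signature(signature: ET.Element, key: bytes) -> bool:
    """Phase 1: recompute the value over <SignedInfo> and compare."""
    expected = base64.b64decode(signature.find(q("SignatureValue")).text)
    actual = hmac.new(key, canonical_signed_info(signature), hashlib.sha1).digest()
    return hmac.compare_digest(actual, expected)


def validate_references(signature: ET.Element, fetch) -> bool:
    """Phase 2: regenerate each object's digest and compare with <DigestValue>."""
    for ref in signature.find(q("SignedInfo")).iter(q("Reference")):
        algorithm = DIGESTS[ref.find(q("DigestMethod")).get("Algorithm")]
        data = fetch(ref.get("URI"))  # transforms are omitted in this sketch
        actual = base64.b64encode(hashlib.new(algorithm, data).digest()).decode("ascii")
        if actual != ref.find(q("DigestValue")).text.strip():
            return False
    return True


def verify(signature: ET.Element, key: bytes, fetch) -> bool:
    """A signature is accepted only if both phases succeed."""
    return validate_signature(signature, key) and validate_references(signature, fetch)
```

Keeping the two phases separate makes it possible to report whether a failure arose from a changed object (reference validation) or from tampering with the <SignedInfo> element itself (signature validation).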

Arguably the popularity of packaging metadata standards such as METS, which allow the referencing and embedding of metadata and digital files in a single XML file, makes the aggregation features of the W3C XML-Signature Syntax and Processing redundant in a digital library context. The PREMIS Data Dictionary also specifies that, for preservation purposes, digital signatures are applicable only to files and bitstreams, so the ability to sign an aggregation of files is not required. Despite this, the W3C Recommendation remains the de facto standard for encoding digital signatures, and its definition of the processing rules around digital signatures and of the semantic units needed to record them is useful.

PREMIS metadata for hash values and digital signatures

The PREMIS Data Dictionary for Preservation Metadata provides semantic units for the recording of hash values collected under the <fixity> unit and recommends that preservation repositories store hash values calculated using at least two hash algorithms for each file. Using the PREMIS Object XML schema, hash value information for files can be recorded as follows:

Code sample
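As a rough illustration of the two-algorithm recommendation, the following Python sketch calculates MD5 and SHA-1 values for a file's bytes and wraps each in a <fixity> unit. The element names follow the PREMIS semantic units (fixity, messageDigestAlgorithm, messageDigest, messageDigestOriginator), but the namespace and the rest of the PREMIS Object schema are omitted for brevity, and the function name, the sample bytes and the originator value "repository" are assumptions made for the example.

```python
import hashlib
import xml.etree.ElementTree as ET

# PREMIS-style algorithm labels mapped to hashlib names.
HASHLIB_NAMES = {"MD5": "md5", "SHA-1": "sha1", "SHA-256": "sha256"}


def fixity_elements(data: bytes, algorithms=("MD5", "SHA-1")) -> ET.Element:
    """Return an <objectCharacteristics> element with one <fixity> per algorithm."""
    characteristics = ET.Element("objectCharacteristics")
    for label in algorithms:
        fixity = ET.SubElement(characteristics, "fixity")
        ET.SubElement(fixity, "messageDigestAlgorithm").text = label
        digest = hashlib.new(HASHLIB_NAMES[label], data).hexdigest()
        ET.SubElement(fixity, "messageDigest").text = digest
        # Hypothetical value: who calculated the digest.
        ET.SubElement(fixity, "messageDigestOriginator").text = "repository"
    return characteristics


# In practice the bytes would be read from the file being described.
element = fixity_elements(b"sample file content")
print(ET.tostring(element, encoding="unicode"))
```

Recording values from two independent algorithms means that a weakness discovered in one of them does not leave the file without a trustworthy fixity check.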

PREMIS borrows some elements from the W3C recommendation XML-Signature Syntax and Processing in defining semantic units necessary to record metadata about digital signatures. This metadata includes:

The resulting metadata might look something like this:

Code sample

Repositories employing digital signatures must store their own private and public keys securely. PREMIS also recommends that repositories store the definitions of algorithms and relevant standards used in their context so that these methods could be reimplemented if necessary. If the digital preservation community can agree on a small number of standards, these could perhaps be stored by a central registry, such as the OMAR Representation Information Registry being developed by the DCC.

More information about PREMIS for digital archives can be found earlier in this chapter.

Using Fedora for recording hash values

As of version 2.2, the Fedora digital repository software provides the facility to calculate, store and verify hash values for all files and metadata managed by the repository; it supports the recording of a single value using one of the following algorithms: MD5, SHA-1, SHA-256, SHA-384 and SHA-512.

The resulting metadata is held as part of the FOXML (Fedora's native XML metadata standard, which can be exported to METS) and looks like this:

Code sample
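The general calculate, store and verify cycle for a single hash value can be sketched in Python as below. This is not Fedora's API or its FOXML serialisation; it only shows the underlying check for the five algorithms listed above, and the function names and sample bytes are assumptions made for the illustration.

```python
import hashlib

# The algorithm labels listed above, mapped to hashlib names.
HASHLIB_NAMES = {
    "MD5": "md5",
    "SHA-1": "sha1",
    "SHA-256": "sha256",
    "SHA-384": "sha384",
    "SHA-512": "sha512",
}


def compute_checksum(data: bytes, algorithm: str) -> str:
    """Calculate the hex-encoded hash value of data with the named algorithm."""
    return hashlib.new(HASHLIB_NAMES[algorithm], data).hexdigest()


def verify_checksum(data: bytes, algorithm: str, stored: str) -> bool:
    """Recalculate the hash value and compare it with the stored one."""
    return compute_checksum(data, algorithm) == stored.strip().lower()


# Calculate and store a value at ingest, then verify it later.
content = b"sample datastream content"
stored_value = compute_checksum(content, "SHA-256")
print(verify_checksum(content, "SHA-256", stored_value))
print(verify_checksum(content + b"!", "SHA-256", stored_value))
```

The algorithm label must be stored alongside the value, since a hash value cannot be verified without knowing which function produced it.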

More details are available in the release notes for Fedora 2.2.