
Metadata for authenticity: hash functions and digital signatures

Types of fixity information

The types of fixity information explored by Paradigm included hash functions and digital signatures. These are also the kinds of fixity information recommended by the PREMIS Data Dictionary for Preservation Metadata.

Hash functions

Hash functions compute a fixed-size value, usually written as a string of hexadecimal characters, from a digital object of any size. For practical purposes this value is unique to the data, so it can act as a fingerprint for it. By calculating and storing the hash value of data, then later running the same data through the same hash function and comparing the two results, it is possible to identify whether the data has been altered in some way. With well-designed hash functions, even small alterations to the data produce dramatically different hash values (also known as message digests), as shown in Figure 17 below:

Figure 17: Hash functions
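The fixity check described above can be sketched in a few lines. This is a minimal illustration assuming SHA-256 from Python's standard `hashlib` module; the helper names `sha256_hex` and `fixity_ok` and the sample texts are hypothetical. A digest is recorded at ingest, then recomputed and compared later.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 message digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()

def fixity_ok(data: bytes, stored_digest: str) -> bool:
    """Re-hash the data and compare the result with the digest stored earlier."""
    return sha256_hex(data) == stored_digest

original = b"Dear Sir, I write to confirm our meeting on Tuesday."
altered = b"Dear Sir, I write to confirm our meeting on Thursday."

# At ingest: calculate and store the hash value.
stored = sha256_hex(original)

# Later: recompute the hash value and compare.
print(fixity_ok(original, stored))  # True: data unchanged
print(fixity_ok(altered, stored))   # False: data has been altered

# A one-word change yields a completely different 64-character digest.
print(sha256_hex(original))
print(sha256_hex(altered))
```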

Many different hash functions are available. They fall into three broad groups: checksums, cryptographic hash functions and cyclic redundancy checks.

Checksums

Checksums are relatively simple hash functions calculated from the values of the bytes in the data being checksummed. Example algorithms include:

Cryptographic hash functions

Cryptographic hash functions are particularly suitable for archival purposes because they are designed to make it infeasible to alter data deliberately without also changing its hash value. Not all algorithms are equally secure, however. Weaker algorithms (e.g. MD5) have been compromised: they remain suitable for monitoring stored data for accidental damage, but not for securing data against malicious alteration. More complex algorithms detect a wider range of alterations. Some digital repositories have chosen to record several hash values for each digital object, using multiple algorithms. Examples of cryptographic algorithms include:

Cyclic Redundancy Checks (CRC)

CRCs are commonly used to check the integrity of data in storage or in transmission. Examples of CRC algorithms include:

Different hash functions require varying amounts of processing power to calculate their hash values. Functions with more complex algorithms, or that produce longer hash values, require more processing power but provide more reliable fixity information.
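To make the three groups concrete, the sketch below computes one value from each family for a single object and stores them together. This mirrors the practice of recording several hash values per object. It assumes Python's standard `zlib` (Adler-32, CRC-32) and `hashlib` (MD5, SHA-256). `simple_checksum` is a hypothetical toy additive checksum, and `fixity_record` is a hypothetical helper whose field names echo the PREMIS fixity units `messageDigestAlgorithm` and `messageDigest`; the record layout itself is an assumption.

```python
import hashlib
import zlib

def simple_checksum(data: bytes) -> int:
    """Hypothetical toy checksum: sum of byte values modulo 2**16."""
    return sum(data) % 65536

def fixity_record(data: bytes) -> list:
    """Compute one hash value per algorithm for a single digital object.

    Field names echo PREMIS fixity semantic units; the list-of-dicts
    layout is an illustrative assumption.
    """
    values = {
        "simple-sum16": format(simple_checksum(data), "04x"),  # checksum (toy)
        "Adler-32": format(zlib.adler32(data), "08x"),         # checksum
        "CRC-32": format(zlib.crc32(data), "08x"),             # cyclic redundancy check
        "MD5": hashlib.md5(data).hexdigest(),                  # cryptographic (compromised)
        "SHA-256": hashlib.sha256(data).hexdigest(),           # cryptographic
    }
    return [
        {"messageDigestAlgorithm": name, "messageDigest": digest}
        for name, digest in values.items()
    ]

for entry in fixity_record(b"Letter to the county archivist, 1987"):
    print(entry["messageDigestAlgorithm"], entry["messageDigest"])

# Swapping two bytes fools the additive checksum but not Adler-32 or CRC-32.
a, b = b"ab", b"ba"
print(simple_checksum(a) == simple_checksum(b))  # True
print(zlib.adler32(a) == zlib.adler32(b))        # False
print(zlib.crc32(a) == zlib.crc32(b))            # False
```

The transposition test shows, on a small scale, why more complex algorithms catch more kinds of error: a plain byte sum ignores order, while Adler-32 and CRC-32 do not.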

Digital signatures

Digital signatures are more complex than hash functions. A digital signature is a string of bits computed from the data being signed (or from its hash value) and the private key of the entity (such as a person or organisation) performing the signing. This combination of inputs allows recipients of the signed data to verify both the authenticity of its source and the integrity of the data received.
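The hash-then-sign idea can be illustrated with textbook RSA. RSA is one signature scheme among several, and this sketch is not necessarily what Paradigm used. The key pair is built from two known Mersenne primes and is deliberately tiny and unpadded, so it is a toy for explanation only. Real systems should rely on a vetted cryptographic library and a padded scheme such as RSA-PSS, or on an algorithm such as ECDSA or Ed25519. The function names `digest_as_int`, `sign` and `verify` are hypothetical.

```python
import hashlib
import math

# TOY key pair from two known Mersenne primes. Far too small and without
# padding: an illustration of the idea only, never for real signatures.
P = 2**89 - 1
Q = 2**107 - 1
N = P * Q                      # public modulus (shared with recipients)
E = 65537                      # public exponent (shared with recipients)
PHI = (P - 1) * (Q - 1)
assert math.gcd(E, PHI) == 1
D = pow(E, -1, PHI)            # private exponent (kept secret); Python 3.8+

def digest_as_int(data: bytes, n: int) -> int:
    """Hash the data with SHA-256 and reduce the digest modulo n."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

def sign(data: bytes, d: int, n: int) -> int:
    """Signer: combine the data's hash value with the PRIVATE key (d, n)."""
    return pow(digest_as_int(data, n), d, n)

def verify(data: bytes, signature: int, e: int, n: int) -> bool:
    """Recipient: check the signature using only the PUBLIC key (e, n)."""
    return pow(signature, e, n) == digest_as_int(data, n)

letter = b"Deposit agreement between donor and archive"
sig = sign(letter, D, N)
print(verify(letter, sig, E, N))                  # True: key and content match
print(verify(letter + b" (amended)", sig, E, N))  # False: data was altered
```

Only the holder of `D` can produce a signature that verifies under `(E, N)`, which gives evidence of the source. Because the signature is computed from a hash value, any change to the data also makes verification fail.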

Public key cryptography, the basis for digital signatures, is a form of asymmetric cryptography: it uses a pair of keys, a 'private key' and a 'public key'. The private key is kept securely by its owner, while the public key (its partner) can be distributed to anyone with whom the holder of the private key wishes to exchange encrypted or signed data. Public key cryptography therefore enables:

The use of asymmetric keys provides a mechanism analogous to, and potentially more robust than, the traditional signature or seal familiar to archivists. In effect, it produces a digital signature.