Metadata for authenticity: hash functions and digital signatures
Types of fixity information
The types of fixity information explored by Paradigm included hash functions and digital signatures. These are also the kinds of fixity information recommended by the PREMIS Data Dictionary for Preservation Metadata.
Hash functions
A hash function computes a fixed-size string (the hash value, also known as a message digest) from a digital object of any size, and this value can act as a fingerprint for the data. Hash values are not strictly unique, because many possible inputs map onto each fixed-size value, but with a well-designed function two different objects are very unlikely to share one. By calculating and storing the hash value of data, and later subjecting the same data to the same hash function and comparing the resultant hash values, it is possible to identify whether the data has been altered in some way. Even small alterations to data produce dramatically different hash values.
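A minimal Python sketch of this store-then-compare check, using the standard library's hashlib module. The choice of SHA-256 and the sample text are illustrative assumptions, not taken from Paradigm's own tooling:

```python
import hashlib


def hash_value(data: bytes) -> str:
    """Return the SHA-256 hash value of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical sample content; the second version differs by one character.
original = b"Dear Sir, I enclose the papers as promised."
altered = b"Dear Sir, I enclose the papers as promised!"

# Hash value recorded when the object is first received.
stored = hash_value(original)

# Later fixity check: recompute the hash value and compare with the stored one.
print(hash_value(original) == stored)  # unchanged data matches
print(hash_value(altered) == stored)   # altered data does not match

# Count how many hexadecimal digits differ between the two hash values.
differing = sum(a != b for a, b in zip(stored, hash_value(altered)))
print(f"{differing} of {len(stored)} hex digits differ")
```

Running the sketch shows that changing a single character leaves almost none of the hexadecimal digits in place.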

There are a number of different hash functions available. These break down into three groups: checksums, cryptographic hash functions and cyclic redundancy checks.
Checksums
Checksums are relatively simple hash functions calculated from the values of the bytes in the data being checksummed. Example algorithms include:
- sum8 (8 bits)
- sum16 (16 bits)
- sum24 (24 bits)
- sum32 (32 bits)
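As a rough illustration, the sketch below assumes the common definition of these algorithms, in which all byte values are added together and the total is truncated to the stated number of bits. The function names are chosen here for illustration only:

```python
def sum8(data: bytes) -> int:
    """Add up every byte value and keep only the lowest 8 bits."""
    return sum(data) % 256


def sum16(data: bytes) -> int:
    """Same byte sum, truncated to 16 bits instead of 8."""
    return sum(data) % 65536


print(sum8(b"abc"))   # 97 + 98 + 99 = 294, and 294 % 256 = 38
print(sum8(b"cba"))   # the same bytes in a different order also give 38
print(sum16(b"abc"))  # 294 fits within 16 bits, so no truncation occurs
```

The second call shows why such checksums are considered simple: reordering the bytes changes the data but not the checksum.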
Cryptographic hash functions
Cryptographic hash functions are particularly suitable for archival purposes because they are designed to make it impractical to construct altered data that yields the same hash value. Not all algorithms are equally secure, however. Older algorithms (e.g. MD5) have been compromised: they remain suitable for monitoring stored data for accidental damage, but not for securing data against malicious alteration. The more complex algorithms detect a wider range of changes, including deliberate ones. Some digital repositories have chosen to record several hash values for each digital object using multiple algorithms. Examples of cryptographic algorithms include:
- HAVAL (128 to 256 bits).
- MD5 (128 bits).
- SHA-1 (160 bits).
- SHA-256 (256 bits).
- Tiger (192 bits).
- Whirlpool (512 bits).
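The practice of recording several hash values per object could be sketched as follows in Python. The function name `hash_values`, the default set of algorithms and the throwaway sample file are assumptions made for illustration. The file is read once and each chunk is fed to every algorithm:

```python
import hashlib
import os
import tempfile


def hash_values(path, algorithms=("md5", "sha1", "sha256"), chunk_size=65536):
    """Return {algorithm: hex digest} for the file at path, reading it once."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as handle:
        # Read in chunks so that large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


# Demonstration on a temporary file with hypothetical content.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
values = hash_values(tmp.name)
os.remove(tmp.name)

for name, digest in values.items():
    # Each hexadecimal digit represents 4 bits of the hash value.
    print(f"{name}: {digest} ({len(digest) * 4} bits)")
```

The printed bit lengths correspond to the sizes given in the list above for MD5, SHA-1 and SHA-256.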
Cyclic Redundancy Checks (CRC)
CRCs are generally used for checking the integrity of stored data or data in transmission. Examples of CRC algorithms include:
- CRC-16 (16 bits).
- CRC-32 (32 bits).
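For illustration, Python's standard zlib module provides a CRC-32 implementation. The sketch below computes the CRC of a short test string and shows that flipping a single bit changes the result. The sample data is an assumption chosen because its CRC-32 is a widely published check value:

```python
import zlib

data = b"123456789"
crc = zlib.crc32(data)
print(f"{crc:08x}")  # cbf43926, the standard CRC-32 check value for this input

# Simulate accidental damage by flipping the lowest bit of the first byte.
damaged = bytearray(data)
damaged[0] ^= 0x01
print(zlib.crc32(bytes(damaged)) == crc)  # False: the damage is detected
```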
The different hash functions require varying amounts of processing power to calculate their hash values. Functions with more complex algorithms, or that produce longer hash values, generally require more processing power but make it less likely that an altered object will go undetected.
Digital signatures
Digital signatures are more complex than hash functions. A digital signature is a string of bits that is computed from the data being signed (or its hash value) and the private key of the entity (such as a person or organisation) performing the signing. The combination of these inputs permits the recipients of the signed data to verify both the authenticity of the data source and the integrity of the data received.
Public key cryptography, the basis for digital signatures, utilises asymmetric encryption, which uses a pair of keys: a 'private key' and a 'public key'. The private key is kept securely by its owner, and the public key (its partner) can be distributed to those with whom the holder of the private key wishes to exchange encrypted or signed data. Public key cryptography therefore enables:
- Data confidentiality - as long as the private and public keys are held securely by their respective owners, public key cryptography enables the confidential exchange of data over potentially insecure networks.
  - If a private key is used to encrypt data (some digital signature algorithms encrypt the signature rather than the data being signed) then this data can only be read by the holders of the corresponding public key.
  - If a public key is used to encrypt data then this data can only be read by the holder of the corresponding private key.
- Data integrity - the authenticity of the data signed with a private key can be verified by recipients in possession of its corresponding public key. Any alterations, whether deliberate or accidental, can easily be spotted by the validation process because a digital signature will not verify as authentic if either the signed data or the signature is altered.
- Originator authentication - the identity of the originator of data signed with a private key can be verified by recipients in possession of the corresponding public key because the private key is held only by the originator and it cannot be forged so long as the originator keeps it secret.
- Non-repudiation - digital signatures created with private keys support non-repudiation because they are partly computed from the data being signed, and because a signature that verifies with a public key can only have been produced with the corresponding private key, which is tied to a specific entity.
The use of asymmetric keys provides a mechanism analogous to, and potentially more robust than, the traditional signature or seal more familiar to archivists. It produces, in effect, a digital signature.
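The sign-and-verify cycle described in this section can be sketched with textbook RSA arithmetic in Python. This is a toy illustration under loud assumptions: the key is built from two small, publicly known primes, no padding scheme is applied, and the names `sign`, `verify` and `digest_as_int` are hypothetical. It is not secure and is not how a production signature library works internally. It only shows that the signature is computed from the hash value and the private key, and checked with the public key alone:

```python
import hashlib

# Toy key pair built from two well-known Mersenne primes. Real keys use
# randomly generated secret primes that are very much larger; these values
# serve only to make the arithmetic visible.
P = 2**61 - 1
Q = 2**31 - 1
N = P * Q                           # public modulus
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (needs Python 3.8+)


def digest_as_int(data: bytes) -> int:
    """Hash data with SHA-256 and reduce the result into the range of N."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes) -> int:
    """Compute a signature from the data's hash value and the private key D."""
    return pow(digest_as_int(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    """Check a signature using only the public key (N, E)."""
    return pow(signature, E, N) == digest_as_int(data)


# Hypothetical sample content.
message = b"Deposit agreement, version 1"
signature = sign(message)

print(verify(message, signature))         # True: data and signature intact
print(verify(message + b"!", signature))  # False: the data was altered
print(verify(message, signature + 1))     # False: the signature was altered
```

Verification needs only `N` and `E`, so anyone holding the public key can check the signature, while producing it requires `D`, which the signer keeps secret.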