Comparing repository software for preserving personal digital archives

Archival concerns

The care of material in a repository catering for born-digital personal archives is the duty of professional archival staff. Archivists must be able to understand and trust the system's security and processes, and to interact with the system confidently. They owe a duty of care both to the creators of archival material and to the researchers who will use it: to ensure the authenticity, continuing availability and robustness of the material, to support its eventual use by researchers, and to satisfy freedom of information requests.

Name Supports audit trails
Detail Born-digital personal archives will be retained by their collecting institutions indefinitely, and it can be assumed that during this time items will be moved to new storage media and new formats on numerous occasions. The repository must provide mechanisms to demonstrate that an item is as it was when submitted to the repository - that it is authentic.
DSpace Partial support
Detail Some activities, such as submitting and approving a bitstream, are recorded as qualified Dublin Core metadata using description provenance - this records the name, the date and time, filenames, size in bytes and an MD5 checksum.
Fedora Partial support
Detail Modifications to files or metadata are logged in Fedora's audit metadata, which records information about who did what and when, and associates it with the object. Metadata associated with migration or preservation events are not created by Fedora, though Fedora could support the addition of such metadata.
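As an illustration of the kind of audit record described above (who, when, filename, size in bytes and an MD5 checksum), the following is a minimal Python sketch. It does not use the DSpace or Fedora APIs; the function name `provenance_note` and the dictionary keys are hypothetical and chosen only for this example.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def provenance_note(path, person, action, when=None):
    """Build a provenance record for one file (hypothetical record layout)."""
    data = Path(path).read_bytes()
    # Default to the current time in UTC when no timestamp is supplied.
    when = when or datetime.now(timezone.utc)
    return {
        "action": action,                      # e.g. "submitted" or "approved"
        "by": person,                          # who carried out the action
        "at": when.isoformat(),                # date and time of the action
        "filename": Path(path).name,
        "size_bytes": len(data),
        "md5": hashlib.md5(data).hexdigest(),  # checksum at the time of the event
    }
```

A record of this kind only supports a claim of authenticity if it is stored with the object and cannot be silently altered later; how that is achieved depends on the repository.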
Name Supports unique identification of metadata, digital files and conceptual objects
Detail In order to maintain intellectual control of items in the repository it must be possible to apply unique identifiers to objects and metadata managed by the repository.
DSpace Supported
Detail Each Community, Collection and Item is allocated a Handle in the current version of DSpace. In version 2 DSpace will support other persistent identifiers in addition to Handles, and it will be possible to apply identifiers at more granular levels.
Fedora Supported
Detail Each object, file and metadata (and version thereof) is given a unique identifier by the Fedora repository. Repositories may opt to use the Handle system with Fedora. For example, see the VTLS OSC suite of tools, which includes a service for integrating the Handle System with Fedora.
Name Supports reliable binding of metadata and digital object
Detail In order to maintain intellectual control of items in the repository it must be possible to permanently associate an archival item with its metadata, both within the repository and in any export functionality.
DSpace Partial support
Detail Each Community, Collection and Item can have its own metadata. Individual files which make up an item are allocated basic metadata, which is displayed by the containing Item's metadata; it is unclear which metadata belongs to which file. Recommendations for version 2 of DSpace include allowing metadata at more granular levels.
Fedora Partial support
Detail Dependent on implementation. If multiple files and their metadata are stored within a single object wrapper, then the repository must itself implement conventions which specify which metadata belongs to which files. If repositories use an atomistic model, with one file and its metadata to an object, metadata and object are unambiguously connected.
Name Supports referenced metadata
Detail Some metadata is applicable to several objects and is best held once, such as an EAD collection level archival description, or rights metadata. Other metadata, such as file format registries, may be curated in repositories external to the organisation.
DSpace Not supported
Detail  
Fedora Supported
Detail Files and metadata may be held outside of the repository and referred to; relationships between objects in the repository can also be formed.
Name Supports complex inter-object relationships
Detail Meaning in archival materials relies heavily on context; it is necessary that the repository supports complex hierarchical relationships found in archives.
DSpace Not supported
Detail The DSpace data model is designed for flatter collections and is not well-suited to complex structures.
Fedora Supported
Detail Fedora can support complex multi-level relationships through its RDF metadata. It is also possible to ingest METS structural maps to reflect the original order of an archival accession, to ensure that this is preserved for the archivist who will catalogue the archive.
Name Supports appropriate metadata standards
Detail Support for open and widely adopted metadata standards increases object portability, tool availability and the likelihood of recruiting staff familiar with metadata employed by the repository. Support for PREMIS preservation metadata has not been incorporated into any repository yet, and support for technical metadata is very limited.
DSpace Partial support
Detail METS, OAI-PMH and Dublin Core are supported. Additional metadata may be added as 'serialized datastreams'.
Fedora Partial support
Detail Fedora stores metadata in its native FOXML, which can be exported to METS (a Fedora extension of METS); it also supports OAI-PMH and Dublin Core. Fedora can store any kind of valid XML metadata and can be configured to index this metadata using the Fedora Generic Search Service.

Metadata extraction support is limited to a web service for the Jhove validation and technical metadata extraction. There are currently no tools to generate or act on PREMIS preservation metadata.

Relationships between objects can be recorded using METS structural maps or via RDF metadata, but Fedora provides no interface for working with those relationships (e.g. it would not display a complex object).
Name Supports simple and complex objects
Detail Personal digital archives contain a range of simple objects, consisting of a single file, and complex objects, which are composed of multiple files that must be reassembled to recreate the object. The repository should be capable of supporting both kinds of object.
DSpace Partial support
Detail Allows multiple files to be bundled together in an Item, but this limits the metadata that can be applied.
Fedora Supported
Detail The Fedora data model allows users to bundle files together in an object, or to store files in their own objects and create relationships between them.
Name Supports multiple types and formats
Detail Personal digital archives can contain a wide variety of material, from email to simple image files, from spreadsheets to word-processed documents, from websites to audio files. The repository should be capable of supporting a wide range of object types and formats.
DSpace Supported
Detail DSpace has a bitstream registry which details the formats that the repository accepts, and the level of support the repository provides for them. Additional formats may be added to the registry.
Fedora Supported
Detail Supports any MIME type.
Name Supports automatic metadata creation
Detail The preservation of born-digital archives requires a great deal of metadata. Automating the creation of this metadata is extremely advantageous.
DSpace Partial support
Detail Some audit metadata, etc., is created automatically. Much of the metadata must be input through the web user interface.
Fedora Partial support
Detail Audit metadata is created automatically, and checksum metadata may be created automatically. A Jhove Metadata Extraction Service is available to add some technical metadata. The SIP Creator/Dir Ingest service can automate the creation of relationship metadata. Much metadata, including descriptive and preservation metadata, must be compiled manually.
Name Supports bulk ingest
Detail Digital materials must be properly ingested into a managed environment as soon as possible; bulk ingest is therefore highly desirable.
DSpace Supported
Detail Provides a command-line bulk ingest tool; files must be arranged according to a specified hierarchy to map to the DSpace data model.
Fedora Supported
Detail The Fedora Management web service has SOAP-based operations to ingest digital objects in different XML wrapper formats (METS and FOXML). This same web service has other SOAP-based operations to add datastream content to an object that is already in the Fedora repository. Fedora also has a separate “Directory Ingest” service that runs as a web application; this service accepts a zip file that contains a hierarchical directory of files along with a METS manifest file, opens the zip file and calls the Fedora Management web service to ingest each file as a digital object, preserving the hierarchical directory relationships.
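The idea of preserving hierarchical directory relationships during a zip-based ingest can be sketched in Python as follows. This is an illustration only, not the Directory Ingest service itself: it does not read a METS manifest or call any Fedora web service, and the function name `hierarchy_from_zip` and the `"ROOT"` label are assumptions made for the example.

```python
import io
import zipfile
from pathlib import PurePosixPath


def hierarchy_from_zip(zip_bytes):
    """Return sorted (member, parent) pairs for every file and directory in a zip."""
    pairs = set()
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            path = PurePosixPath(name)
            # Walk up the path so that intermediate directories are recorded too,
            # even when the zip has no explicit entries for them.
            while path.name:
                parent = path.parent
                pairs.add((str(path), str(parent) if parent.name else "ROOT"))
                path = parent
    return sorted(pairs)
```

Each pair could then be turned into a relationship between two repository objects at ingest time, so that the original order of the accession survives.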
Name Supports bulk export
Detail Bulk export will be necessary for an institution moving to another repository technology, or one returning deposited materials to a creator. Archival materials and their metadata are likely to be moved to the next version of the repository software, and beyond that will one day be migrated to an entirely new system. It should be possible to easily migrate objects and metadata, and preference should therefore be given to implementations of metadata standards which are open and widely adopted.
DSpace Supported
Detail Provides a command-line tool (dspace-export) that outputs a METS file per collection with references to the digital files (called bitstreams by DSpace) in the collection. DSpace can also export in the DSpace ingest format.
Fedora Supported
Detail From the GUI client, command-line or through a homegrown SOAP client.
Name Supports appropriate content models
Detail Content models allow repositories to specify how particular classes of object should be treated. This increases efficiency and quality.
DSpace Not supported
Detail The DSpace content model is rigid, and characterised by the Community and Collection concepts of a repository for academic output.
Fedora Partial support
Detail Fedora allows the user to define their own content models. Work on formalising content models, including defining a content model definition language, is underway.
Name Supports format identification
Detail Reliably identifies an object as being of a particular format and assigns this metadata.
DSpace Not supported
Detail Objects are associated with a format manually. The permitted bitstream formats recognised by the system are stored in the bitstream format registry. The contents of the bitstream format registry are entirely user-defined, though the system requires that the two default formats are present (Unknown and License).
Fedora Not supported
Detail Datastreams are manually associated with a MIME type and, optionally, a format URI (a user-assigned URI which identifies the media type of an object more specifically than a MIME type can).
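Since neither system identifies formats automatically, a repository would need an external step to do so. A minimal Python sketch of signature-based identification is given below; it is an assumption about how such a step might look, not a feature of DSpace or Fedora. The table `SIGNATURES` and the function `identify_format` are hypothetical names, and the table covers only a handful of well-known leading byte sequences.

```python
# Leading byte signatures for a few common formats (deliberately incomplete).
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"PK\x03\x04", "application/zip"),
]


def identify_format(data):
    """Guess a MIME type from the first bytes of a file."""
    for magic, mime in SIGNATURES:
        if data.startswith(magic):
            return mime
    # Fall back to the generic type when no signature matches.
    return "application/octet-stream"
```

A check this simple cannot distinguish container formats that share a signature (many office documents are zip files, for example), which is one reason dedicated identification tools and format registries exist.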
Name Supports file validation
Detail Validates an object against a specification to evaluate its correctness and completeness.
DSpace Not supported
Detail A command-line tool to run Jhove over the DSpace asset store has been developed by the DSpace community.
Fedora Not supported
Detail Use of the Jhove tool in conjunction with Fedora provides validation for some formats.
Name Supports versioning
Detail Allows the repository to keep older versions of metadata and files.
DSpace Not supported
Detail The proposed changes to come in version 2 of DSpace will introduce versioning and the concept of Manifestations for Items, which may have their own metadata records.
Fedora Supported
Detail As of version 2.2, Fedora allows users to decide whether each metadata or digital file is versionable, or whether older versions should be overwritten by newer versions. For datastreams or metadata that are versionable, changes result in a new timestamped version being created. Older versions remain accessible.
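The difference between a versionable and a non-versionable datastream can be illustrated with a small Python model. This is a sketch of the behaviour described above, not Fedora's implementation; the class `Datastream` and its methods `update` and `as_of` are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Datastream:
    versionable: bool
    # (timestamp, content) pairs, oldest first.
    versions: list = field(default_factory=list)

    def update(self, content, when=None):
        """Record a change: append a new version, or overwrite the only one."""
        when = when or datetime.now(timezone.utc)
        if self.versionable or not self.versions:
            self.versions.append((when, content))
        else:
            self.versions[-1] = (when, content)

    def as_of(self, when):
        """Return the content that was current at `when`, or None if none existed."""
        current = None
        for stamp, content in self.versions:
            if stamp <= when:
                current = content
        return current
```

With `versionable=True` every earlier state remains retrievable by date; with `versionable=False` only the latest state survives, which may be acceptable for metadata that is regenerated automatically.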
Name Easy to use workflows
Detail Archivists must work with the repository in order to apply professional treatment to the processing of these assets. It is important that repository interfaces support use by less technical users.
DSpace Partial support
Detail Provides an ingest workflow via a web user interface for non-technical users. The architecture group has proposed that version 2 of DSpace support a wider variety of workflows, which go beyond initial ingest to include migration, versioning and export, and that users should be able to configure these through interfaces provided by DSpace. The DSpace community are also evaluating workflow engines.
Fedora Not supported
Detail Fedora's design anticipates the creation of a workflow outside of the repository. It provides a basic client which is usable (with training) for working on single items, but the open source workflow interfaces designed by other Fedora users (such as Fez and Elated) do not meet the processing requirements for archival materials.
Name Supports appropriate security mechanisms
Detail Born-digital archives will often be subject to embargo for a number of years owing to privacy and other concerns. Once privacy concerns cease, copyright still influences the manner in which the archives may be used. Security is of the utmost importance in building the confidence of potential donors; a security breach could be disastrous for the reputation of an archival repository and could have serious implications for collection development.
DSpace Supported
Detail Provides data transfer encryption (SSL).

Authenticates users via a web user interface or LDAP.

Supports different user accounts and roles, and has a web interface for editing permission policies.

From version 2 Epeople (DSpace terminology for users) will have persistent identifiers in the form of URIs.

Direct access to Java API, database and filesystem requires user privileges on the machine hosting the DSpace repository.
Fedora Supported
Detail See Fedora's security documentation.

Can restrict access to Management and Access APIs based on IP address.

Management API is protected by basic HTTP authentication.

Can provide data transfer encryption (SSL).

Can create multiple users (with roles and permissions that can be used in XACML access policies) in fedora-users.xml file; by default supports a single known user (fedoraAdmin) and other users are anonymous. Multiple users are needed for audit trail purposes.

Can defer authentication to application; Fedora therefore authenticates the application and expects the application to undertake user authentication.

XACML can be used to define repository level policies and item-level policies. Policies can be very granular, e.g. restricting access to a file but allowing metadata access.

Repository administrators are expected to provide the storage locations of metadata and content objects with adequate security.

As of v 2.2 Fedora can authenticate users against an LDAP server.
Name Supports technology watch
Detail A digital repository of personal digital archives will contain multiple material types which are submitted in a variety of different formats. It will be necessary to automate some technology watch functions to monitor the status of the materials in the archive so that preservation actions can be planned, prioritised and implemented as necessary. The repository should alert administrators to file formats which are at risk of obsolescence.
DSpace Not supported
Detail An event mechanism has been proposed for version 2 of DSpace and the current EventMechanism prototype being worked on for version 1.5 might provide a basis to meet this requirement.
Fedora Not supported
Detail A preservation monitoring service (based on event notification) is planned.
Name Supports notification of objects due for review, or opening for research
Detail The repository should notify the administrator when objects can be made accessible to researchers, or when their status should be reviewed.
DSpace Not supported
Detail An event mechanism has been proposed for version 2 of DSpace and the current EventMechanism prototype being worked on for version 1.5 might provide a basis to meet this requirement.
Fedora Not supported
Detail If the planned event notification service materialises this might satisfy this requirement.
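Until such a service exists, the requirement could be approximated outside the repository by a scheduled check over review dates held in metadata. The Python sketch below assumes a simple mapping from item identifier to review date; the function name `due_for_review` and the identifiers in the usage note are hypothetical.

```python
from datetime import date


def due_for_review(items, today):
    """Return identifiers whose review date has arrived, earliest date first."""
    due = [(review_on, ident) for ident, review_on in items.items() if review_on <= today]
    return [ident for _, ident in sorted(due)]
```

For example, given an item embargoed until 2030 and another whose review date passed in 2020, a run dated 2024 would list only the second; the result could then be emailed to the administrator.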
Name Provides reporting features
Detail The repository should be able to generate statistics that would be useful for planning and prioritising preservation strategies. One such report might be on the file formats represented in the repository. It should also be able to provide useful statistical information, such as the quantity and quality of material ingested into the repository in a given period.
DSpace Partial support
Detail Some statistical reports can be generated by analysing DSpace's log files.
Fedora Not supported
Detail The features documentation alludes to a reporting utility, but this does not appear to exist.
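A report on the file formats represented in a repository, as suggested in the requirement above, can be sketched in a few lines of Python. The input is assumed to be a list of (MIME type, size in bytes) pairs obtained by some means from the repository; the function name `format_report` and the output layout are hypothetical.

```python
from collections import Counter


def format_report(records):
    """Count items and total bytes per MIME type, sorted by MIME type."""
    counts = Counter()
    sizes = Counter()
    for mime, size in records:
        counts[mime] += 1
        sizes[mime] += size
    return {mime: {"items": counts[mime], "bytes": sizes[mime]} for mime in sorted(counts)}
```

The same tallies, compared against a list of formats considered at risk, would give a crude basis for prioritising preservation actions.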
Name Supports digital provenance metadata
Detail The repository should allow users to trace migrated objects back to the original submission, with an account of the object's migration history.
DSpace Partial support (experimental)
Detail The History subsystem (referred to at the DSpace Sourceforge website) is explicitly invoked when significant events occur (e.g., accepting an item into the archive). The functionality of this part of DSpace is documented as a largely untested experiment. A replacement for inclusion in version 1.5 is being worked on.
Fedora Supported
Detail As of version 2.2, Fedora supports journaling alongside the existing auditing and versioning functionality. At present, though, there is no explicit functionality to provide an account of an object's history, and how the digital provenance metadata could be used would depend on the content model.
Name Supports integrity monitoring for metadata and objects
Detail The repository should monitor digital objects and metadata to ensure that they have not been damaged accidentally, through media failure or maliciously. The OAIS model refers to this as fixity information.
DSpace Supported
Detail Since version 1.4 DSpace has supported checksum checking via a command line tool. Digital signatures are not supported.
Fedora Supported
Detail As of version 2.2, Fedora supports the addition of a checksum to all digital files and metadata that can be checked by the repository. Digital signatures are not supported.
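Checksum-based integrity monitoring of the kind described above can be sketched in Python as follows. This is a generic illustration, not the DSpace command-line tool or Fedora's checksum feature; the function names `checksum` and `check_fixity`, and the manifest layout (relative path mapped to expected hex digest), are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def checksum(path, algorithm="md5"):
    """Compute a hex digest for a file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(root, manifest, algorithm="md5"):
    """Compare stored checksums with current ones.

    Returns {relative_path: "ok" | "changed" | "missing"}.
    """
    results = {}
    for rel, expected in manifest.items():
        target = Path(root) / rel
        if not target.is_file():
            results[rel] = "missing"
        elif checksum(target, algorithm) == expected:
            results[rel] = "ok"
        else:
            results[rel] = "changed"
    return results
```

A checksum detects accidental damage and media failure, but on its own it offers limited protection against malicious change, since anyone able to alter a file may also be able to alter the stored checksum; this is the gap that digital signatures are intended to close.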
Name Supports backup and restore
Detail The repository should be easily restored from backup in the event of a disaster.
DSpace Supported
Detail Information on how to organise backup for a DSpace repository is available.
Fedora Supported
Detail Fedora 2.2 provides a journaling module that allows a repository to be mirrored, or to restore a Fedora repository to the exact state before failure, rather than the state at last backup.
Name Is extensible
Detail The longer-term sustainability of the system will be reliant on its modularity. Monolithic systems are not easily updated to accommodate new needs, while modular systems can be enhanced piecemeal.
DSpace Supported
Detail Supports add-ons; DSpace has rules for 'well-behaved add-ons', but the community has acknowledged that this design should be changed; the architecture group is therefore recommending the adoption of an open source extension framework in version 2 of DSpace.
Fedora Partial support
Detail The repository software and related services can be distributed over different hardware. Additional homegrown or externally sourced services may be added to the Fedora framework.
Name Is scalable
Detail At present, the volume of born-digital archives relative to their paper counterparts is small. This balance will change over time and archives can expect to receive greater quantities of digital materials in future. The volume of metadata will also increase over time, and migrated versions of objects and emulators with their own metadata may be added to the repository. The repository system should scale to manage millions of digital materials; this requires the repository to have the capacity to manage large quantities of material, to support mass throughput of material when ingesting and exporting, and to support several concurrent processes while maintaining acceptable performance.
DSpace Not supported
Detail DSpace is known to have scalability problems; as is, it may be suitable as a short-term repository. The architecture group working on version 2 of DSpace are aiming to make the software scale to 10 million items and have made recommendations that may improve the architecture of the repository.
Fedora Supported
Detail NSDL have tested Fedora with million objects, and the community is looking to test up to 30 million objects.
Name Supports basic searching
Detail Searching across key metadata fields and ideally full text searching for textual objects will facilitate archivist- and researcher-generated queries.
DSpace Supported
Detail DSpace supports searching for one or more keywords in metadata or extracted full text, and browsing through title, author, date or subject indexes. DSpace uses the Lucene search engine and the search indexes are configurable, enabling customisation of which DSpace metadata fields are indexed.
Fedora Supported
Detail Fedora indexes select system metadata fields and the primary Dublin Core record for each object. The Fedora repository system provides a search interface for both full text and field-specific queries across these metadata fields.

The Gsearch service, introduced with version 2.2, augments this by indexing Fedora FOXML records (including the text contents of datastreams and the results of disseminator calls), searching that index, and allowing selected search engines to be plugged in; so far these are Lucene and Zebra.