Comparing repository software for preserving personal digital archives
Archival concerns
The care of material in a repository for born-digital personal archives is the duty of professional archival staff. Archivists must be able to understand and trust the system's security and processes, and be able to interact with the system confidently. Archivists owe a duty of care, both to the creators of archival material and to the researchers who will use it, to ensure the authenticity, continuing availability and robustness of that material, so that it can support research and satisfy freedom of information requests.
| Name |
Supports audit trails |
| Detail |
Born-digital personal archives will be retained by their collecting institutions indefinitely, and it can be assumed that during this time items will be moved to new storage media and migrated to new formats on numerous occasions. The repository must provide mechanisms to demonstrate that an item is as it was when submitted to the repository: that it is authentic. |
| DSpace |
Partial support |
| Detail |
Some activities, such as submitting and approving a bitstream, are recorded as qualified Dublin Core metadata in the description.provenance field; this records the name of the person responsible, the date and time, and the filenames, sizes in bytes and MD5 checksums of the files involved. |
| Fedora |
Partial support |
| Detail |
Modifications to files or metadata are logged in Fedora's audit metadata, which records who did what and when, and associates this information with the object. Fedora does not create metadata recording migration or preservation events, though it could store such metadata if it were supplied. |
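To make the audit-trail requirement concrete, the following Python sketch records the kind of provenance details listed above for DSpace (who acted, when, filename, size in bytes and an MD5 checksum) and later checks that a file still matches its recorded checksum. The note format and the function names `provenance_note` and `is_unchanged` are hypothetical; this is not DSpace's or Fedora's actual API or wording.

```python
import hashlib
from datetime import datetime, timezone


def provenance_note(actor: str, filename: str, data: bytes, when: datetime) -> str:
    """Build a hypothetical provenance note: who acted, when, and the
    file's name, size in bytes and MD5 checksum."""
    md5 = hashlib.md5(data).hexdigest()
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        f"Submitted by {actor} on {stamp}. "
        f"{filename}: {len(data)} bytes, checksum: {md5} (MD5)"
    )


def is_unchanged(note: str, data: bytes) -> bool:
    """Authenticity check: does the checksum recorded in the note still
    match the MD5 of the bytes held now?"""
    recorded = note.split("checksum: ", 1)[1].split(" ", 1)[0]
    return recorded == hashlib.md5(data).hexdigest()


note = provenance_note(
    "A. Archivist", "letter.doc", b"hello", datetime(2007, 1, 1, tzinfo=timezone.utc)
)
print(note)
print(is_unchanged(note, b"hello"))  # True: content as submitted
print(is_unchanged(note, b"hellO"))  # False: content has changed
```

MD5 is used here only because it is the checksum DSpace records; a repository designed today might prefer SHA-256, which `hashlib` also provides.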
| Name |
Supports unique identification of metadata, digital files and conceptual objects. |
| Detail |
In order to maintain intellectual control of items in the repository it must be possible to apply unique identifiers to objects and metadata managed by the repository. |
| DSpace |
Supported |
| Detail |
Each Community, Collection and Item is allocated a Handle in the current version of DSpace. In version 2 DSpace will support other persistent identifiers in addition to Handles, and it will be possible to apply identifiers at more granular levels. |
| Fedora |
Supported |
| Detail |
Each object, file and metadata record (and each version thereof) is given a unique identifier by the Fedora repository. Repositories may opt to use the Handle System with Fedora; for example, the VTLS OSC suite of tools includes a service for integrating the Handle System with Fedora. |
| Name |
Supports reliable binding of metadata and digital object |
| Detail |
In order to maintain intellectual control of items in the repository it must be possible to permanently associate an archival item with its metadata, both within the repository and in any export functionality. |
| DSpace |
Partial support |
| Detail |
Each Community, Collection and Item can have its own metadata. Individual files which make up an Item are allocated basic metadata, but this is displayed together with the containing Item's metadata, so it is unclear which metadata belongs to which file. Recommendations for version 2 of DSpace include allowing metadata at more granular levels. |
| Fedora |
Partial support |
| Detail |
Dependent on implementation. If multiple files and their metadata are stored within a single object wrapper, then the repository must itself implement conventions which specify which metadata belongs to which files. If repositories use an atomistic model, with one file and its metadata to an object, metadata and object are unambiguously connected. |
| Name |
Supports referenced metadata |
| Detail |
Some metadata is applicable to several objects and is best held once, such as an EAD collection level archival description, or rights metadata. Other metadata, such as file format registries, may be curated in repositories external to the organisation. |
| DSpace |
Not supported |
| Detail |
|
| Fedora |
Supported |
| Detail |
Files and metadata may be held outside of the repository and referred to; relationships between objects in the repository can also be formed. |
| Name |
Supports complex inter-object relationships |
| Detail |
Meaning in archival materials relies heavily on context; it is necessary that the repository supports complex hierarchical relationships found in archives. |
| DSpace |
Not supported |
| Detail |
The DSpace data model is designed for flatter collections and is not well-suited to complex structures. |
| Fedora |
Supported |
| Detail |
Fedora can support complex multi-level relationships through its RDF metadata. It is also possible to ingest METS structural maps to reflect the original order of an archival accession, to ensure that this is preserved for the archivist who will catalogue the archive. |
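A minimal sketch of how parent-child relationships, of the kind Fedora can express in RDF, might be used to rebuild an accession's hierarchy in its original order. It assumes hypothetical triples using an `isMemberOf` predicate and a separate order map standing in for a METS structural map; it does not call any Fedora interface.

```python
from collections import defaultdict


def build_tree(triples, order):
    """Group children under parents using hypothetical (child, predicate,
    parent) triples, then sort siblings by their original order."""
    children = defaultdict(list)
    for child, predicate, parent in triples:
        if predicate == "isMemberOf":
            children[parent].append(child)
    for siblings in children.values():
        siblings.sort(key=lambda obj: order.get(obj, 0))
    return children


def outline(children, root, depth=0):
    """Render the hierarchy below `root` as indented lines."""
    lines = ["  " * depth + root]
    for child in children.get(root, []):
        lines.extend(outline(children, child, depth + 1))
    return lines


triples = [
    ("folder-a", "isMemberOf", "accession-1"),
    ("letter-2", "isMemberOf", "folder-a"),
    ("letter-1", "isMemberOf", "folder-a"),
    ("photo-1", "isMemberOf", "accession-1"),
]
# Hypothetical original order within each parent, as a structural map might record it.
order = {"folder-a": 1, "photo-1": 2, "letter-1": 1, "letter-2": 2}
print("\n".join(outline(build_tree(triples, order), "accession-1")))
```

Keeping the order separate from the membership relation means the archivist's later arrangement can be recorded without losing the creator's original order.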
| Name |
Supports appropriate metadata standards |
| Detail |
Support for open and widely adopted metadata standards increases object portability, tool availability and the likelihood of recruiting staff familiar with metadata employed by the repository. Support for PREMIS preservation metadata has not been incorporated into any repository yet, and support for technical metadata is very limited. |
| DSpace |
Partial support |
| Detail |
METS, OAI-PMH and Dublin Core are supported. Additional metadata may be added as 'serialized datastreams'. |
| Fedora |
Partial support |
| Detail |
Fedora stores metadata in its native FOXML, which can be exported to METS (a Fedora extension of METS); it also supports OAI-PMH and Dublin Core. Fedora can store any kind of valid XML metadata and can be configured to index this metadata using the Fedora Generic Search Service.
Metadata extraction support is limited to a web service that uses Jhove for validation and technical metadata extraction. There are currently no tools to generate or act on PREMIS preservation metadata.
Relationships between objects can be recorded using METS structural maps or via RDF metadata, but Fedora provides no interface for working with those relationships (e.g. it would not display a complex object). |
| Name |
Supports simple and complex objects |
| Detail |
Personal digital archives contain a range of simple objects, consisting of a single file, and complex objects, which are composed of multiple files that must be reassembled to recreate the object. The repository should be capable of supporting both kinds of object. |
| DSpace |
Partial support |
| Detail |
Allows multiple files to be bundled together in an Item, but this limits the metadata that can be applied. |
| Fedora |
Supported |
| Detail |
The Fedora data model allows users to bundle files together in an object, or to store files in their own objects and create relationships between them. |
| Name |
Supports multiple types and formats |
| Detail |
Personal digital archives can contain a wide variety of material, from email to simple image files, from spreadsheets to word-processed documents, from websites to audio files. The repository should be capable of supporting a wide range of object types and formats. |
| DSpace |
Supported |
| Detail |
DSpace has a bitstream registry which details the formats that the repository accepts, and the level of support the repository provides for them. Additional formats may be added to the registry. |
| Fedora |
Supported |
| Detail |
Supports any MIME type. |
| Name |
Supports automatic metadata creation |
| Detail |
The preservation of born-digital archives requires a great deal of metadata, so automating its creation is extremely advantageous. |
| DSpace |
Partial support |
| Detail |
Some metadata, such as audit metadata, is created automatically, but much of it must be entered through the web user interface. |
| Fedora |
Partial support |
| Detail |
Audit metadata is created automatically, and checksum metadata may be created automatically. A Jhove Metadata Extraction Service is available to add some technical metadata. The SIP Creator/Dir Ingest service can automate the creation of relationship metadata. Much metadata, including descriptive and preservation metadata, must be compiled manually. |
| Name |
Supports bulk ingest |
| Detail |
Digital materials must be properly ingested into a managed environment as soon as possible; bulk ingest is therefore highly desirable. |
| DSpace |
Supported |
| Detail |
Provides a command-line bulk ingest tool; files must be arranged according to a specified hierarchy to map to the DSpace data model. |
| Fedora |
Supported |
| Detail |
The Fedora Management web service has SOAP-based operations to ingest digital objects in different XML wrapper formats (METS and FOXML). This same web service has other SOAP-based operations to add datastream content to an object that is already in the Fedora repository. Fedora also has a separate “Directory Ingest” service that runs as a web application; this service accepts a zip file that contains a hierarchical directory of files along with a METS manifest file, opens the zip file and calls the Fedora Management web service to ingest each file as a digital object, preserving the hierarchical directory relationships. |
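The DSpace command-line importer described above expects items laid out in a specific directory structure. The sketch below assumes the DSpace Simple Archive Format (one directory per item containing a `dublin_core.xml` file, a `contents` file listing the item's files, and the files themselves); check the documentation for the DSpace version in use before relying on it. `write_item` is a hypothetical helper, not part of DSpace.

```python
import os
from xml.sax.saxutils import escape, quoteattr


def write_item(archive_dir, index, metadata, files):
    """Write one item in the (assumed) DSpace Simple Archive Format layout.

    metadata: list of (element, qualifier, value) Dublin Core tuples.
    files: dict mapping filename -> bytes.
    Returns the path of the item directory.
    """
    item_dir = os.path.join(archive_dir, f"item_{index:03d}")
    os.makedirs(item_dir, exist_ok=True)

    # dublin_core.xml: one <dcvalue> element per metadata value, XML-escaped.
    rows = [
        f"  <dcvalue element={quoteattr(el)} qualifier={quoteattr(q)}>"
        f"{escape(value)}</dcvalue>"
        for el, q, value in metadata
    ]
    with open(os.path.join(item_dir, "dublin_core.xml"), "w", encoding="utf-8") as fh:
        fh.write("<dublin_core>\n" + "\n".join(rows) + "\n</dublin_core>\n")

    # contents: the item's files, one filename per line.
    with open(os.path.join(item_dir, "contents"), "w", encoding="utf-8") as fh:
        fh.write("".join(name + "\n" for name in files))

    # The files themselves sit alongside the metadata.
    for name, data in files.items():
        with open(os.path.join(item_dir, name), "wb") as fh:
            fh.write(data)
    return item_dir
```

Calling `write_item("import", 0, [("title", "none", "Letters to M.")], {"letter.txt": b"..."})` would create `import/item_000`; a batch of accessioned files can then be staged by looping over items with increasing indices.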
| Name |
Supports bulk export |
| Detail |
Bulk export will be necessary for an institution moving to another repository technology, or one returning deposited materials to a creator. Archival materials and their metadata are likely to be moved to the next version of the repository software, and beyond that will one day be migrated to an entirely new system. It should be possible to easily migrate objects and metadata, and preference should therefore be given to implementations of metadata standards which are open and widely adopted. |
| DSpace |
Supported |
| Detail |
Provides a command-line tool (dspace-export) that outputs a METS file per collection with references to the digital files (called bitstreams by DSpace) in the collection. DSpace can also export in the DSpace ingest format. |
| Fedora |
Supported |
| Detail |
Export is possible from the GUI client, from the command line or through a homegrown SOAP client. |
| Name |
Supports appropriate content models |
| Detail |
Content models allow repositories to specify how particular classes of object should be treated. This increases efficiency and quality. |
| DSpace |
Not supported |
| Detail |
The DSpace content model is rigid, and is characterised by the Community and Collection concepts of a repository designed for academic output. |
| Fedora |
Partial support |
| Detail |
Fedora allows the user to define their own content models. Work on formalising content models, including defining a content model definition language, is underway. |
| Name |
Supports format identification |
| Detail |
Reliably identifies an object as being of a particular format and assigns this metadata. |
| DSpace |
Not supported |
| Detail |
Objects are associated with a format manually. The permitted bitstream formats recognised by the system are stored in the bitstream format registry. The contents of the bitstream format registry are entirely user-defined, though the system requires that the two default formats are present (Unknown and License). |
| Fedora |
Not supported |
| Detail |
Datastreams are manually associated with a mime type and optionally a format URI (this is a user-assigned URI which supports identification of the media type of an object in a more specific way than using a MIME type). |
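Neither system identifies formats automatically. A common approach is to compare a file's opening bytes against known signatures ("magic numbers"). The toy table below covers only a handful of formats and is an illustration, not a substitute for a registry-backed tool such as DROID; ZIP-based formats such as OpenDocument files would be reported only as ZIP.

```python
# Hypothetical, deliberately tiny signature table: (leading bytes, MIME type).
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"PK\x03\x04", "application/zip"),  # also matches ODF, OOXML, JAR...
]


def identify(data: bytes) -> str:
    """Return the MIME type of the first matching signature, or the
    generic application/octet-stream when nothing matches."""
    for magic, mime in SIGNATURES:
        if data.startswith(magic):
            return mime
    return "application/octet-stream"


print(identify(b"%PDF-1.4\n..."))  # application/pdf
print(identify(b"plain text"))     # application/octet-stream
```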
| Name |
Supports file validation |
| Detail |
Validates an object against a specification to evaluate its correctness and completeness. |
| DSpace |
Not supported |
| Detail |
A command-line tool to run Jhove over the DSpace asset store has been developed by the DSpace community. |
| Fedora |
Not supported |
| Detail |
Use of the Jhove tool in conjunction with Fedora provides validation for some formats. |
| Name |
Supports versioning |
| Detail |
Allows the repository to keep older versions of metadata and files. |
| DSpace |
Not supported |
| Detail |
The proposed changes to come in version 2 of DSpace will introduce versioning and the concept of Manifestations for Items, which may have their own metadata records. |
| Fedora |
Supported |
| Detail |
As of version 2.2, Fedora allows users to decide whether each metadata record or digital file is versionable, or whether older versions should be overwritten by newer ones. For versionable datastreams and metadata, changes result in a new timestamped version being created, and older versions remain accessible. |
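The versioning behaviour described for Fedora 2.2 (a per-datastream choice between keeping timestamped versions and overwriting) can be modelled as below. This is a simplified sketch with a hypothetical `Datastream` class, assuming modifications arrive in chronological order; it does not reflect Fedora's internal storage.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Datastream:
    """Hypothetical model of a datastream that either keeps timestamped
    versions or overwrites them, depending on `versionable`."""
    versionable: bool = True
    versions: list = field(default_factory=list)  # (timestamp, content), oldest first

    def modify(self, content: bytes, when: datetime) -> None:
        if self.versionable or not self.versions:
            self.versions.append((when, content))  # keep the old version
        else:
            self.versions[-1] = (when, content)    # overwrite in place

    def current(self) -> bytes:
        return self.versions[-1][1]

    def as_of(self, when: datetime) -> bytes:
        """Content as it stood at `when` (assumes chronological modifications)."""
        earlier = [content for stamp, content in self.versions if stamp <= when]
        if not earlier:
            raise LookupError("no version existed at that time")
        return earlier[-1]


ds = Datastream()
ds.modify(b"draft", datetime(2007, 1, 1))
ds.modify(b"final", datetime(2007, 6, 1))
print(ds.as_of(datetime(2007, 3, 1)))  # b'draft'
```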
| Name |
Easy to use workflows |
| Detail |
Archivists must work with the repository in order to apply professional treatment to the processing of these assets. It is important that repository interfaces support use by less technical users. |
| DSpace |
Partial support |
| Detail |
Provides an ingest workflow via a web user interface for non-technical users. The architecture group has proposed that version 2 of DSpace support a wider variety of workflows, going beyond initial ingest to include migration, versioning and export, and that users should be able to configure these through interfaces provided by DSpace. The DSpace community are also evaluating workflow engines. |
| Fedora |
Not supported |
| Detail |
Fedora's design anticipates the creation of a workflow outside of the repository. It provides a basic client which is usable (with training) for working on single items, but the open source workflow interfaces designed by other Fedora users (such as Fez and Elated) do not meet the processing requirements for archival materials. |
| Name |
Supports appropriate security mechanisms |
| Detail |
Born-digital archives will often be subject to embargo for a number of years owing to privacy and other concerns. Once privacy concerns cease, copyright still influences the manner in which the archives may be used. Security is of the utmost importance in building the confidence of potential donors; a security breach could be disastrous for the reputation of an archival repository and could have serious implications for collection development. |
| DSpace |
Supported |
| Detail |
Provides data transfer encryption (SSL).
Authenticates users via a web user interface or LDAP.
Supports different user accounts and roles, and has a web interface for editing permission policies.
From version 2 Epeople (DSpace terminology for users) will have persistent identifiers in the form of URIs.
Direct access to Java API, database and filesystem requires user privileges on the machine hosting the DSpace repository. |
| Fedora |
Supported |
| Detail |
See Fedora's security documentation.
Can restrict access to Management and Access APIs based on IP address.
Management API is protected by basic HTTP authentication.
Can provide data transfer encryption (SSL).
Can create multiple users (with roles and permissions that can be used in XACML access policies) in fedora-users.xml file; by default supports a single known user (fedoraAdmin) and other users are anonymous. Multiple users are needed for audit trail purposes.
Can defer authentication to application; Fedora therefore authenticates the application and expects the application to undertake user authentication.
XACML can be used to define repository level policies and item-level policies. Policies can be very granular, e.g. restricting access to a file but allowing metadata access.
Repository administrators are expected to provide the storage locations of metadata and content objects with adequate security.
As of version 2.2, Fedora can authenticate users against an LDAP server. |
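XACML policies let Fedora express rules such as "restrict access to a file but allow access to its metadata". The Python sketch below models only the decision logic of such a rule, combined with an embargo date; the `Policy` fields and role names are hypothetical, and this is not XACML syntax. Anything not explicitly permitted is denied.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Policy:
    """Hypothetical item-level policy: metadata may be public while the
    content stays closed until an embargo date (None = closed indefinitely)."""
    metadata_public: bool
    content_open_from: Optional[date]
    staff_roles: FrozenSet[str] = frozenset({"archivist"})


def permit(policy: Policy, role: str, target: str, today: date) -> bool:
    """Decide a request for 'metadata' or 'content'; deny by default."""
    if role in policy.staff_roles:
        return True
    if target == "metadata":
        return policy.metadata_public
    if target == "content":
        opens = policy.content_open_from
        return opens is not None and today >= opens
    return False
```

Default-deny mirrors the cautious stance the requirement calls for: a mistyped target or unknown role results in refusal rather than accidental disclosure.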
| Name |
Supports technology watch |
| Detail |
A digital repository of personal digital archives will contain multiple material types which are submitted in a variety of different formats. It will be necessary to automate some technology watch functions to monitor the status of the materials in the archive so that preservation actions can be planned, prioritised and implemented as necessary. The repository should alert administrators to file formats which are at risk of obsolescence. |
| DSpace |
Not supported |
| Detail |
An event mechanism has been proposed for version 2 of DSpace and the current EventMechanism prototype being worked on for version 1.5 might provide a basis to meet this requirement. |
| Fedora |
Not supported |
| Detail |
A preservation monitoring service (based on event notification) is planned. |
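Neither system currently alerts administrators to formats at risk. A very simple technology-watch check might compare the repository's format inventory against a locally maintained risk register, as in this sketch; the register and its entry are hypothetical, and a real service would draw on an external format registry.

```python
from collections import Counter


def at_risk_report(objects, risk_register):
    """objects: iterable of (object_id, mime_type) pairs.
    risk_register: dict mapping at-risk MIME types to a reason.
    Returns (mime_type, object_count, reason), most affected first."""
    counts = Counter(mime for _, mime in objects if mime in risk_register)
    return [(mime, n, risk_register[mime]) for mime, n in counts.most_common()]


inventory = [
    ("obj-1", "application/vnd.wordperfect"),
    ("obj-2", "application/pdf"),
    ("obj-3", "application/vnd.wordperfect"),
]
register = {"application/vnd.wordperfect": "few current rendering tools"}  # hypothetical entry
for mime, n, reason in at_risk_report(inventory, register):
    print(f"ALERT: {n} object(s) in {mime}: {reason}")
```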
| Name |
Supports notification of objects due for review, or opening for research |
| Detail |
The repository should notify the administrator when objects can be made accessible to researchers, or when their status should be reviewed. |
| DSpace |
Not supported |
| Detail |
An event mechanism has been proposed for version 2 of DSpace and the current EventMechanism prototype being worked on for version 1.5 might provide a basis to meet this requirement. |
| Fedora |
Not supported |
| Detail |
If the planned event notification service materialises this might satisfy this requirement. |
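The notification requirement amounts to scanning closure and review dates. This sketch assumes each record carries optional hypothetical `opens` and `review` dates and lists those falling within a configurable horizon, including overdue ones; in practice it would be driven by the event mechanisms discussed above.

```python
from datetime import date


def due_for_action(records, today, horizon_days=30):
    """records: dicts with an 'id' and optional 'opens' / 'review' dates.
    Returns (id, action, date) for dates within `horizon_days` of `today`,
    including overdue ones, soonest first."""
    due = []
    for rec in records:
        for action in ("opens", "review"):
            when = rec.get(action)
            if when is not None and (when - today).days <= horizon_days:
                due.append((rec["id"], action, when))
    return sorted(due, key=lambda item: item[2])


records = [
    {"id": "a", "opens": date(2008, 1, 20)},
    {"id": "b", "review": date(2009, 1, 1)},
    {"id": "c", "opens": date(2007, 12, 1), "review": date(2008, 1, 5)},
]
for obj_id, action, when in due_for_action(records, today=date(2008, 1, 1)):
    print(obj_id, action, when.isoformat())
```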
| Name |
Provides reporting features |
| Detail |
The repository should be able to generate statistics useful for planning and prioritising preservation strategies, such as a report on the file formats represented in the repository. It should also be able to report on the quantity and quality of material ingested into the repository in a given period. |
| DSpace |
Partial support |
| Detail |
Some statistical reports can be generated by analysing DSpace's log files. |
| Fedora |
Not supported |
| Detail |
Fedora's features documentation alludes to a reporting utility, but no such utility appears to exist. |
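The reports described in the requirement can be produced from basic per-object records. The sketch below, assuming hypothetical records with `format`, `size` and `ingested` fields, summarises the quantity of material ingested in a period and the formats represented; assessing quality would need further metadata.

```python
from collections import Counter
from datetime import date


def ingest_report(records, start, end):
    """records: dicts with 'format', 'size' (bytes) and 'ingested' (date).
    Summarises material ingested between start and end inclusive."""
    in_period = [r for r in records if start <= r["ingested"] <= end]
    return {
        "items": len(in_period),
        "bytes": sum(r["size"] for r in in_period),
        "formats": Counter(r["format"] for r in in_period),
    }


records = [
    {"format": "application/pdf", "size": 1200, "ingested": date(2007, 3, 2)},
    {"format": "image/jpeg", "size": 800, "ingested": date(2007, 5, 9)},
    {"format": "application/pdf", "size": 400, "ingested": date(2008, 1, 4)},
]
print(ingest_report(records, date(2007, 1, 1), date(2007, 12, 31)))
```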
| Name |
Supports digital provenance metadata |
| Detail |
The repository should allow users to trace migrated objects back to the original submission, with an account of the object's migration history. |
| DSpace |
Partial support (experimental) |
| Detail |
The History subsystem (referred to at the DSpace Sourceforge website) is explicitly invoked when significant events occur (e.g., accepting an item into the archive). The functionality of this part of DSpace is documented as a largely untested experiment. A replacement for inclusion in version 1.5 is being worked on. |
| Fedora |
Supported |
| Detail |
As of version 2.2, Fedora supports journaling alongside its existing auditing and versioning functionality. At present, however, there is no explicit functionality for presenting an object's history, and how digital provenance metadata could be used would depend on the content model adopted. |
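Tracing a migrated object back to its original submission requires each derived object to record its source. The sketch below assumes a hypothetical mapping from each derived object's identifier to its source identifier and a description of the migration, and walks the chain back to the original; the identifiers and migration descriptions are illustrative only.

```python
def migration_history(events, object_id):
    """events: dict mapping derived object id -> (source id, description).
    Returns (original id, steps), where steps run from the original
    submission forward to `object_id`."""
    chain = []
    seen = set()
    current = object_id
    while current in events:
        if current in seen:  # guard against corrupt, circular records
            raise ValueError("cycle in migration records")
        seen.add(current)
        source, description = events[current]
        chain.append((source, current, description))
        current = source
    chain.reverse()
    return current, chain


events = {
    "obj-3": ("obj-2", "ODF to PDF/A"),
    "obj-2": ("obj-1", "WordPerfect 5.1 to ODF"),
}
original, steps = migration_history(events, "obj-3")
print("original submission:", original)
for source, derived, description in steps:
    print(f"{source} -> {derived}: {description}")
```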
| Name |
Supports integrity monitoring for metadata and objects |
| Detail |
The repository should monitor digital objects and metadata to ensure that they have not been damaged, whether accidentally, through media failure or maliciously. The OAIS model refers to the information used for this purpose as fixity information. |
| DSpace |
Supported |
| Detail |
Since version 1.4 DSpace has supported checksum checking via a command line tool. Digital signatures are not supported. |
| Fedora |
Supported |
| Detail |
As of version 2.2, Fedora supports the addition of a checksum to all digital files and metadata that can be checked by the repository. Digital signatures are not supported. |
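Checksum-based fixity checking, as offered by DSpace's command-line checker and Fedora 2.2, reduces to recomputing each file's digest and comparing it with the stored value. This sketch assumes a hypothetical manifest mapping file paths to expected hex digests; it reads files in chunks so that large files need not fit in memory.

```python
import hashlib
from pathlib import Path


def checksum(path, algorithm="md5", chunk_size=1 << 20):
    """Hex digest of a file, read in 1 MiB chunks by default."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_check(manifest, algorithm="md5"):
    """manifest: dict mapping file path -> expected hex digest.
    Returns (path, problem) for every missing or altered file."""
    failures = []
    for path, expected in manifest.items():
        if not Path(path).is_file():
            failures.append((path, "missing"))
        elif checksum(path, algorithm) != expected:
            failures.append((path, "checksum mismatch"))
    return failures
```

Run on a schedule, an empty result means every file still matches its recorded checksum; any entry returned identifies an object to restore from a backup copy.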
| Name |
Is extensible |
| Detail |
The longer-term sustainability of the system will be reliant on its modularity. Monolithic systems are not easily updated to accommodate new needs, while modular systems can be enhanced piecemeal. |
| DSpace |
Supported |
| Detail |
Supports add-ons; DSpace has rules for 'well-behaved add-ons', but the community has acknowledged that this design should be changed; the architecture group is therefore recommending the adoption of an open source extension framework in version 2 of DSpace. |
| Fedora |
Partial support |
| Detail |
The repository software and related services can be distributed over different hardware. Additional homegrown or externally sourced services may be added to the Fedora framework. |
| Name |
Is scalable |
| Detail |
At present, the volume of born-digital archives relative to their paper counterparts is small. This balance will change over time and archives can expect to receive greater quantities of digital materials in future. The volume of metadata will also increase over time, and migrated versions of objects and emulators with their own metadata may be added to the repository. The repository system should scale to manage millions of digital materials; this requires the repository to have the capacity to manage large quantities of material, to support mass throughput of material when ingesting and exporting, and to support several concurrent processes while maintaining acceptable performance. |
| DSpace |
Not supported |
| Detail |
DSpace is known to have scalability problems; as is, it may be suitable as a short-term repository. The architecture group working on version 2 of DSpace are aiming to make the software scale to 10 million items and have made recommendations that may improve the architecture of the repository. |
| Fedora |
Supported |
| Detail |
NSDL have tested Fedora with million objects, and the community is looking to test up to 30 million objects. |
| Name |
Supports basic searching |
| Detail |
Searching across key metadata fields and ideally full text searching for textual objects will facilitate archivist- and researcher-generated queries. |
| DSpace |
Supported |
| Detail |
DSpace supports searching for one or more keywords in metadata or extracted full text, and browsing through title, author, date or subject indexes. DSpace uses the Lucene search engine and the search indexes are configurable, enabling customisation of which DSpace metadata fields are indexed. |
| Fedora |
Supported |
| Detail |
Fedora indexes select system metadata fields and the primary Dublin Core record for each object. The Fedora repository system provides a search interface for both full text and field-specific queries across these metadata fields.
The Gsearch service introduced with version 2.2 augments this by indexing Fedora FOXML records (including the text contents of datastreams and the results of disseminator calls), by searching that index, and by allowing selected search engines, so far Lucene and Zebra, to be plugged in. |
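Both systems rely on an inverted index (Lucene in DSpace; Lucene or Zebra via Gsearch in Fedora). The toy index below is not Lucene, but shows the idea behind configurable, field-specific and all-field keyword search; the record layout and function names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case ASCII-only tokeniser (a real engine would do far more)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records, fields):
    """records: dict of object id -> dict of field -> text.
    Only the configured `fields` are indexed. Each term is stored under
    (field, term) for field searches and (None, term) for all-field searches."""
    index = defaultdict(set)
    for obj_id, record in records.items():
        for f in fields:
            for term in tokenize(record.get(f, "")):
                index[(f, term)].add(obj_id)
                index[(None, term)].add(obj_id)
    return index


def search(index, query, field=None):
    """Return ids matching every keyword in the query (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = [index.get((field, t), set()) for t in terms]
    return set.intersection(*results)


records = {
    "a": {"title": "Letters to Mary", "subject": "family"},
    "b": {"title": "Diary 1970", "subject": "travel letters"},
}
idx = build_index(records, ["title", "subject"])
print(sorted(search(idx, "letters")))
print(sorted(search(idx, "letters", field="title")))
```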