SPEC Kit 329: Managing Born-Digital Special Collections and Archival Materials · 65
The main challenge is that the existing data we use to estimate storage needs are based on the creation of image
collections. Born-digital materials have the potential to be far larger in terms of storage requirements.
Distributed storage environments. We have not yet identified a sustainable way to hook our repository services up to
other campus storage environments for the purpose of linking and ingestion. Estimating the costs to maintain storage
for the long term, including curation and migration costs.
The most significant challenge for storage is the same as one of the challenges for ingest: the lack of an infrastructure
of repositories and tools to store and maintain these kinds of materials. While many parts of the process can be handled
by established tools, other parts can’t, or the systems don’t work together. For example, while we have preservation
storage for digital masters of digitized images that can also be used for storage of born-digital materials, this is dark
storage and it doesn’t meet the need for access and discovery of materials. Similarly, the repository for access and
discovery has been designed so far for individual images, e-books, and A/V, not for heterogeneous groups of materials
in a manuscript collection that may be described only at an aggregate level in a finding aid. The only way to address
this challenge is to develop the infrastructure further and adopt and adapt emerging tools for parts of this process.
Related to this is coordination of resources to address these needs. While we, like everyone else, can always use more
staff and more funding, just utilizing the staff that we have to address these needs while continuing to address existing
needs is a concern. In addition, the infrastructure and workflows created to address these materials cannot exist in a
vacuum; they must be compatible with, or be an extension of, the infrastructure that manages data about the rest
of the library’s collections. This means that progress on developing infrastructure in support of born-digital materials
must include input, buy-in, and resources from many parts of the library organization: special collections, library IT,
administration, technical services, etc. Increasing meaningful communication between groups and jointly planning
development is the best way to address these challenges. Another challenge for storage is determining what to store
and what metadata to store about it. While a default option is to store a bit-level copy for long-term preservation, some
work has been done to determine what other levels of preservation can be supported and what data would need to be
stored in order to enable this level of preservation. A significant challenge relates to the retention of private or sensitive
information. Given the nature of the archival workflow, we really do not have time to completely process materials
before we put them in archival storage. This means that private or sensitive information may be inadvertently stored
for some time. We do have some tools we can use during accessioning to automatically search for significant patterns
such as Social Security numbers (SSNs) within textual data, but we do not have the time to do any more in-depth
searching. While this mirrors the situation with paper records (although we can actually remove more potentially private
information during accessioning with born-digital textual materials than with paper), the risk that the security of this
information will be compromised is much greater in the digital environment. An additional issue that we are still considering is how to (or,
indeed, whether to) securely dispose of media carriers (disks) that continue to store sensitive data even after a copy
has been retrieved from them. The issues are that in some cases the media carrier itself may retain artifactual value
(handwritten annotations, modifications, metadata contained on labels, etc.), that the media can serve as a backup if the
copy made is corrupted or lost, and that completely wiping hardware is difficult to do. The recommended options for
destruction and deletion of the data are potentially costly and time-consuming (disk shredding, magnetic wiping). To
address these challenges we have undertaken a number of activities and are still discussing other solutions. One major
step was consulting the library’s legal counsel for advice on adhering to university, state, and federal regulations in the
handling and storage of this data. Other steps, such as the automated screening of content for sensitive information at
the accessioning stage, have also been added to the workflow.
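The automated screening for patterns such as SSNs mentioned in this response could look something like the following minimal Python sketch. It is not the respondent's actual tool, which the response does not name. The function names, the regular expression, and the choice to read every file as UTF-8 text are all assumptions made for illustration. The sketch only flags candidates for human review and will miss SSNs written without separators or stored in binary formats.

```python
import re
from pathlib import Path

# Assumed pattern: nine digits written as 3-2-4 groups separated by hyphens
# or spaces, not embedded in a longer run of digits.
SSN_PATTERN = re.compile(r"(?<!\d)(\d{3})[- ](\d{2})[- ](\d{4})(?!\d)")


def is_plausible_ssn(area, group, serial):
    """Reject number ranges that are never issued as SSNs."""
    if area in ("000", "666") or area.startswith("9"):
        return False
    return group != "00" and serial != "0000"


def find_ssn_candidates(text):
    """Return (line_number, matched_text) pairs for likely SSNs in text."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in SSN_PATTERN.finditer(line):
            if is_plausible_ssn(*match.groups()):
                hits.append((line_number, match.group(0)))
    return hits


def scan_accession(root):
    """Scan every file under root as text and map each path to its hits."""
    report = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            # Undecodable bytes are skipped, so binary files yield few matches.
            text = path.read_text(encoding="utf-8", errors="ignore")
            hits = find_ssn_candidates(text)
            if hits:
                report[str(path)] = hits
    return report
```

Calling `scan_accession` on the directory holding a new accession (a hypothetical path such as `scan_accession("accession_2012_014")`) returns a dictionary of file paths and line numbers that a processing archivist could review before the material goes into archival storage.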
UCISpace Fixity: until recently the material ingested into UCISpace (local DSpace instance) was not being continually
checked for fixity/authenticity. We now run a checksum checker on all UCISpace content nightly and are in the final
stages of implementing a system that will back up DSpace-generated AIPs of all UCISpace material into CDL’s Merritt
repository. This Merritt collection will serve as a geographically separate dark archive that we can also access to replace
lost or corrupted items/collections if and when the checksum checker discovers them. Canto Cumulus: a robust digital