SPEC Kit 329: Managing Born-Digital Special Collections and Archival Materials · 39
Ingest Challenges
13. Please briefly describe up to three challenge(s) your library has faced in ingesting born-digital
materials (e.g., file format, hardware, software, privacy or security issues, etc.) and how the library
has addressed that challenge. N=60
Ingest Challenges Word Cloud
A significant challenge has to do with legacy media. The oldest format we have identified so far is the 8” floppy disk in
a WANG format (i.e., not a modern PC format). We have also identified a number of 5.25” and 3.5” floppy disks, as
well as CD, DVD, and hard drive formats. We suspect that there may also be data tape formats in the stacks (an
inventory project is currently underway), and we have been in talks with a donor whose data is on Iomega Jaz and Zip disks.
We have been able to acquire hardware to read the 5.25” and 3.5” floppy formats, but not the 8” format. We have also
acquired a forensic imaging device that can handle a number of data connector types, such as SCSI, SATA, and IDE.
Even when we are able to physically read a disk, some of these devices have controller cards or software that work only
on specific operating systems. This sometimes makes an efficient workflow difficult, because some disks in a collection
are read with one machine and one piece of software, producing one kind of output file, while others must be read on
another machine with different software, producing a different kind of output file. For media that we do not currently
have hardware for, we have to weigh the cost of acquiring that hardware against the value of the material. For example,
we could possibly find drives to read the 8” disks, but there is no guarantee that we would find a controller card to make
such a drive compatible with a modern PC. We have therefore investigated the price of having a vendor image the disks
for us, but this also requires a cost-benefit analysis. Each disk holds less than 80 KB of data, yet imaging alone costs
around $50 per disk. In addition, we would need to ship the disks and would run the risk of having them lost or damaged
in transit. We would also need some sort of confidentiality agreement with the vendor regarding the privacy of the data,
because we have no real idea of what is on the disks. After all of that, we could send the disks to the vendor only to
find that they are unreadable anyway; there is no way to tell whether a disk is readable until you attempt to read it.
Finally, the sheer scale of this legacy media is a challenge. We currently estimate that we hold fewer than 3,000 disks,
but the time needed just to load them and transfer their data is considerable. That is before any time spent processing
the materials as part of a collection; it is simply the time needed to move the data off the media, where it is at higher
risk of corruption, onto network storage.

A second challenge, related to the first but separate, is actually reading the data on the media. As described above,
some of our material is so old that it is not even readable by a modern computing system. Other data is not as old, but still