46 · Survey Results: Survey Questions and Responses
These challenges from Presidential Libraries are representative of the challenges other parts of NARA also experience.
Volume of data to be ingested in as short a time frame as possible: We receive the vast majority of our electronic
records in large transfers at the end of a Presidential administration. Because we must provide asset-level access
to electronic records as soon as the records are in our legal custody, we need to ingest these large volumes in as short
a time frame as possible. In our last large transfer, we worked with the records creators and with our system vendor
to devise a means of transfer that employed storage area networks (SANs) to move large volumes (tens of terabytes)
of data copied from the creator’s data center to the data center for our Electronic Records Archives (ERA). Four physical
shipments of data stored on SANs over the course of several months moved more than 70TB of data from the source
data center to our data center, where the files could be staged for ingest and then moved into our system environment.
File-level access control policy: Our system users are located across the country. All users fill the same role in the
system, but users should have access to only subsets of the electronic records maintained in the system (Presidential
records from one administration versus Vice Presidential records from another administration, for instance). To maintain
asset-level access control (among other needs) we established asset catalog entries (ACEs) that were assigned to each
asset upon ingest. These ACEs (xml files) include elements that define each asset by a Presidential administration and
by a records status (Presidential, Vice Presidential, or Federal). When users log in to the Executive Office of the President
instance of the Electronic Records Archives (EOP ERA), the system compares the rights of the user to the
characteristics of assets to determine if the user can have access to the files. Need to make electronic message files
accessible: The storage architecture deployed in EOP ERA makes hundreds of formats available for indexing, including
.eml files for emails. One set of electronic messages planned for transfer to us during the last transition (more than 20
million files) was stored in a journal format that maintained the messages as text files. Because we wanted to access
the messages as emails (i.e., using parametric searches of email fields – To, From, Date, etc.), our vendor (Lockheed
Martin) developed a script that transformed the text files into discrete .eml files that could be ingested into EOP ERA
and managed as email files. As part of this transformation process the vendor used sample data to inform a discussion
with our archivists on the fields we wanted to maintain in the .eml target files. During testing, we confirmed
that the content of the messages came through the transformation intact, including any files attached to the
original message files.
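The journal-to-.eml transformation described in this response can be illustrated with a minimal sketch. The actual journal format and the vendor's script are not described in the response, so everything here is an assumption: the `=====` record delimiter, the `NAME: value` field lines, the `KEPT_FIELDS` list, and the function names are all hypothetical. Attachments, which the response says were preserved, are not handled.

```python
from email.message import EmailMessage

# Hypothetical journal layout: records separated by a delimiter line, each with
# "NAME: value" field lines, a blank line, then the message body.
DELIMITER = "====="
# Hypothetical stand-in for the fields the archivists chose to keep.
KEPT_FIELDS = ("To", "From", "Date", "Subject")

SAMPLE_JOURNAL = """=====
TO: a@example.gov
FROM: b@example.gov
DATE: Mon, 05 Jan 2009 10:15:00 -0500
SUBJECT: Status

Body text
=====
TO: c@example.gov
FROM: a@example.gov
DATE: Tue, 06 Jan 2009 08:00:00 -0500
SUBJECT: Reply

Second body
"""


def split_journal(text):
    """Yield one raw record per message found in the journal text."""
    for chunk in text.split(DELIMITER):
        if chunk.strip():
            yield chunk.strip("\n")


def record_to_eml(record):
    """Convert one journal record into serialized .eml bytes."""
    head, _, body = record.partition("\n\n")
    fields = {}
    for line in head.splitlines():
        name, _, value = line.partition(":")
        fields[name.strip().title()] = value.strip()
    msg = EmailMessage()
    for name in KEPT_FIELDS:
        if name in fields:
            msg[name] = fields[name]
    msg.set_content(body)
    return msg.as_bytes()


if __name__ == "__main__":
    for number, record in enumerate(split_journal(SAMPLE_JOURNAL), start=1):
        with open(f"message_{number:08d}.eml", "wb") as handle:
            handle.write(record_to_eml(record))
```

Parsing each output file back with the standard `email` parser and comparing its fields and body to the source record is one way to approximate the intact-content check mentioned in the response.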
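The asset-level access check described earlier in this response, in which a user's rights are compared to the characteristics recorded in an asset catalog entry, might look roughly like the following sketch. The ACE schema is not given in the response, so the element names (`administration`, `recordStatus`), the `UserRights` type, and the placeholder administration names are all hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical ACE layout; the real EOP ERA schema is not given in the response.
SAMPLE_ACE = """<ace>
  <assetId>asset-0001</assetId>
  <administration>Administration A</administration>
  <recordStatus>Presidential</recordStatus>
</ace>"""


@dataclass(frozen=True)
class UserRights:
    # Each grant is an (administration, record status) pair the user may see.
    grants: frozenset


def parse_ace(xml_text):
    """Return the (administration, record status) pair recorded in an ACE."""
    root = ET.fromstring(xml_text)
    return (root.findtext("administration"), root.findtext("recordStatus"))


def can_access(user, ace_xml):
    """Allow access only when the asset's pair matches one of the user's grants."""
    return parse_ace(ace_xml) in user.grants


if __name__ == "__main__":
    archivist = UserRights(frozenset({("Administration A", "Presidential")}))
    print(can_access(archivist, SAMPLE_ACE))
```

Because a missing element yields a pair containing `None`, which matches no grant, this sketch denies access by default when an entry is incomplete.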
There is no Digital Asset Management System (DAMS) in place to ingest born-digital material. System-wide initiatives
would address this problem. The necessary hardware to transfer born-digital material from legacy media is not available
at our repository. A few pieces of legacy hardware have been purchased. Staff expertise to deal with ingesting born-
digital materials is limited. This has not yet been addressed.
Time: Reformatting legacy media, and arranging and describing born-digital content, are time-consuming activities.
The volume of data that can be found within a single item such as a hard drive can be staggering. Migrating content
from legacy media is also time-consuming, as there is little automation/batch handling of these materials. We are
investigating ways in which to reduce time spent on individual items. Migrating unidentified content: With unidentified
content on an obsolete media format it’s difficult to determine whether the content is a reformatting priority without
accessing the material. If we do not have the equipment in-house for the obsolete media format, the item requires
access by a vendor. Sending an item out to a vendor is expensive and may not be the best use of our resources. At this
point, we are investigating ways to address this issue without overuse of resources. Software licensing: Because of stringent
state regulations on software purchasing, and because we need obsolete software titles to access files that may be generations
removed from current software (or without a contemporary equivalent), acquiring the software necessary for file
migration is a challenge. We are looking into software titles that can bridge generations; that is, software that can open
older files and convert them to a newer generation that can be accessed with current software. We are also examining
software, such as Quick View Pro, designed to open obsolete file formats.