blog and report to, Archivematica, Curator’s Workbench, work on the Salman Rushdie Papers at Emory University,
work by Seth Shaw and Ben Goldman in conference presentations on practical approaches to born-digital collections at
the Midwest Archives Conference and the Society of American Archivists. Duke Data Accessioner. Chris Prom’s blog. CIC
electronic records policy guidelines. MetaArchive and ICPSR’s guidelines on development of digital preservation policies.
Specifications of processes/tools/procedures from Archivematica, California Digital Library’s Merritt, MetaArchive, etc.
Publications from NDIIPP, ICPSR, InterPARES, PREMIS, etc.
Workflow Descriptions
A procedure to receive digital images and assign file names according to local directory needs is in place. Scripts for
ingesting ETDs from ProQuest. Ad hoc scripting to structure and ingest research data.
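The naming convention itself is not described here. As a minimal sketch, assume a hypothetical convention in which each received image is renamed `<directory name>_<NNNN>.<ext>` in sorted order of the original filenames. A script could then plan the renames first and refuse to overwrite anything. The function names and the extension list are illustrative assumptions.

```python
from pathlib import Path
from typing import Optional

# Assumed set of image extensions; adjust to what the agency actually sends.
IMAGE_SUFFIXES = {".tif", ".tiff", ".jpg", ".jpeg"}


def plan_renames(received_dir: Path, prefix: Optional[str] = None,
                 start: int = 1) -> list[tuple[Path, Path]]:
    """Pair each image in received_dir with its new name; nothing is renamed yet.

    Hypothetical convention: <prefix>_<NNNN><lowercased extension>, where the
    prefix defaults to the directory's own name and numbering follows the
    sorted order of the original filenames.
    """
    prefix = prefix or received_dir.name
    images = sorted(p for p in received_dir.iterdir()
                    if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES)
    return [(old, old.with_name(f"{prefix}_{start + i:04d}{old.suffix.lower()}"))
            for i, old in enumerate(images)]


def apply_renames(plan: list[tuple[Path, Path]]) -> None:
    """Rename files only if no target already exists (unless it is the same file)."""
    # Check every target before touching anything, so a clash leaves the folder unchanged.
    for old, new in plan:
        if new != old and new.exists():
            raise FileExistsError(f"{new} already exists; not renaming {old}")
    for old, new in plan:
        old.rename(new)
```

Keeping planning and renaming separate lets staff review the proposed names (for example, by printing the plan) before any file on disk changes.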
Currently, the workflow is very straightforward: it is intended to protect the records against loss due to failure of the
information carrier.
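The section does not say how the copy off the carrier is made or verified. A minimal sketch follows, assuming the goal is a bit-level copy to repository storage verified with SHA-256 checksums. The algorithm and the function names are our choices, not the respondent's.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_with_fixity(source_dir: Path, target_dir: Path) -> dict[str, str]:
    """Copy every file under source_dir into target_dir and verify each copy.

    Returns a manifest mapping relative POSIX paths to SHA-256 digests and
    raises ValueError as soon as a copy does not match its source.
    """
    manifest: dict[str, str] = {}
    for src in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(source_dir)
        dst = target_dir / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        expected = sha256_of(src)   # digest of the original on the carrier
        shutil.copy2(src, dst)      # copies content and timestamps
        if sha256_of(dst) != expected:
            raise ValueError(f"fixity mismatch for {rel}")
        manifest[rel.as_posix()] = expected
    return manifest


def write_manifest(manifest: dict[str, str], path: Path) -> None:
    """Write one 'digest  relative/path' line per file, sorted by path."""
    lines = [f"{digest}  {rel}" for rel, digest in sorted(manifest.items())]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
```

For ordinary filenames, the manifest uses the same `digest  path` layout that `sha256sum` prints. It can therefore be re-checked later with `sha256sum -c` run from the storage directory.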
One principal driver for us was compliance with the requirements of the Presidential Records Act (PRA). The PRA
gives the Archivist legal custody of all Presidential records at the point of an administration transition. The PRA also
obligates NARA to respond to access requests to those records immediately after we receive custody (public access
requests begin five years after transition; in the first five years we respond to special access requests). To meet both
these obligations, our workflows have to account for ingesting a large volume of holdings in as short a time frame
as possible while giving us search and access capabilities to support asset-level review and production of copies of the
electronic records for external requesters. The models that Preservation staff found helpful in developing workflows
for ingesting and processing born-digital records were the Open Archival Information System (OAIS) reference model and
the Digital Curation Centre (DCC) model. In Still Pictures, we have a multi-page set of basic instructions that covers what
processing involves,
but essentially we:
1) Obtain the digital images from the agency, usually by downloading them onto media for transfer to NARA.
2) Once the images are here, make a copy for OPA processing.
3) Process the accession for ERA. This involves reviewing the images to delete those that are temporary; ensuring
unique filenames; appending our RG and series designations to each digital image; when images lack captions,
appending whatever information is available to each image in a folder; reviewing the metadata to make sure it links
to the individual images; and, if caption information is in the file header, copying it into a separate text file if
needed. Depending on the condition of the accession, many other processing steps may be needed to make it ERA- and
OPA-ready.
4) Go through the laborious process of ingesting the accession into ERA.
5) Complete processing for OPA and work with NPOL to get the images uploaded to OPA for reference use.
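Two of the checks in step 3, unique filenames and metadata that links to the individual images, lend themselves to scripting. A minimal sketch follows. It assumes the agency's metadata arrives as a CSV with one row per image and a hypothetical `filename` column. The extension list is also an assumption.

```python
import csv
from collections import defaultdict
from pathlib import Path

# Assumed image extensions; adjust to the accession at hand.
IMAGE_SUFFIXES = {".tif", ".tiff", ".jpg", ".jpeg", ".png"}


def image_files(accession_dir: Path) -> list[Path]:
    """All image files anywhere under the accession directory."""
    return [p for p in accession_dir.rglob("*")
            if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES]


def find_duplicate_names(accession_dir: Path) -> dict[str, list[str]]:
    """Map each filename that occurs more than once (ignoring case) to its paths."""
    seen: dict[str, list[str]] = defaultdict(list)
    for p in image_files(accession_dir):
        seen[p.name.lower()].append(p.relative_to(accession_dir).as_posix())
    return {name: sorted(paths) for name, paths in seen.items() if len(paths) > 1}


def reconcile(accession_dir: Path, metadata_csv: Path,
              column: str = "filename") -> tuple[set[str], set[str]]:
    """Compare bare filenames on disk with those named in the metadata.

    Returns (images with no metadata row, metadata rows with no image).
    Matching is exact and by filename only, which is why uniqueness is
    checked separately first.
    """
    with metadata_csv.open(newline="", encoding="utf-8") as handle:
        described = {(row.get(column) or "").strip() for row in csv.DictReader(handle)}
    described.discard("")
    on_disk = {p.name for p in image_files(accession_dir)}
    return on_disk - described, described - on_disk
```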
Our process is still being developed and tested. Currently it includes the following elements:
Capture, metadata/content extraction: FTK Imager to capture the disk image, generate disk- and file-level metadata and
checksums, and extract the content directory from the disk; a BASH shell script to combine and organize the disk image
and metadata files.
File characterization/normalization: JHOVE and/or DROID for characterization/validation; FileMerlin to convert/normalize
legacy text files; Adobe Acrobat to migrate text files to PDF/A.
Appraisal, organization, and description (akin to traditional archival processing): a human uses an Excel spreadsheet to
record appraisal decisions, organize content, and enter descriptive metadata.
Ingest: XSLT applied to the Excel spreadsheet packages the digital files and creates Dublin Core .xml metadata files for
ingest into our DSpace repository; command-line batch ingest into DSpace.
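The respondent's packaging step uses XSLT on the spreadsheet. The sketch below swaps in Python for that step. It assumes the spreadsheet has been exported to CSV with a `filename` column and columns named like `dc.title` or `dc.contributor.author`, which is a hypothetical layout. For each row, it writes one item directory in DSpace's Simple Archive Format, containing:

* a `dublin_core.xml` file of `dcvalue` elements,
* a `contents` file listing the item's files,
* copies of those files.

Multiple values in one cell are assumed to be separated by `||`.

```python
import csv
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def dublin_core_xml(row: dict) -> ET.ElementTree:
    """Turn 'dc.element[.qualifier]' columns into <dcvalue> elements."""
    root = ET.Element("dublin_core")
    for column, value in row.items():
        if not column or not column.startswith("dc.") or not value:
            continue  # skip non-DC columns, extra unnamed fields, and empty cells
        parts = column.split(".")
        element = parts[1]
        qualifier = parts[2] if len(parts) > 2 else "none"
        for single in value.split("||"):  # assumed multi-value separator
            if single.strip():
                node = ET.SubElement(root, "dcvalue", element=element, qualifier=qualifier)
                node.text = single.strip()
    return ET.ElementTree(root)


def build_archive(sheet_csv: Path, files_dir: Path, out_dir: Path) -> int:
    """Write one Simple Archive Format item directory per spreadsheet row."""
    count = 0
    with sheet_csv.open(newline="", encoding="utf-8") as handle:
        for index, row in enumerate(csv.DictReader(handle), start=1):
            item_dir = out_dir / f"item_{index:04d}"
            item_dir.mkdir(parents=True, exist_ok=False)  # never mix with an earlier run
            names = [n.strip() for n in (row.get("filename") or "").split("||") if n.strip()]
            for name in names:
                shutil.copy2(files_dir / name, item_dir / name)
            # The contents file lists the item's files, one per line.
            (item_dir / "contents").write_text("\n".join(names) + "\n", encoding="utf-8")
            dublin_core_xml(row).write(str(item_dir / "dublin_core.xml"),
                                       encoding="utf-8", xml_declaration=True)
            count += 1
    return count
```

The resulting directory is the kind of input DSpace's command-line batch importer (`dspace import --add`) expects. Confirm the required options, such as the target collection, the submitter, and the map file, in the documentation for your DSpace version.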
Our workflows are not specific to born-digital materials. For electronic records management, we have records schedules
and retention policies that apply equally to analog, digitized, and born-digital records. For the digital repository, we
use a workflow management system that enables us to establish collections, develop and document master file
formats, validate and document technical characteristics of files, develop metadata, attach digital files to metadata, and
