Columbia University
Columbia’s Long-Term Digital Preservation Archive (LTA). Status Report
CUL/IS Long Term Digital Preservation
Columbia University Libraries Digital Program
Columbia University Libraries/Information Services
Digital Preservation & Asset Management Infrastructure: Status Report
Columbia's Long-Term Digital Preservation Archive is a key component of Columbia's Digital Library
and Institutional Repository infrastructure. It consists of a robust asset management system that can
manage the digital resources of Columbia Libraries/Information Services for a variety of applications
and at the same time provide the set of features and services needed for long-term preservation of
relevant digital assets. The need for a comprehensive architecture was identified when the new Digital
Programs and Technology Services group (DPTS) was created in July 2007. Planning for the storage
system began in the first quarter of 2008; implementation began in 2009.
Technical Architecture
The technical architecture has been designed with four main components:
1. Digital preservation storage system
2. Fedora software platform
3. Application and authentication middleware
4. Applications to support the Long-Term Digital Preservation Archive and other programs
1. Digital Preservation Storage System
CUL/IS stores four copies of each digital preservation asset, two on disk and two on tape. One
copy on disk and a second copy on tape will be located in an automated system in Columbia’s main
data center. A third copy, on disk, is located in the NYSERNet Data Center in Syracuse, New
York. A fourth copy, on offline tape, is sent to Iron Mountain to provide an additional offsite location.
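The four-copy layout described above can be written down as a small data model. This is only an illustrative sketch: the `Copy` class, the `offsite` flag, and the `summarize` helper are hypothetical names introduced here, and the code does not reflect the storage software's actual configuration syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One stored copy of a preservation asset (hypothetical model)."""
    number: int
    medium: str    # "disk" or "tape"
    location: str
    offsite: bool  # True if held outside Columbia's main data center


# The four copies as described in the text; labels are paraphrased.
COPIES = [
    Copy(1, "disk", "Columbia main data center", offsite=False),
    Copy(2, "tape", "Columbia main data center", offsite=False),
    Copy(3, "disk", "NYSERNet Data Center, Syracuse, NY", offsite=True),
    Copy(4, "tape", "Iron Mountain (offline tape)", offsite=True),
]


def summarize(copies):
    """Return (copies per medium, number of offsite copies)."""
    by_medium = {}
    for c in copies:
        by_medium[c.medium] = by_medium.get(c.medium, 0) + 1
    offsite = sum(1 for c in copies if c.offsite)
    return by_medium, offsite


by_medium, offsite = summarize(COPIES)
print(by_medium, offsite)
```

Modeling the layout this way makes the redundancy easy to check: two media types, and two copies held away from the main data center.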
To manage multiple copies, automate migration and replication, and provide a policy-based model for
the long-term retention of and access to digital assets, CUL/IS chose the Sun StorageTek
Storage Archive Manager (SAM) software along with Sun hardware as a single vendor solution. SAM is
“tried and true,” with over a decade of proven use in managing large data repositories at corporations,
supercomputer centers and libraries. It provides a self-protecting, automated data migration and
recovery model that enables us to populate and incrementally expand the preservation storage to
meet current and future needs. To support long-term sustainability and end-of-life data migration,
SAM stores data on disk in portable, nonproprietary formats, its source code has been published as
open source, and it uses open standards for data retrieval and access.
A total of 280 terabytes (TB) of disk and tape storage has been purchased to support the Digital
Preservation Storage System. After this storage has been configured to support four copies of the
digital assets, the system will have an effective storage capacity of approximately 70 TB. As
purchased, the system may be expanded incrementally to an effective storage capacity of up to
400 TB. A high-speed, 10 TB local disk cache provides increased access performance for commonly
accessed digital assets and ensures that CUL/IS can rapidly load the system as required by large
digital preservation efforts.
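The capacity figures above follow from simple arithmetic, sketched below. The sketch assumes that each of the four copies consumes the same amount of raw storage and that there is no formatting or cache overhead. The function names are hypothetical. The raw total implied by a 400 TB effective capacity is an inference under that assumption, not a figure stated in this report.

```python
RAW_TB = 280     # total disk and tape storage purchased (from the text)
COPY_COUNT = 4   # copies kept of each digital asset (from the text)


def effective_capacity_tb(raw_tb, copies):
    """Usable capacity when every asset is stored `copies` times."""
    return raw_tb / copies


def raw_needed_tb(effective_tb, copies):
    """Raw storage implied by a target effective capacity (assumes equal-size copies)."""
    return effective_tb * copies


print(effective_capacity_tb(RAW_TB, COPY_COUNT))  # 70.0
print(raw_needed_tb(400, COPY_COUNT))             # 1600
```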
2. Fedora Software Platform
CUL/IS has chosen the Fedora Commons software platform to manage Columbia’s digital repository,
long-term archive, and a variety of other applications. Fedora version 3 has been installed on CUL/IS
production servers, managed by Columbia University's central IT group. Fedora has been configured in
