138 · Representative Documents: Format Policies
RUcore. Archival Standards for Born-Digital Documents
IBB • RUcore Preservation Standards • Born Digital Documents Rev: 8/9/2010
resulted in a trickle-down effect to the consumer level on home computers and in academia as well. MS
Office isn’t perfect, however. The file formats used by Microsoft have evolved over the years as new
versions have been released, and inconsistencies exist between versions in how document formatting is
At present, there are a number of formats developed by various consortia that attempt to solve the
problem of maintaining a persistent document standard, and Microsoft itself has sought to modernize
its document formats and make them a formally accepted industry standard. Some of the more prevalent
standards include:
• OpenXML: A standard developed and endorsed by Microsoft and a consortium of other
commercial software vendors. It is the standard document format used in the Microsoft Office
suite beginning with Office 2007. These documents are often recognizable by their .docx, .xlsx,
and .pptx extensions.
• OASIS OpenDocument (ODF): An existing, open standard for file formats in use primarily in
open source and “non-Microsoft” environments. These file formats are the default for
OpenOffice.org and similar Free Software alternatives.
• Portable Document Format/Archival (PDF and PDF/A): A well-established standard with
roots in Adobe PDF, a subset of which is now an ISO standard and a Library of Congress
recognized format for digital document preservation.
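As a rough illustration of the three format families listed above, a repository intake script might sort incoming files by extension. The sketch below is only an assumption about how such a check could look. The function name and the mapping are hypothetical, and the ODF extensions (.odt, .ods, .odp) do not appear in this document and are included for illustration only. Note that a file extension cannot distinguish plain PDF from PDF/A, so the sketch reports both as "PDF".

```python
from pathlib import PurePath

# Hypothetical mapping from file extension to format family.
# .docx/.xlsx/.pptx come from the list above; the ODF extensions
# (.odt, .ods, .odp) are an assumption added for illustration.
FORMAT_FAMILIES = {
    ".docx": "OpenXML",
    ".xlsx": "OpenXML",
    ".pptx": "OpenXML",
    ".odt": "ODF",
    ".ods": "ODF",
    ".odp": "ODF",
    ".pdf": "PDF",  # covers PDF and PDF/A; the extension alone cannot tell them apart
}


def format_family(filename: str) -> str:
    """Return the format family for a filename, or "unknown" if unrecognized."""
    suffix = PurePath(filename).suffix.lower()
    return FORMAT_FAMILIES.get(suffix, "unknown")
```

A call such as format_family("report.docx") would return "OpenXML", while a legacy file such as "memo.doc" falls through to "unknown" under this mapping.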
Legacy standards also remain prevalent. Most of these are legacy MS Office document types
(.doc, .xls, .ppt, etc.), along with more complex file formats for more intricate or
specialized document types (LaTeX, Adobe InDesign, Illustrator, etc.). Finally, a multitude of
document authoring platforms are currently supported but have smaller market shares,
such as Apple’s iWork, current versions of Corel WordPerfect
Our choice of standards is based on their ability to endure as technology continues to advance.
Widespread acceptance is key to ensuring easy migration to newer standards when the time comes
to retire existing choices.
The Recommendation: Our best case to preserve born digital documents while retaining longevity
Considering the state of the born-digital document landscape as outlined above, it is advisable to
use more than one preservation datastream for born-digital objects when possible. This strategy
builds redundancy into our repository and ensures that, regardless of whether one standard
“wins out” over the other, our objects will retain at least one relevant archival datastream. With
that in mind, our strategy can be outlined as follows:
1. Store the original document in its native format when possible.
In most cases, this will be an MS Office document, or a file from a similarly well-known
software package. In some instances, the document we receive may already be rendered as a
PDF file, in which case Step 2 below may not be necessary.
2. Store an additional surrogate master in the form of a PDF/Archival file.
Most modern document authoring software, including MS Office and OpenOffice.org, have a