The need to convert data from analog to digital formats will depend on the assessed value of the
particular data set in question. Data cleaning and deidentification are very important curation activities
but should be performed by the data owners rather than the library curation staff.
We believe all this is important, just not things the LIBRARY needs to do or should do.
Here are descriptions of ten more data curation processing and review activities.
File Inventory or Manifest: The data files are inspected periodically and the number, file types
(extensions), and file sizes of the data are understood and documented. Any missing, duplicate, or
corrupt (e.g., unable to open) files are discovered.
File Renaming: Rename files in a dataset, often to standardize names and/or to reflect important metadata.
Indexing: Verify all metadata provided by the author and crosswalk to descriptive and administrative
metadata compliant with a standard format for repository interoperability.
Interoperability: Formatting the data using a disciplinary standard for better integration with other
datasets and/or systems.
Peer-review: The review of a data set by an expert with credentials and subject knowledge similar to
those of the data creator, for the purpose of validating the soundness and trustworthiness of the file contents.
Persistent Identifier: A URL (or Uniform Resource Locator) that is monitored by an authority
to ensure a stable web location for consistent citation and long-term discoverability. Provides
redirection when necessary (e.g., a Digital Object Identifier or DOI).
Quality Assurance: Ensure that all documentation and metadata are comprehensive and complete.
Example actions might include: open and run the data files; inspect the contents in order to validate,
clean, and/or enhance data for future use; look for missing documentation about codes used, the
significance of “null” and “blank” values, or unclear acronyms.
Restructure: Organize and/or reformat poorly structured data files to clarify their meaning
and importance.
Software Registry: Maintain copies of modern and obsolete versions of software (and any relevant
code libraries) so that data may be opened/used over time.
Transcoding: With audio and video files, detect technical metadata (min resolution, audio/video
codec) and encode files in ways that optimize reuse and long-term preservation actions (e.g., convert
QuickTime files to MPEG4).
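As an illustration of the File Inventory or Manifest activity defined above, the following is a minimal Python sketch, not a description of any surveyed library's workflow. It records each file's path, extension, and size, and flags duplicate, missing, and unreadable files. The function names (build_manifest, find_duplicates, find_missing) and the use of SHA-256 checksums to detect duplicates are assumptions made for this example.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def build_manifest(root):
    """Return one record per file under root: path, extension, size, checksum."""
    records = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            # Reads the whole file into memory; fine for a sketch, but large
            # files would be better hashed in chunks.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
        except OSError:
            digest = None  # unreadable file: flag it for follow-up
        records.append({
            "path": path.relative_to(root).as_posix(),
            "extension": path.suffix.lower(),
            "size_bytes": path.stat().st_size,
            "sha256": digest,
        })
    return records


def find_duplicates(records):
    """Group paths whose contents share the same SHA-256 checksum."""
    by_digest = defaultdict(list)
    for rec in records:
        if rec["sha256"] is not None:
            by_digest[rec["sha256"]].append(rec["path"])
    return [paths for paths in by_digest.values() if len(paths) > 1]


def find_missing(records, expected_paths):
    """List expected paths (e.g., from the depositor's file list) not found on disk."""
    return sorted(set(expected_paths) - {rec["path"] for rec in records})
```

A readable checksum shows only that a file can be read, not that it opens correctly in its native software, so a corrupt-file check in the sense of the definition would still need a format-specific step.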
39. Please indicate the importance of these data curation processing and review activities on a scale
of 1 to 5 where 1=essential; 2=very important; 3=moderately important; 4=less important; 5=not
important. N=22
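The Rating Average column reported for this question appears to be a response-weighted mean of the scale points, so lower values indicate greater importance. The short Python sketch below shows that calculation under that assumption, using the Persistent Identifier response counts from the results; the function name rating_average is hypothetical.

```python
def rating_average(counts):
    """Weighted mean of scale points 1..len(counts), weighted by response counts."""
    total = sum(counts)
    weighted = sum(point * n for point, n in enumerate(counts, start=1))
    return weighted / total


# Persistent Identifier: 18 responses of "1", 3 of "2", 1 of "3", none of "4" or "5".
persistent_identifier = [18, 3, 1, 0, 0]
print(round(rating_average(persistent_identifier), 2))  # 1.23
```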
Activity                      1    2    3    4    5    Rating Average
Persistent Identifier        18    3    1    0    0    1.23
File Inventory or Manifest   10    5    3    2    2    2.14
Indexing                      6    8    5    2    1    2.27
Quality Assurance             3    9    6    2    2    2.59
Transcoding                   4    9    3    4    2    2.59
Software Registry             2    8    5    5    2    2.86
