54 Survey Results: Survey Questions and Responses
The need to convert data from analog to digital formats will depend on the assessed value of the
particular data set in question. Data cleaning and deidentification are very important curation activities
but should be performed by the data owners rather than the library curation staff.
We believe all this is important, just not things the LIBRARY needs to do or should do.
IMPORTANCE OF PROCESSING AND REVIEW ACTIVITIES PART 2
Here are descriptions of ten more data curation processing and review activities.
File Inventory or Manifest: The data files are inspected periodically and the number, file types
(extensions), and file sizes of the data are understood and documented. Any missing, duplicate, or
corrupt (e.g., unable to open) files are discovered.
File Renaming: Renaming files in a dataset, often to standardize file names and/or to reflect important metadata.
Indexing: Verify all metadata provided by the author and crosswalk it to descriptive and administrative
metadata compliant with a standard format for repository interoperability.
Interoperability: Formatting the data using a disciplinary standard for better integration with other
datasets and/or systems.
Peer-review: The review of a data set by an expert with similar credentials and subject knowledge as
the data creator for the purposes of validating the soundness and trustworthiness of the file contents.
Persistent Identifier: A URL (Uniform Resource Locator) that is monitored by an authority
to ensure a stable web location for consistent citation and long-term discoverability, and that
provides redirection when necessary (e.g., a Digital Object Identifier or DOI).
Quality Assurance: Ensure that all documentation and metadata are comprehensive and complete.
Example actions might include: open and run the data files; inspect the contents in order to validate,
clean, and/or enhance data for future use; look for missing documentation about codes used, the
significance of "null" and "blank" values, or unclear acronyms.
Restructure: Organize and/or reformat poorly structured data files to clarify their meaning.
Software Registry: Maintain copies of modern and obsolete versions of software (and any relevant
code libraries) so that data may be opened and used over time.
Transcoding: With audio and video files, detect technical metadata (minimum resolution, audio/video
codec) and encode files in ways that optimize reuse and long-term preservation actions (e.g., convert
QuickTime files to MPEG-4).
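Parts of the File Inventory or Manifest activity described above can be automated. The Python sketch below is a minimal illustration under stated assumptions, not a procedure reported by any respondent: the helpers build_manifest and sha256_of are hypothetical, "duplicate" is taken to mean byte-identical content (same SHA-256 checksum), and "corrupt" is approximated as a file that cannot be read at all, since detecting damage inside a file requires format-specific checks.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: str, expected: set[str] | None = None) -> dict:
    """Inventory every file under root (hypothetical helper).

    Records the file count, extensions, and sizes; flags byte-identical
    duplicates, files that cannot be read, and expected files that are absent.
    """
    root_path = Path(root)
    entries, unreadable = [], []
    by_checksum = defaultdict(list)
    for path in sorted(p for p in root_path.rglob("*") if p.is_file()):
        rel = path.relative_to(root_path).as_posix()
        try:
            checksum = sha256_of(path)
        except OSError:
            unreadable.append(rel)  # could not be opened or read at all
            continue
        entries.append({"path": rel, "ext": path.suffix.lower(),
                        "bytes": path.stat().st_size, "sha256": checksum})
        by_checksum[checksum].append(rel)
    present = {e["path"] for e in entries} | set(unreadable)
    return {
        "count": len(entries) + len(unreadable),
        "extensions": sorted({e["ext"] for e in entries}),
        "files": entries,
        "duplicates": [paths for paths in by_checksum.values() if len(paths) > 1],
        "unreadable": unreadable,
        "missing": sorted((expected or set()) - present),
    }
```

Storing the checksum with each entry also lets a later inventory pass detect files that have changed since the previous manifest.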
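For the Indexing activity, a crosswalk is essentially a field-to-field mapping from the author's metadata into a standard schema. The sketch below assumes Dublin Core as the target schema; the author field names in CROSSWALK, the DC_REQUIRED list, and the crosswalk function are all hypothetical placeholders that a repository would replace with its own schema and local policy.

```python
from __future__ import annotations

# Hypothetical mapping from author-supplied field names to Dublin Core elements.
CROSSWALK = {
    "dataset_title": "title",
    "author": "creator",
    "keywords": "subject",
    "abstract": "description",
    "collected": "date",
    "license": "rights",
}

# Assumed local policy: Dublin Core elements every record must carry.
DC_REQUIRED = ("title", "creator", "date", "rights")


def crosswalk(author_metadata: dict) -> tuple[dict, list[str], list[str]]:
    """Map author metadata onto Dublin Core elements.

    Returns (mapped record, author fields with no mapping, required elements still empty).
    """
    record, unmapped = {}, []
    for field, value in author_metadata.items():
        term = CROSSWALK.get(field)
        if term is None:
            unmapped.append(field)  # needs a curator decision before ingest
        elif value not in (None, ""):
            record.setdefault(term, []).append(value)  # elements may repeat
    missing = [term for term in DC_REQUIRED if term not in record]
    return record, unmapped, missing
```

The two lists returned alongside the record correspond to the "verify" half of the definition: fields the curator must map by hand and required elements the author has not yet supplied.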
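Under the Persistent Identifier definition, the redirection itself is performed by a resolver service rather than by the repository. As a minimal sketch, assuming DOIs resolved through the public https://doi.org/ proxy, the hypothetical helper doi_to_url below strips common prefixes, applies a deliberately loose syntax check, and builds the resolvable URL; it does not register, look up, or monitor any identifier.

```python
import re
from urllib.parse import quote

# Loose DOI syntax check: "10." + numeric registrant code, "/", non-empty suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}(\.\d+)*/\S+$")
RESOLVER = "https://doi.org/"  # public proxy that redirects to the registered location
KNOWN_PREFIXES = ("https://doi.org/", "http://doi.org/",
                  "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def doi_to_url(raw: str) -> str:
    """Normalize a DOI string and return its resolver URL (hypothetical helper)."""
    doi = raw.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a plausible DOI: {raw!r}")
    # Percent-encode characters that are unsafe in a URL path.
    return RESOLVER + quote(doi, safe="/:;()")
```

For example, doi_to_url("doi:10.1000/182") returns https://doi.org/10.1000/182. Citing the resolver URL rather than the landing page's current address is what keeps the citation stable when the data move.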
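Two of the Quality Assurance checks listed above, finding "null" or blank values and spotting codes the documentation does not explain, can be scripted for tabular data. This sketch assumes a CSV file and a hypothetical codebook that maps a column name to its documented codes; the NULL_TOKENS set and the qa_report function are likewise assumptions to adjust per dataset.

```python
from __future__ import annotations

import csv
import io

# Assumed tokens that mean "no value" in this dataset (compared case-insensitively).
NULL_TOKENS = {"", "na", "n/a", "null"}


def qa_report(csv_text: str, codebook: dict[str, set[str]]) -> dict:
    """Count blank/null cells per column and list codes absent from the codebook."""
    reader = csv.DictReader(io.StringIO(csv_text))
    blank_or_null = {name: 0 for name in (reader.fieldnames or [])}
    undocumented: dict[str, set[str]] = {}
    for row in reader:
        for column, value in row.items():
            if column is None:
                continue  # extra values beyond the header row: a structural issue
            cell = (value or "").strip()  # short rows yield None for missing cells
            if cell.lower() in NULL_TOKENS:
                blank_or_null[column] += 1
            elif column in codebook and cell not in codebook[column]:
                undocumented.setdefault(column, set()).add(cell)
    return {
        "blank_or_null": blank_or_null,
        "undocumented_codes": {c: sorted(v) for c, v in undocumented.items()},
    }
```

The report only flags candidates; deciding what a blank or an unexpected code means still requires the documentation or the data creator.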
39. Please indicate the importance of these data curation processing and review activities on a scale
of 1 to 5, where 1=essential; 2=very important; 3=moderately important; 4=less important; 5=not important.
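The Rating Average column in the table that follows appears to be the response-weighted mean of the scale values, so a lower average indicates greater importance. For Persistent Identifier, for example: (18×1 + 3×2 + 1×3) / 22 = 27/22 ≈ 1.23. The short Python check below, using a hypothetical rating_average helper, recomputes each average shown from its response counts; every row shown sums to 22 responses.

```python
from __future__ import annotations


def rating_average(counts: list[int]) -> float:
    """Weighted mean on a 1-5 scale: sum(rating * responses) / total responses."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no responses")
    return sum(rating * n for rating, n in enumerate(counts, start=1)) / total


# Response counts for ratings 1..5, copied from the rows shown in the table.
RESPONSES = {
    "Persistent Identifier": [18, 3, 1, 0, 0],
    "File Inventory or Manifest": [10, 5, 3, 2, 2],
    "Indexing": [6, 8, 5, 2, 1],
    "Quality Assurance": [3, 9, 6, 2, 2],
    "Transcoding": [4, 9, 3, 4, 2],
    "Software Registry": [2, 8, 5, 5, 2],
}

for activity, counts in RESPONSES.items():
    print(f"{activity}: n={sum(counts)}, average={rating_average(counts):.2f}")
```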
Activity                       1   2   3   4   5   Rating Average
Persistent Identifier         18   3   1   0   0   1.23
File Inventory or Manifest    10   5   3   2   2   2.14
Indexing                       6   8   5   2   1   2.27
Quality Assurance              3   9   6   2   2   2.59
Transcoding                    4   9   3   4   2   2.59
Software Registry              2   8   5   5   2   2.86