45 SPEC Kit 354: Data Curation
Interoperability: Formatting the data using a disciplinary standard for better integration with other
datasets and/or systems.
Peer-review: The review of a data set by an expert with similar credentials and subject knowledge as
the data creator for the purposes of validating the soundness and trustworthiness of the file contents.
Persistent Identifier: A URL (or Uniform Resource Locator) that is monitored by an authority
to ensure a stable web location for consistent citation and long-term discoverability. Provides
redirection when necessary (e.g., a Digital Object Identifier or DOI).
Quality Assurance: Ensure that all documentation and metadata are comprehensive and complete.
Example actions might include: open and run the data files; inspect the contents in order to validate,
clean, and/or enhance data for future use; look for missing documentation about codes used, the
significance of “null” and “blank” values, or unclear acronyms.
Restructure: Organize and/or reformat poorly structured data files to clarify their meaning.
Software Registry: Maintain copies of modern and obsolete versions of software (and any relevant
code libraries) so that data may be opened/used over time.
Transcoding: With audio and video files, detect technical metadata (min resolution, audio/video
codec) and encode files in ways that optimize reuse and long-term preservation actions (e.g., convert
QuickTime files to MPEG4).
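As an illustration of the Transcoding definition above, the following is a minimal sketch in Python that assumes the FFmpeg command-line tools (ffprobe and ffmpeg) are installed. The function names, the choice of H.264 video and AAC audio, and the file names are assumptions made for this example; they are not drawn from the survey or from any respondent's workflow.

```python
import json
import subprocess


def probe_command(path):
    """Build an ffprobe call that reports stream-level technical metadata as JSON."""
    return ["ffprobe", "-v", "error", "-show_streams", "-of", "json", path]


def transcode_command(src, dst):
    """Build an ffmpeg call that re-encodes src (e.g., a QuickTime .mov) into an MP4.

    H.264 video and AAC audio are assumed here only for illustration.
    """
    return ["ffmpeg", "-i", src, "-c:v", "libx264", "-c:a", "aac", dst]


def stream_summary(probe_json):
    """Reduce ffprobe JSON output to (type, codec, width, height) per stream."""
    streams = json.loads(probe_json).get("streams", [])
    return [
        (s.get("codec_type"), s.get("codec_name"), s.get("width"), s.get("height"))
        for s in streams
    ]


def run(cmd):
    """Run a command built above and return its standard output as text."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


# Hypothetical usage (requires FFmpeg and a real input file):
#   print(stream_summary(run(probe_command("interview.mov"))))
#   run(transcode_command("interview.mov", "interview.mp4"))
```

Keeping the command construction separate from execution makes it possible to log or review the exact call before any file is touched, which may matter when the originals are preservation masters.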
31. Please indicate your institution’s level of support for these data curation processing and review
activities on a scale of 1 to 5 where 1=currently providing; 2=will provide in the near future;
3=would like to provide, but unable to at this time; 4=no interest/desire to provide; 5=unsure. N=48
Activity                       1    2    3    4    5
Persistent Identifier         40    2    5    0    1
Indexing                      25    2   16    3    2
File renaming                 22    2   14    9    1
Quality Assurance             22    1   16    6    3
File Inventory or Manifest    21    2   19    4    2
Restructure                   17    2   15   11    3
Transcoding                   13    2   20    8    5
Interoperability              11    3   25    5    4
Software Registry              4    2   21   12    9
Peer-review                    1    0   22   20    5
# of respondents              42    9   40   25   15
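The counts in the table above can be converted to percentages of the 48 respondents. The sketch below is one way to do that in Python. The counts are transcribed from the table; the variable and function names are assumptions introduced for this example.

```python
# Counts per activity for response options 1..5, transcribed from the table.
RESPONSES = {
    "Persistent Identifier": (40, 2, 5, 0, 1),
    "Indexing": (25, 2, 16, 3, 2),
    "File renaming": (22, 2, 14, 9, 1),
    "Quality Assurance": (22, 1, 16, 6, 3),
    "File Inventory or Manifest": (21, 2, 19, 4, 2),
    "Restructure": (17, 2, 15, 11, 3),
    "Transcoding": (13, 2, 20, 8, 5),
    "Interoperability": (11, 3, 25, 5, 4),
    "Software Registry": (4, 2, 21, 12, 9),
    "Peer-review": (1, 0, 22, 20, 5),
}
N = 48  # number of respondents to this question


def percent_currently_providing(counts, n=N):
    """Return the percentage of respondents who chose option 1 (currently providing)."""
    if sum(counts) != n:
        raise ValueError("row does not sum to the number of respondents")
    return round(100 * counts[0] / n, 1)


for activity, counts in RESPONSES.items():
    print(f"{activity:<28}{percent_currently_providing(counts):5.1f}%")
```

The row-sum check is a simple guard against transcription errors: every row in the table adds up to 48.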
For some of these activities, we already support some, but not all, aspects described herein (e.g., we
verify metadata but don’t crosswalk, we ensure documentation are comprehensive and complete,
but we don’t open and run data ﬁles). We have not yet received AV materials as part of our data
For those marked 1, we do a pretty minimal amount, e.g., might do ﬁle renaming or restructuring, or
metadata for a group or set of ﬁles, but not for each individual ﬁle.
Most of the above is for libraries collections.