51 SPEC Kit 354: Data Curation
Documentation: Any information needed to use and understand the data. Documentation may be
structured (e.g., a code book) or unstructured (e.g., a plain text “Readme” file).
File Validation: A computational process to ensure that the intended data transfer to a repository
was perfect and complete using means such as generating and validating file checksums (e.g., test
if a digital file has changed at the bit level) and format validation to ensure that file types match
their extensions.
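The two checks named in this definition can be sketched as follows. This is a minimal illustration, not a description of any respondent's workflow: the function names, the choice of SHA-256, and the small `MAGIC_BYTES` table are assumptions made for the example, and a production repository would more likely rely on a dedicated format-identification tool.

```python
import hashlib
from pathlib import Path

# Hypothetical, deliberately tiny signature table (assumption for this sketch).
# Keys are file extensions; values are the leading bytes such files start with.
MAGIC_BYTES = {
    ".pdf": b"%PDF-",
    ".png": b"\x89PNG\r\n\x1a\n",
    ".zip": b"PK\x03\x04",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path: Path, expected_hex: str) -> bool:
    """True if the file is bit-for-bit what the depositor's checksum describes."""
    return sha256_of(path) == expected_hex.strip().lower()


def extension_matches_content(path: Path):
    """True/False if the leading bytes agree with the extension.

    Returns None when the extension is not in the signature table,
    meaning this sketch cannot judge the file either way.
    """
    signature = MAGIC_BYTES.get(path.suffix.lower())
    if signature is None:
        return None
    with path.open("rb") as handle:
        return handle.read(len(signature)) == signature
```

In use, the depositor would supply a checksum generated before transfer, and the repository would call `checksum_matches` on the received copy; a `False` result signals that the file changed in transit.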
Metadata: Information about a data set that is structured (often in machine-readable format) for
purposes of search and retrieval. Metadata elements may include basic information (e.g., title, author,
date created) and/or specific elements inherent to datasets (e.g., spatial coverage, time periods).
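As an illustration of what "structured, machine-readable" can mean here, the sketch below models the elements named in the definition as a small record and serializes it to JSON. The class name, field names, and all values are hypothetical placeholders, not a metadata standard or anything reported by survey respondents.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class DatasetMetadata:
    # Basic elements named in the definition.
    title: str
    authors: List[str]
    date_created: str  # ISO 8601 date string, an assumption of this sketch
    # Dataset-specific elements; optional because not every data set has them.
    spatial_coverage: Optional[str] = None
    time_period: Optional[str] = None


def to_json(record: DatasetMetadata) -> str:
    """Serialize the record so that a search index or harvester could read it."""
    return json.dumps(asdict(record), indent=2)


# Placeholder values for illustration only.
example = DatasetMetadata(
    title="Example stream temperature readings",
    authors=["A. Researcher"],
    date_created="2016-05-01",
    spatial_coverage="Example County",
    time_period="2010/2015",
)
print(to_json(example))
```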
36. Please indicate the importance of these data curation ingest activities on a scale of 1 to 5 where
1=essential; 2=very important; 3=moderately important; 4=less important; 5=not important. N=24
Activity              1    2    3    4    5    Rating Average
Metadata             18    6    0    0    0    1.25
Deposit agreement    14    8    2    0    0    1.50
Documentation        17    3    3    1    0    1.50
File validation       8    9    5    1    1    2.08
Authentication       10    5    5    3    1    2.17
Chain of custody      6    9    4    3    2    2.42
# of respondents     22   20   11    6    2
Note: A lower average rating indicates a more important activity.
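The rating averages in this table are consistent with a response-weighted mean of the scale values (each rating from 1 to 5 multiplied by the number of respondents who chose it, divided by N=24). The sketch below assumes that definition, which the survey does not state explicitly; the function name is introduced only for this example.

```python
def rating_average(counts):
    """Weighted mean of a 1-to-5 scale, given the response count per rating."""
    total = sum(counts)
    weighted = sum(score * n for score, n in enumerate(counts, start=1))
    return weighted / total


# Response counts for ratings 1 through 5, copied from the table.
responses = {
    "Metadata": [18, 6, 0, 0, 0],
    "Deposit agreement": [14, 8, 2, 0, 0],
    "Documentation": [17, 3, 3, 1, 0],
    "File validation": [8, 9, 5, 1, 1],
    "Authentication": [10, 5, 5, 3, 1],
    "Chain of custody": [6, 9, 4, 3, 2],
}

for activity, counts in responses.items():
    print(f"{activity}: {rating_average(counts):.2f}")
```

For example, Metadata works out to (18 × 1 + 6 × 2) / 24 = 1.25, matching the reported value.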
Comments N=5
All of these responses reflect importance in an ideal world (in which we had unlimited funds,
personnel, time, etc.) In no way could our institution actually do this at scale.

All six of these items are important, but from an institutional priorities point of view (and of course
limited time and resources), we have ranked these with current practice.

For a robust system with preservation as a mandate then these are all essential.

It isn’t possible for us to comment on the importance of these activities. We have not had in-depth
conversations about data curation at the library.

We tend to assume that the researcher/group will provide the best documentation and that our forte is
metadata for discoverability. Deposit agreements are already part of the ingest workflow for articles, so
would probably remain for data sets.

Here are descriptions of three data curation appraisal activities.
Rights Management: The process of tracking and managing ownership and copyright inherent to
a data set as well as monitoring conditions and policies for access and reuse (e.g., licenses and data
use agreements).
Risk Management: The process of reviewing data for known risks such as confidentiality issues
inherent to human subjects data, sensitive information (e.g., sexual histories, credit card information)