43 SPEC Kit 354: Data Curation
We use an unmediated ingest process; however, our data sources are mandated to follow university
privacy policies distinguishing between restricted data (uses ResearchVault for secure storage) and
sensitive data (uses Gatorbox for encrypted storage).
Here are descriptions of eight data curation processing and review activities.
Arrangement and Description: The re-organization of files (e.g., new folder directory structure) in
a dataset that may also involve the creation of new file names, file descriptions, and the recording of
technical metadata inherent to the files (e.g., date last modified).
Code Review: Run and validate computer code (e.g., look for missing files and/or errors) in order to
find mistakes overlooked in the initial development phase, improving the overall quality of software.
Contextualize: Use metadata to link the dataset to related publications, dissertations, and/or projects
that provide added context on how and why the data were generated.
Conversion (Analog): In an effort to increase the usability of a dataset, the information is transferred
into digital file formats (e.g., analog data keyed into a database). Note: digital conversion is also used
to convert “fixed” data (e.g., PDF formats) into machine-readable formats. 
Curation Log: A written record of any changes made to the data during the curation process and by
whom. The file is often preserved as part of the overall record.
Data Cleaning: A process used to improve data quality by detecting and correcting (or removing)
defects and errors in the data.
Deidentification: Redacting or removing personally identifiable or protected information (e.g.,
sensitive geographic locations) from a dataset prior to sharing with third parties.
File Format Transformations: Transform files into open, non-proprietary file formats that broaden
the potential for long-term reuse and ensure that additional preservation actions might be taken in
the future. Note: Retention of the original file formats may be necessary if data transfer is not perfect.
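As an illustration of two of the activities defined above, the following minimal Python sketch records technical metadata for each file in a dataset (Arrangement and Description) and appends entries to a curation log (Curation Log). It is not drawn from any respondent's workflow; the function names, the record fields, and the in-memory list used as a log are all hypothetical choices made for this example.

```python
from datetime import datetime, timezone
from pathlib import Path


def describe_files(dataset_dir):
    """Return technical metadata (relative path, size, date last modified) per file."""
    root = Path(dataset_dir)
    records = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            stat = path.stat()
            records.append({
                "path": path.relative_to(root).as_posix(),
                "size_bytes": stat.st_size,
                # Date last modified, stored as an ISO 8601 timestamp in UTC.
                "last_modified": datetime.fromtimestamp(
                    stat.st_mtime, tz=timezone.utc
                ).isoformat(),
            })
    return records


def log_change(log, curator, action):
    """Append one curation log entry: what was changed, by whom, and when."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "curator": curator,
        "action": action,
    }
    log.append(entry)
    return entry
```

In practice the log would likely be written to a file that is preserved alongside the dataset, as the Curation Log definition suggests; the list here only keeps the sketch short.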
30. Please indicate your institution’s level of support for these data curation processing and review
activities on a scale of 1 to 5 where 1=currently providing; 2=will provide in the near future;
3=would like to provide, but unable to at this time; 4=no interest/desire to provide; 5=unsure. N=49
Activity                        1    2    3    4    5
Contextualize                  28    4   11    4    2
Arrangement and Description    27    3   11    5    3
File Format Transformations    25    5   11    2    5
Curation Log                   16    4   20    3    3
Data Cleaning                  15    3   18    7    6
Conversion (Analog)            13    4   16   11    4
Deidentification                8    2   23   11    5
Code Review                     4    1   28   10    6
# of respondents               38   10   41   22   10
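To compare activities more easily, the counts can be converted to percentages. The short Python sketch below does this for the "currently providing" column (response 1). It assumes the reported N=49 is the base for every row, although some rows sum to fewer than 49 responses; the dictionary and function names are illustrative only.

```python
# Counts transcribed from the table for question 30, in column order 1 to 5.
N = 49  # assumption: the reported N is used as the base for every row

RESPONSES = {
    "Contextualize": (28, 4, 11, 4, 2),
    "Arrangement and Description": (27, 3, 11, 5, 3),
    "File Format Transformations": (25, 5, 11, 2, 5),
    "Curation Log": (16, 4, 20, 3, 3),
    "Data Cleaning": (15, 3, 18, 7, 6),
    "Conversion (Analog)": (13, 4, 16, 11, 4),
    "Deidentification": (8, 2, 23, 11, 5),
    "Code Review": (4, 1, 28, 10, 6),
}


def percent_currently_providing(activity, n=N):
    """Share of respondents who chose 1 (currently providing), in percent."""
    return round(100 * RESPONSES[activity][0] / n, 1)


if __name__ == "__main__":
    for activity in RESPONSES:
        print(f"{activity}: {percent_currently_providing(activity)}%")
```

Under this assumption, Contextualize comes to about 57% of respondents currently providing the service, and Code Review to about 8%.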