43 SPEC Kit 354: Data Curation
We use an unmediated ingest process; however, our data sources are mandated to follow university
privacy policies distinguishing between restricted data (uses ResearchVault for secure storage) and
sensitive data (uses Gatorbox for encrypted storage).
SUPPORT FOR PROCESSING AND REVIEW ACTIVITIES PART 1
Here are descriptions of eight data curation processing and review activities.
Arrangement and Description: The re-organization of files (e.g., new folder directory structure) in
a dataset that may also involve the creation of new file names, file descriptions, and the recording of
technical metadata inherent to the files (e.g., date last modified).
Code Review: Run and validate computer code (e.g., look for missing files and/or errors) in order to
find mistakes overlooked in the initial development phase, improving the overall quality of software.
Contextualize: Use metadata to link the data set to related publications, dissertations, and/or projects
that provide added context to how the data were generated and why.
Conversion (Analog): In an effort to increase the usability of a data set, the information is transferred
into digital file formats (e.g., analog data keyed into a database). Note: digital conversion is also used
to convert “fixed” data (e.g., PDF formats) into machine-readable formats.
Curation Log: A written record of any changes made to the data during the curation process, noting
who made each change. The file is often preserved as part of the overall record.
Data Cleaning: A process used to improve data quality by detecting and correcting (or removing)
defects and errors in data.
Deidentification: Redacting or removing personally identifiable or protected information (e.g.,
sensitive geographic locations) from a dataset prior to sharing with third parties.
File Format Transformations: Transform files into open, non-proprietary file formats that broaden
the potential for long-term reuse and ensure that additional preservation actions might be taken in
the future. Note: Retention of the original file formats may be necessary if data transfer is not perfect.
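Data cleaning, deidentification, and keeping a curation log often happen in a single pass over a dataset. The following is an illustrative Python sketch, not a workflow reported by survey respondents: the field names (`site`, `temp_c`, `participant_email`), the curator ID `jdoe`, and the helpers `CurationLog` and `curate` are all hypothetical. Dropping one email field also stands in for what is, in practice, a far more careful deidentification review.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class LogEntry:
    """One curation-log line: which record changed, how, and by whom."""
    record: int
    action: str
    curator: str
    timestamp: str


@dataclass
class CurationLog:
    entries: list = field(default_factory=list)

    def add(self, record: int, action: str, curator: str) -> None:
        stamp = datetime.now(timezone.utc).isoformat()
        self.entries.append(LogEntry(record, action, curator, stamp))


# Hypothetical: fields treated as personally identifiable information.
PII_FIELDS = {"participant_email"}


def curate(rows, curator, log):
    """Return cleaned, deidentified copies of rows; log every change."""
    curated = []
    for i, original in enumerate(rows):
        row = dict(original)  # work on a copy so the original records are retained
        # Data cleaning: trim stray whitespace from text values.
        for key, value in list(row.items()):
            if isinstance(value, str) and value != value.strip():
                row[key] = value.strip()
                log.add(i, f"trimmed whitespace in '{key}'", curator)
        # Data cleaning: blank out a temperature that is not a number.
        try:
            row["temp_c"] = float(row.get("temp_c"))
        except (TypeError, ValueError):
            log.add(i, f"removed unparseable temp_c {row.get('temp_c')!r}", curator)
            row["temp_c"] = None
        # Deidentification: drop PII fields before the data are shared.
        for key in [k for k in row if k in PII_FIELDS]:
            del row[key]
            log.add(i, f"removed PII field '{key}'", curator)
        curated.append(row)
    return curated


# Usage with made-up example records.
log = CurationLog()
raw = [
    {"site": " Plot A ", "temp_c": "21.5", "participant_email": "a@example.org"},
    {"site": "Plot B", "temp_c": "n/a", "participant_email": "b@example.org"},
]
clean = curate(raw, curator="jdoe", log=log)
for entry in log.entries:
    print(entry.record, entry.curator, entry.action)
```

Working on copies leaves the original records untouched, in the spirit of the note on retaining original formats, and each log entry names the person who made the change, as the Curation Log definition requires. For these two records the log ends up with four entries, two per record.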
30. Please indicate your institution’s level of support for these data curation processing and review
activities on a scale of 1 to 5 where 1=currently providing; 2=will provide in the near future;
3=would like to provide, but unable to at this time; 4=no interest/desire to provide; 5=unsure. N=49
Activity                       1   2   3   4   5
Contextualize                 28   4  11   4   2
Arrangement and Description   27   3  11   5   3
File Format Transformations   25   5  11   2   5
Curation Log                  16   4  20   3   3
Data Cleaning                 15   3  18   7   6
Conversion (Analog)           13   4  16  11   4
Deidentification               8   2  23  11   5
Code Review                    4   1  28  10   6
# of respondents              38  10  41  22  10
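The row totals are worth checking before quoting percentages. File Format Transformations (48), Curation Log (46), and Conversion (Analog) (48) sum to fewer than N=49, so some respondents skipped those rows. A short Python sketch, using only the counts printed in the table, computes each row's total and the share of all 49 respondents currently providing each activity (level 1). The names `counts` and `summarize` are illustrative.

```python
# Counts copied from the table above: responses at levels 1-5 per activity.
N = 49  # total respondents to question 30

counts = {
    "Contextualize": (28, 4, 11, 4, 2),
    "Arrangement and Description": (27, 3, 11, 5, 3),
    "File Format Transformations": (25, 5, 11, 2, 5),
    "Curation Log": (16, 4, 20, 3, 3),
    "Data Cleaning": (15, 3, 18, 7, 6),
    "Conversion (Analog)": (13, 4, 16, 11, 4),
    "Deidentification": (8, 2, 23, 11, 5),
    "Code Review": (4, 1, 28, 10, 6),
}


def summarize(counts, n):
    """Return (activity, respondents who rated it, share of n at level 1)."""
    return [(activity, sum(levels), levels[0] / n)
            for activity, levels in counts.items()]


for activity, answered, providing in summarize(counts, N):
    print(f"{activity:<28} rated by {answered:>2}/{N}, "
          f"currently provided by {providing:.0%}")
```

Dividing by the row total instead of N would give the share among only those who rated each activity. Which denominator is appropriate depends on how the figures will be reported.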