53 SPEC Kit 354: Data Curation
Conversion (Analog): In an effort to increase the usability of a data set, the information is transferred
into digital file formats (e.g., analog data keyed into a database). Note: digital conversion is also used
to convert “fixed” data (e.g., PDF formats) into machine-readable formats.
Curation Log: A written record of any changes made to the data during the curation process and of who
made them. The log is often preserved as part of the overall record.
Data Cleaning: A process used to improve data quality by detecting and correcting (or removing)
defects and errors in data.
Deidentification: Redacting or removing personally identifiable or protected information (e.g.,
sensitive geographic locations) from a dataset prior to sharing with third parties.
File Format Transformations: Transform files into open, non-proprietary file formats that broaden
the potential for long-term reuse and ensure that additional preservation actions might be taken in
the future. Note: Retention of the original file formats may be necessary if data transfer is not perfect.
38. Please indicate the importance of these data curation processing and review activities on a scale
of 1 to 5 where 1=essential; 2=very important; 3=moderately important; 4=less important; 5=not
important.
Activity                       1    2    3    4    5   Rating Average
File Format Transformations   10    5    5    3    0   2.04
Contextualize                  8    8    5    1    1   2.09
Curation Log                   9    5    4    1    3   2.27
Deidentification              10    3    6    1    3   2.30
Arrangement and Description    4    8    8    3    0   2.43
Code review                    3    6    6    3    5   3.04
Conversion (Analog)            4    4    6    5    4   3.04
Data Cleaning                  2    6    6    4    5   3.17
# of respondents              16   18   15   10    8
Note: A lower average rating indicates a more important activity.
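The rating averages in the table appear to be response-weighted means of the scale values: each rating from 1 to 5 is multiplied by the number of respondents who chose it, and the sum is divided by the number of responses for that activity. The Python sketch below reproduces the published figures under that assumption. The function name `rating_average` and the `responses` dictionary are illustrative and do not come from the survey instrument.

```python
def rating_average(counts):
    """Return the weighted mean rating for counts of ratings 1 through 5.

    counts[0] is the number of respondents who chose 1 (essential),
    counts[4] the number who chose 5 (not important).
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("no responses recorded for this activity")
    weighted = sum(rating * n for rating, n in enumerate(counts, start=1))
    return weighted / total


# Response counts copied from the table for question 38.
responses = {
    "File Format Transformations": [10, 5, 5, 3, 0],
    "Contextualize": [8, 8, 5, 1, 1],
    "Curation Log": [9, 5, 4, 1, 3],
    "Deidentification": [10, 3, 6, 1, 3],
    "Arrangement and Description": [4, 8, 8, 3, 0],
    "Code review": [3, 6, 6, 3, 5],
    "Conversion (Analog)": [4, 4, 6, 5, 4],
    "Data Cleaning": [2, 6, 6, 4, 5],
}

averages = {name: round(rating_average(c), 2) for name, c in responses.items()}

# A lower average indicates a more important activity, so sort ascending.
for name, avg in sorted(averages.items(), key=lambda item: item[1]):
    print(f"{name:<30}{avg:.2f}")
```

Because 1 denotes "essential," sorting in ascending order lists the activities from most to least important, which matches the order of the rows in the table.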
At this time, we expect most of the data processing, cleaning, and formatting to be done prior to deposit.
Data cleaning and deidentification are critical, but not for libraries to do.
Education related to these activities should happen well before submission as part of the data
management plan (DMP).
Items marked as not important in planning for our service are because we expect the depositor/PI to be
Some hesitation to modify data submitted by the researcher—although it may be value-added to clean
the data, there is the worry that it would fundamentally alter the data, despite best intentions. We
continue to advise and educate faculty on best practices.
Some of these actions warrant consultation or advice, but we do not think that they are responsibilities
that the curation center should take upon itself.
The 5s listed here are more an indication of where we stand on staffing and technological capability,
while knowing we intend to provide guidance on all these topics during the ingest process.