130 · Representative Documents: Data Management Plan Tools
JOHNS HOPKINS UNIVERSITY
Questionnaire to Help with the Creation of a Data Management Plan
http://dmp.data.jhu.edu/sites/default/files/Questionnaire.doc
Additional Tips and Instructions (See corresponding endnote number in the text)
1 Data source can include instruments, people, and data centers. Data product examples: transcripts, tables,
3D models, digital audio, geospatial data. Format examples: RTF text, MS Excel converted to CSV, MATLAB,
WAV audio, shapefile. (Specify any instrument-specific formats or software packages). Estimated amount can
include rate produced, e.g., 1 TB/year, 50 GB/experiment. Include any sources and data products created by others
that you are using. It may help to think through the steps of your research workflow to identify data types and
sources requiring management.
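As a rough illustration of the "estimated amount" arithmetic in note 1, the sketch below combines a yearly rate with a per-experiment rate. The function name, its parameters, and the use of decimal units (1 TB = 1000 GB) are assumptions made for this example, not part of the questionnaire.

```python
def estimate_total_tb(rate_tb_per_year=0.0, years=0.0,
                      gb_per_experiment=0.0, experiments=0):
    """Return a rough project total in TB.

    Hypothetical helper: assumes decimal units (1 TB = 1000 GB) and
    that the two rates describe separate, non-overlapping data streams.
    """
    continuous_tb = rate_tb_per_year * years
    experiment_tb = (gb_per_experiment * experiments) / 1000.0
    return continuous_tb + experiment_tb


if __name__ == "__main__":
    # Example figures only: 1 TB/year for 3 years plus 40 experiments at 50 GB each.
    print(estimate_total_tb(rate_tb_per_year=1, years=3,
                            gb_per_experiment=50, experiments=40))
```

An estimate like this is only a starting point for a storage budget; backup copies and derived data products would add to it.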
2 Metadata is the information that captures the who, what, when, where, why and how of your data,
providing the details necessary for another researcher to use your data sets. Some scientific communities have
established metadata standards, such as Content Standard for Digital Geospatial Metadata (CSDGM), Data
Documentation Initiative (DDI), Climate and Forecast (CF) metadata convention, and Dublin Core. Metadata may
take the form of “readme files” that explain variables and file structures; however, it is preferable if metadata files
are machine readable for better re-usability and processing.
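To make the idea of machine-readable metadata in note 2 concrete, here is a minimal sketch that writes a small metadata record as JSON. The keys borrow a few Dublin Core element names (title, creator, date, description, format); the function name and the sample values are hypothetical, and this is not a complete or validated Dublin Core record.

```python
import json


def build_metadata(title, creator, date, description, file_format):
    """Return a JSON string describing one data product.

    Hypothetical helper: keys follow a handful of Dublin Core element
    names; a real project should follow its community's full standard.
    """
    record = {
        "title": title,
        "creator": creator,
        "date": date,
        "description": description,
        "format": file_format,
    }
    return json.dumps(record, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Sample values are invented for illustration only.
    print(build_metadata(
        title="Interview transcripts, wave 1",
        creator="Example Lab",
        date="2012-05-01",
        description="De-identified interview transcripts.",
        file_format="text/rtf",
    ))
```

Unlike a free-text readme, a record like this can be parsed, searched, and checked by software.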
3 Storing data is not the same as archiving data. Storage is a necessary step towards archiving your
data; however, storing data (e.g., on an external drive) does not safeguard against media degradation (e.g., CD file
corruption) or obsolescence of data formats (e.g., VisiCalc spreadsheets), nor does it provide easy access in the future.
Archiving encompasses both active preservation of the digital object and increased discoverability and access to
those data. Your plan should discuss how you will store your research data during the project and your preservation
strategy for after the project, particularly of research data that will be reused and shared. The next two sections help
frame these different topics.
4 JHU requires retention of research data for a minimum period of 5 years after the date of any publication
upon which it is based (http://jhuresearch.jhu.edu/Data_Management_Policy.pdf). The NSF Engineering Directorate
requires retention for 3 years after conclusion of the award or 3 years after public release, whichever is later.
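The two retention rules in note 4 can be sketched as a date calculation. This is an illustration only: the function names are hypothetical, and taking the later of the two resulting dates when both rules apply is an assumption of this example, not a statement of either policy.

```python
from datetime import date


def add_years(d, years):
    """Return d shifted forward by whole years (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Feb 29 in a target year that is not a leap year.
        return d.replace(year=d.year + years, day=28)


def earliest_disposal_date(publication, award_end, public_release):
    """Hypothetical helper combining the two rules quoted in note 4.

    JHU: 5 years after the publication date.
    NSF Engineering: 3 years after award conclusion or public release,
    whichever is later.
    Assumption: when both apply, keep the data until the later date.
    """
    jhu_until = add_years(publication, 5)
    nsf_until = max(add_years(award_end, 3), add_years(public_release, 3))
    return max(jhu_until, nsf_until)


if __name__ == "__main__":
    # Example dates are invented for illustration.
    print(earliest_disposal_date(
        publication=date(2020, 6, 1),
        award_end=date(2021, 8, 31),
        public_release=date(2022, 1, 15),
    ))
```

If several publications draw on the same data, the latest publication date would be the one to pass in.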
5 Different data archives provide different kinds of services, such as the creation of persistent, unique
identifiers for citation, format migration, disaster recovery plans, and free, publicly-accessible downloading of data
files. If you plan to use a data repository, we strongly recommend that you contact the repository to ensure that their
archive can handle your data, and determine their archiving fees to include in your budget. Johns Hopkins
University has built a research data archive. Please contact datamanagement@jhu.edu to learn more about it.
6 Briefly address the following questions for each data product in Table 1. (You might refer to each by
number).
7 NSF expects data sharing to follow the norms of your research community, but encourages efforts to
broaden the range of data shared and of potential users beyond your field. Data can often be of unanticipated interest
in the future if it can be located, understood, and cited.
8 “Accessible” generally means unmediated public access to your data distributed through a “cyber
resource,” unless you specify conditions, such as embargo periods. “Sharing” can include direct release to interested
parties upon request.
9 Specify a time period, e.g., “Data will be made available for sharing, in principle, two years after
acquisition.”
10 This section will detail any reasons for sharing delays (e.g., embargo, publisher, patent, or political reasons)
or restrictions (e.g., ecological endangerment concerns, IRB restrictions on sensitive data). You should also address
granular methods for control and access (e.g., maintaining formal consent agreements, anonymizing data, and
restricting access to secured networks).
11 State whether there are IRB restrictions on data and the steps to prepare accessible datasets, such as de-identifying
transcripts. NSF requires fewer details than IRB forms and accepts that IRB restrictions can put sharing beyond a
reasonable effort, but it does sometimes ask for some attempt to create sharable datasets.