16 · Survey Results: Executive Summary
grants (24%), five charge researchers (14%), and seven
have found funds through other means (19%). No li-
brary charges users for data access.
There are clear differences between IRs with data
and data-specific archives. Thirty of the IRs (94%)
absorb any extra costs for research data into the re-
pository budget. Only one of the data archives re-
ported funding from their general budget. Two of
the data archives are grant funded, and three charge
researchers for archiving. Charging researchers or
levying fees on grants is much less common for IRs
with data. Understanding the costs of archiving in many
cases is still under review, and institutions recognize
that archiving costs will need to factor in the volume
of data and length of hold.
Data Archive Infrastructure
The survey asked respondents to describe the platform
and software used for their data archiving solution
(Q22). Most of the 38 respondents use open source
software for all or part of their solutions; one
developed their own software. DSpace is the most
commonly used institutional repository and digital
collection platform and interface (17, or 43%). Fedora
is the platform for eight of the IRs (20%), often along
with additional software interfaces such as Hydra or
iRODS. The five data archives use Fedora and Data
Conservancy software; Chronopolis; a customization
of HubZero; a multi-component system that includes
Fedora, Archivematica, Dataverse,6 and iRODS; and a
custom-built repository. All are in active development
and/or in a “beta” phase of implementation.
To assess the use of repositories for data, the sur-
vey asked for estimates of the number of researchers
currently depositing datasets in the archives (Q23),
the typical sources of archived data (Q25), and total
deposit size (Q26). Twenty-eight of those with IRs re-
ported that zero to 1000+ researchers have deposited
data (a median of 10 and an average of about 91 re-
searchers). Four of those with data archives reported
that between two and 100 researchers have deposited
data. Twenty-two of the respondents (66%) reported
that data deposits are in the gigabyte range; all but
three are under 100 GB. Eleven others reported total
deposits between 1 and 75 terabytes. Follow-up
with respondents might yield more precise numbers
and distinguish among archives with single large
and many small deposits. Clearly, however, these are
early days for both data-specific archives and IRs with
data, and possibly also for researchers’ awareness and
adoption of these archiving options.
Data files in both IRs and data archives are com-
ing from a range of sources. Most of the respondents
report that datasets are associated with particular
publications (88%), full research projects (85%), and
graduate theses/dissertations (80%). Twenty-five (63%)
report that data was moved from another archive to
the library. As data-specific archives expand in use,
there may be shifts in data sources that institutional
repositories cannot accommodate as well.
The survey also asked about data deposit options.
Institutional repositories are generally set up for self-
deposit by researchers, and 23 of the IRs with data
(65%) do allow data deposits without direct assistance
(Q28). However, all but one of these also provide
assistance, and 19 say they will deposit data collec-
tions for their researchers. Three data archives allow
researchers to self-deposit, and they also provide as-
sistance and will deposit data for the researcher. A
trend to follow is whether data archive software and
support models become more “self-service” for re-
searchers or remain a staff-mediated service.
The final set of data archiving questions addressed
details of the archives’ architecture for access and
preservation. The survey responses show that open access is
the policy and intention for all but three of the librar-
ies with archiving solutions, as one would assume
based on the literature and public funder require-
ments. Six of the IRs and data archives also allow
controlled access, such as administrative or researcher
approval to access data. For data archives in particu-
lar, the type of access may be a technical issue, not just
policy. Datasets for two institutions are essentially
“dark archives” for preservation without a public
interface as a direct component of the system, and at
least one archive does not currently have the capacity
to control access.
Another feature generally considered essential to
data archives is support for persistent identifiers so
that datasets can be located long-term and reliably
cited in publications (including, in some cases, citing
particular versions of collections updated with new