rather than dissipated, other quality-control and metadata problems that are
difficult to solve algorithmically and do require continuing special treatment.
JSTOR has maintained the gold standard for descriptive metadata in its serials
collections.13 However, Geoffrey Nunberg has recently pointed to a variety of
general errors in Google’s book collection that are particularly troublesome for
scholars whose work depends on careful description of ordinary features
such as series, edition, volume, and publication date.14 In addition, the Council
on Library and Information Resources will soon be releasing reports of extensive
Mellon-funded studies by scholars in four different fields—linguistics; Latin
American literature; history; and media history and cultural studies—that
document vexing and ongoing quality-control problems in the book collections
digitized by both Google and the Open Content Alliance.15 Mellon also made
a grant this summer to the University of Michigan for a systematic
characterization of quality-control issues in the HathiTrust collections.
A third trap in the logic about the common and special collections lies in the
largely unexplored area of what the Proposed Settlement Agreement for the
Google book digitization project has called “non-consumptive research.”16
Joseph Esposito, Clifford Lynch, and others have often pointed out that the
bulk of reading in the future will be done not by humans but by computers.17
Non-consumptive research refers to this kind of machine reading. Overall, our
experience with non-consumptive research on texts is limited, especially in fields
of the humanities outside of linguistics, but we have learned a good deal from
the NORA, MONK, and SEASR projects at the University of Illinois at Urbana-
Champaign. Teams of scholars led by John Unsworth, Martin Mueller, and
others have found that computers are powerful readers when working on
simple discovery tasks, but for advanced scholarly analysis, the machines are
largely illiterate unless they are working on well-prepared and well-marked-up
texts.18 Different kinds of inquiries require different kinds of markup, often
overlapping, and only some of the markup can be accomplished by algorithm
given current technologies. Moreover, texts created by optical character
recognition often need even further correction and preparation for sophisticated
reading by machine. I assume that these various kinds of human intervention
would be permitted on the texts stored in the non-consumptive research centers
that the Google Settlement would establish. If not, much useful work could still
be done on public-domain materials, though its utility would be limited to
special scholarly audiences in specific disciplines. In any case, the special
The Changing Role of Special Collections in Scholarly Communications
DECEMBER 2009 RESEARCH LIBRARY ISSUES: A BIMONTHLY REPORT FROM ARL, CNI, AND SPARC