rather than dissipated, other quality-control and metadata problems that are difficult to solve algorithmically and do require continuing special treatment. JSTOR has maintained the gold standard for descriptive metadata in its serials collections.13 However, Geoffrey Nunberg has recently pointed to a variety of general errors in Google’s book collection that are particularly troublesome for scholars who depend in their work on careful description of ordinary features such as series, edition, volume, and publication date.14 In addition, the Council on Library and Information Resources will soon be releasing reports of extensive Mellon-funded studies by scholars in four different fields—linguistics, Latin American literature, history, and media history and cultural studies—that document vexing and ongoing quality-control problems in the book collections digitized by both Google and the Open Content Alliance.15 Mellon also made a grant this summer to the University of Michigan for a systematic characterization of quality-control issues in the HathiTrust collections.

A third trap in the logic about the common and special collections lies in the largely unexplored area of what the Proposed Settlement Agreement for the Google book digitization project has called “non-consumptive research.”16 Joseph Esposito, Clifford Lynch, and others have often pointed out that the bulk of reading in the future will not be done by humans but by computers.17 Non-consumptive research refers to this kind of machine reading. Overall, our experience with non-consumptive research on texts is limited, especially in fields of the humanities outside of linguistics, but we have learned a good deal from the NORA, MONK, and SEASR projects at the University of Illinois at Urbana-Champaign.
Teams of scholars led by John Unsworth, Martin Mueller, and others have found that computers are powerful readers when working on simple discovery tasks, but for advanced scholarly analysis the machines are largely illiterate unless they are working on well-prepared and well-marked-up texts.18 Different kinds of inquiries require different kinds of markup, often overlapping, and only some of the markup can be accomplished by algorithm given current technologies. Moreover, texts created by optical character recognition often need further correction and preparation for sophisticated reading by machine. I assume that these various kinds of human intervention would be permitted on the texts stored in the non-consumptive research centers that the Google Settlement would establish. If not, much useful work could still be done on public-domain materials, even though the utility would be limited to special scholarly audiences in specific disciplines. In any case, the special

The Changing Role of Special Collections in Scholarly Communications
RLI 267, December 2009. Research Library Issues: A Bimonthly Report from ARL, CNI, and SPARC