through Flickr. Oral Roberts University’s Written Rummage project2 to transcribe
a Frederick Douglass diary uses a free Amazon cloud service, Mechanical Turk, to
manage the transcription workflow.3 Better-funded institutions have launched
efforts such as What’s on the Menu?,4 a New York Public Library
project to transcribe historic restaurant menus. As of the end of November 2011,
645,517 dishes had been transcribed from 10,960 menus.
Some projects rely on specialized crowdsourcing software. Among the first
to enter this arena was software engineer Ben Brumfield, who built the web-
based tool From the Page5 for transcribing, indexing, and annotating
handwritten material. When Iowa started its project, this was the
only open-source solution available. Since then, with the help of grants from the
National Endowment for the Humanities Office of Digital Humanities, the Roy
Rosenzweig Center for History and New Media has developed an open-source
tool, Scripto, and applied it to transcribe 45,000 papers of the War Department.6
This solution is gaining momentum, in part because it integrates with existing
content management systems.
Libraries considering crowdsourcing should also look to the Australian and
European library communities, as well as non-library efforts, for innovative
and more seasoned examples of engaging the crowd. The National Library of
Australia and the non-profit Distributed Proofreaders have organized extensive
projects to correct OCR-generated text from scanned images and to enhance access by adding
tags and other markup.7 International university collaborations such as Galaxy
Zoo, a Zooniverse Project, ask volunteers to classify millions of photographs of
galaxies, while still other projects invite the public to upload their own artifacts
and recollections for inclusion in an online collection.
The Iowa Approach
In preparation for the Civil War sesquicentennial beginning in 2011, the UI
Libraries conducted a two-year reformatting project to provide comprehensive
digital access to the Civil War manuscript materials in its Special Collections
department, comprising approximately 50 collections containing more than
20,000 pages of correspondence and diaries. As the scanning effort was
drawing to a close, curators began to discuss ways to promote the resulting
digital collection. Most of the items were handwritten and lacked transcriptions
(with the exception of a small number provided by the families who donated the
materials), so the idea of a transcription crowdsourcing project had strong
RLI 277
Experimenting with Strategies for Crowdsourcing Manuscript Transcription
DECEMBER 2011 RESEARCH LIBRARY ISSUES: A QUARTERLY REPORT FROM ARL, CNI, AND SPARC