English Professor Laura Mandell, Director of the Initiative for Digital Humanities, Media, and Culture (IDHMC), and her two co-PIs, Professor Ricardo Gutierrez-Osuna and Professor Richard Furuta, are very pleased to announce that Texas A&M has received a 2-year, $734,000 development grant from the Andrew W. Mellon Foundation for the Early Modern OCR Project (eMOP, http://emop.tamu.edu). The two other project leaders, Anton DuPlessis and Todd Samuelson, are book historians from Cushing Rare Books Library.
Over the next two years, eMOP will work to improve scholarly access to an extensive early modern text corpus. The overarching goal of eMOP is to develop new methods and tools to improve the digitization, transcription, and preservation of early modern texts.
The peculiarities of early printing technology make it difficult for Optical Character Recognition (OCR) software to discern discrete characters and, thus, to render readable digital output. By creating a database of early modern fonts, training the software that mechanically types page images (OCR) to read those typefaces, and creating crowd-sourced correction tools, eMOP promises to improve the quality of digital surrogates for early modern texts. This grant makes it possible to improve the machine translation of digital page images using cutting-edge crowd-sourcing and OCR technologies, both guided by book history. Our goal is to further the digital preservation processes currently taking place in institutions, libraries, and museums globally.
The IDHMC, along with our participating institutions and individuals, will aggregate and re-tool many of the recent innovations in OCR in order to provide a stable community and expanded canon for future scholarly pursuits. Thanks to the efforts of the Advanced Research Consortium (ARC) and its digital hubs, NINES, 18thConnect, ModNets, REKn and MESA, eMOP has received permission to work with over 300,000 documents from Early English Books Online (EEBO) and Eighteenth-Century Collections Online (ECCO), totaling 45 million page images of documents published before 1800.
The IDHMC is committed to the improvement and growth of digital projects and resources, and the Mellon Foundation’s grant to Texas A&M for the support of eMOP will enable us to fulfill our promise to the scholarly community to educate, preserve, and develop the future of humanities scholarship.
For further information, including webcasts describing the problem and the grant application as submitted, please see the eMOP website: http://emop.tamu.edu
For more information on our project partners, please see the following links:
ECCO at Gale-Cengage Learning
EEBO at ProQuest
Professor Raghavan Manmatha at the University of Massachusetts Amherst
The IMPACT project at the Koninklijke Bibliotheek – National Library of the Netherlands
PRImA at the University of Salford Manchester
Department of Computer Science and Engineering, Texas A&M University
The Initiative for Digital Humanities, Media, and Culture, Texas A&M University
Cushing Memorial Library and Archives
The OCR Summit Meeting Participants