Laura Mandell has posted an update on 18thConnect announcing that an agreement has been reached for 18thConnect to work with Gale. The update links to a recording of her ALA talk, which will not open for me, and includes the following news about a grant the project has received from ICHASS:
18thConnect: From PDF Images to Clean Data Sets, led by the University of Illinois’ Robert Markley, will use supercomputer time to run a parallelized optical character recognition (OCR) program on pages of images of 18th century printed texts, made available through its collaboration with Gale Group. The resulting archive of machine-readable 18th-century texts in history, literature, art, the sciences, and the emerging social sciences will be accessible to scholars for faceted searching, automated semantic tagging, hand encoding of digital scholarly editions, and data mining. By converting a vast archive of images into machine-readable texts, this project will provide a model for adapting OCR programs to field-specific problems that must be solved in order to preserve the full range of our cultural heritage.
I am hoping that Laura and Bob may be able to tell us more.