Anna Battigelli and Eleanor Shevlin invited me to write a bit about the Eighteenth-Century Book Tracker project that Laura Mandell linked to last week, and I’m happy to do so.
This is a project I began thinking about around a year ago, and to explain some of its premises, I’d best say a bit about the circumstances that gave rise to it. I teach at a mid-sized, primarily undergraduate public university that hasn’t purchased access to ECCO, EEBO, et al. and, realistically speaking, isn’t ever going to purchase access to them at their current prices. I’m really fortunate to be able to use ECCO and other resources at the University of Connecticut, just a few miles up the road, so my own research isn’t unduly hampered by not having them at my home institution. (What hampers my research is my 4/4 teaching load, but that’s another matter…) I can’t really take advantage of ECCO in my teaching, though, which led me to start exploring resources like Google Books and the Internet Archive. While you can’t beat the price, those sites (and, let’s recall, they’re functionally the only ones that people without institutional access to the big databases can leverage) leave a lot to be desired.
There’s been a lot of good discussion here about the nature of Google Books and the Internet Archive—what they are and aren’t good for, how best to think about them, whether as catalogues/finding aids or as searchable textbases. I hope it won’t seem too contrary of me, then, to say that, at present, they aren’t especially good at being either of those things.
Given the problems with OCR of eighteenth-century print that Steve Karian recently reminded us of, I don’t think we can be too confident in using these cost-free sites as the foundation for the kinds of digital humanities approaches that Matthew Wilkens describes so well (what John Unsworth calls “not reading” and Tanya Clement calls “distant reading,” both in reference to Project MONK).
In a recent videocast, Laura Mandell notes—rightly, I think—that, at present, we don’t really have usable “data” for this sort of work in eighteenth-century studies because the plain text that we have is so poor (this part of the discussion appears at about 0:35). If this is a weakness of ECCO, it is an even greater one for Google and the Internet Archive, which don’t appear to have taken any special steps to compensate for the quirks of eighteenth-century print—ECCO at least offers different levels of “fuzzy search.” (I should note, though, that Google appears to be doing something on this front: I recently saw their site correctly interpret a long-s. Only one, mind you, but it’s something.) For the time being, then, Google Books and the Internet Archive—the two largest sources of freely-available facsimile texts I know of—seem like dubious textbases for digital humanities.
That being the case, it’s the more disappointing that they fall down so badly on the bibliographical finding aid front: if we can’t yet confidently use Google Books or the Internet Archive for “not reading” the eighteenth century, it would at least be nice to be able to use them to read particular eighteenth-century books. As others have already noted, though, the sites’ bibliographical shortcomings make that harder than you’d think (the Internet Archive seems to me to be a bit better than Google Books on this point, though that’s not saying a great deal).
It’s this question of bibliographical accuracy that Eighteenth-Century Book Tracker is trying to take on. The problem of “How Not to Read a Million Books,” as John Unsworth and his collaborators put it, is an important and fascinating one, and is already leading to some really interesting scholarship. There’s still a very large contingent of students and scholars, though, who don’t have a million books from which to choose, say, ten to read (much less to “not read”). Google Books, the Internet Archive, and any number of other, smaller, ad hoc digitization projects hold the promise of supplying that deficiency, if only one could find what one was looking for.
So, what Eighteenth-Century Book Tracker tries to do is to provide a clearinghouse for registering links to freely-available digital facsimiles of eighteenth-century texts, pooling its users’ discoveries and attaching them to bibliographically responsible entries. That way, someone who’s looking, for instance, for a copy of The Dunciad can know precisely what they’re looking at. (I was nonplussed, incidentally, by Paul Duguid’s account of being taken to task for “bibliographical fastidiousness” in the piece by Peter Brantley that David Mazella linked to recently. Is that a bad thing? Can one afford to be anything other than bibliographically fastidious when approaching a text like The Dunciad?) Someone who wants to see Henry Fielding’s Amelia can find all of the volumes in one place (with fair warning that those volumes are drawn from two different copies). And someone who wants to see Eliza Haywood’s The History of Miss Betsy Thoughtless can save themselves the aggravation of trying to find all the volumes of the 1751 edition (since they’re not there), while also seeing that the 1768 fourth edition is available in its entirety—that they’re not the same thing is precisely the point.
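To make the idea of a "bibliographically responsible entry" a bit more concrete, here is a minimal sketch in Python of how such a record might be modeled. It is purely illustrative and makes no claim about how the site is actually built: the class names, fields, and sample data (including the example.org URLs and the copy labels) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FacsimileLink:
    """One freely available scan of one physical volume (hypothetical model)."""
    volume: int       # volume number within the edition
    url: str          # where the scan lives
    source_copy: str  # label for the physical copy that was scanned


@dataclass
class EditionEntry:
    """A record for a single edition, with the facsimile links found for it."""
    author: str
    title: str
    year: int
    edition_statement: str
    total_volumes: int
    links: List[FacsimileLink] = field(default_factory=list)

    def missing_volumes(self) -> List[int]:
        """Volume numbers for which no facsimile has been registered."""
        found = {link.volume for link in self.links}
        return [v for v in range(1, self.total_volumes + 1) if v not in found]

    def is_mixed_set(self) -> bool:
        """True if the registered volumes come from more than one physical copy."""
        return len({link.source_copy for link in self.links}) > 1


# Hypothetical sample data: a four-volume edition with three volumes found,
# drawn from two different copies.
entry = EditionEntry(
    author="Example Author",
    title="An Example Novel",
    year=1751,
    edition_statement="First edition",
    total_volumes=4,
    links=[
        FacsimileLink(1, "https://example.org/scan/vol1", "Copy A"),
        FacsimileLink(2, "https://example.org/scan/vol2", "Copy A"),
        FacsimileLink(4, "https://example.org/scan/vol4", "Copy B"),
    ],
)

print(entry.missing_volumes())  # [3]
print(entry.is_mixed_set())     # True
```

A record shaped like this lets a reader see at a glance both kinds of warning described above: that an edition is incomplete, and that the volumes on offer come from more than one copy.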
In a sense, this is primarily a bibliographical reclamation project, trying to reconstruct information we already had about these books before they were digitized, but that’s been lost in the translation into the digital medium. (That it’s even necessary to do so is unfortunate, but there we are.) I would cautiously submit, though, that there are some areas where the site can actually lead to new advances.
- First, there’s the fact that Eighteenth-Century Book Tracker is set up to accept links to texts in languages other than English (though I haven’t gathered very many yet). I’m not aware of resources comparable in size and scope to ECCO for other languages, so it’s my hope that people whose work is in languages other than English can take advantage of it as infrastructure for building up a library of facsimiles of texts in, say, French, Dutch, or Spanish. (The site’s set up to accept entries in seven modern European languages, as well as Greek and Latin, and I’m open to expanding that number, if there are users to do it.)
- Though digitization projects like Google’s (and like Microsoft’s—now defunct, but turned over to the Internet Archive) don’t seem to have been especially systematic in deciding what to scan, one still runs across some interesting finds. Google’s work at Oxford, to take only one instance, has drawn from books at the English Faculty Library and the Taylor Institution, neither of which collections seems to have been matched with records in the ESTC (this is to say nothing of the European libraries with which Google has partnerships). As of now, the site has records on around 75 copies that aren’t recorded by the ESTC, but which certainly could be. (Then, too, there are copies one finds that are listed as “unverified” in the ESTC. Some of those could be checked off this way, as well.)
- Having such convenient access to entire volumes turns up some anomalies that often aren’t evident from library catalogues. A copy of Conyers Middleton’s The Life of Marcus Tullius Cicero listed in the University of Michigan’s MIRLYN catalogue as the eighth edition (a copy, incidentally, not recorded by the ESTC) is a case in point. The title page of Volume III announces that that volume is from “The Sixth Edition” of 1761, which is not an edition noted in the ESTC. Now, I don’t know (and, at the moment, frankly don’t care to know) enough about Middleton’s bibliography to say whether this is a hitherto unrecorded edition or whether that’s just how Volume III of the eighth edition appeared. But in either case, it’s the sort of thing that might be worthy of a note in the ESTC. (Something similar happens in the case of a 1713 edition of Addison’s Cato, apparently scanned from a copy at Oxford’s English Faculty Library. The text proves to have been bound with four responses to Addison’s play, none of which are recorded by Oxford’s OLIS catalogue. These copies are, we could say, doubly lost to the ESTC: once, because no matching seems to have been done with that library’s collection, and twice, because a simple matching of catalogue records wouldn’t reveal the presence of the texts.)
- Again, because Google doesn’t appear to be making any effort to avoid duplication of editions scanned at different partner libraries, one is sometimes able to compare multiple copies of a work. In at least one case, the tenth volume of the 1745 edition of Swift’s Miscellanies, one can see that the copies are from two different printings, something I don’t believe is noted in the Teerink-Scouten bibliography (though, given the complexity of the entries for the various editions of the Miscellanies, I may simply be overlooking something). [UPDATE: Steve Karian informs me, via email, that this variant isn’t, in fact, noted in Teerink-Scouten, but is described in Leland D. Peterson, “A variant of the 1742-46 Swift-Pope Miscellanies,” Papers of the Bibliographical Society of America 66 (1972): 302-10.]
Thus, while Eighteenth-Century Book Tracker is meant primarily as a finding aid, and not as a bibliography, per se, it may still be able to yield some collateral bibliographical gains.
There are just two other things I’d like to mention briefly about the site. The first is that I think it has potential as a teaching tool. I’ve had some success in getting undergraduates to identify eighteenth-century editions at Google Books using the ESTC. In a sense, precisely because Google Books is such a bibliographical mess, it provides a good occasion for directing students’ attention to matters of bibliography that I would never have attempted to explore with them otherwise. With the right coaching, undergraduates can actually do quite a lot in this line: in the course of two semesters, my students have brought me a little over 200 links (not all of which I’ve had time to enter into the database yet). So I think there are real opportunities for involving students in the creation of a valuable scholarly resource, even as we teach them about questions of print culture and book history that I don’t believe undergraduates are often exposed to.
The last thing I’ll say is that I think this bibliographical reclamation task, as I’ve called it, can play a role in the eventual development of more ambitious digital humanities projects. Right now, Eighteenth-Century Book Tracker is just trying to provide the missing finding aid that’s needed to read digital surrogates of eighteenth-century books that are available at sites like Google Books and the Internet Archive. But finding the books is a necessary first step for any subsequent work—you can’t “not read” the books effectively if you don’t know where they are. The recent announcement that 18thConnect will attempt to re-index page images provided by Gale using a bespoke, optimized OCR routine is, I think, a very positive development. Even those who don’t have institutional access to ECCO will, it appears, soon be able to search the texts using what promises to be a much cleaner textbase than is currently available. Once you know that a word or phrase appears on a particular page in a particular book, do you really care whether the page image comes from Gale or from Google? To take advantage of exciting developments like this one, though, and to ensure that the benefits of those developments extend to students and scholars who don’t have access to tools like ECCO, we have to have a way to find the books and know what we’ve found. That’s where I hope Eighteenth-Century Book Tracker will fit in.
I’ve gone on much longer than I intended, so I’ll simply close by saying that I welcome feedback about the site. It’s still a work in progress, and I would very much appreciate hearing thoughts about how the site could be developed so as to be most useful.