Eighteenth-Century Book Tracker

by Benjamin Pauley

Anna Battigelli and Eleanor Shevlin invited me to write a bit about the Eighteenth-Century Book Tracker project that Laura Mandell linked to last week, and I’m happy to do so.

This is a project I began thinking about around a year ago, and to explain some of its premises, I’d best say a bit about the circumstances that gave rise to it. I teach at a mid-sized, primarily undergraduate public university that hasn’t purchased access to ECCO, EEBO, et al. and, realistically speaking, isn’t ever going to purchase access to them at their current prices. I’m really fortunate to be able to use ECCO and other resources at the University of Connecticut, just a few miles up the road, so my own research isn’t unduly hampered by not having them at my home institution. (What hampers my research is my 4/4 teaching load, but that’s another matter…) I can’t really take advantage of ECCO in my teaching, though, which led me to start exploring resources like Google Books and the Internet Archive. While you can’t beat the price, those sites (and, let’s recall, they’re functionally the only ones that people without institutional access to the big databases can leverage) leave a lot to be desired.

There’s been a lot of good discussion here about the nature of Google Books and the Internet Archive—what they are and aren’t good for, how best to think about them, whether as catalogues/finding aids or as searchable textbases. I hope it won’t seem too contrary of me, then, to say that, at present, they aren’t especially good at being either of those things.

Given the problems with OCR of eighteenth-century print that Steve Karian recently reminded us of, I don’t think we can be too confident in using these cost-free sites as the foundation for the kinds of digital humanities approaches that Matthew Wilkens describes so well (what John Unsworth calls “not reading” and Tanya Clement calls “distant reading,” both in reference to Project MONK).

In a recent videocast, Laura Mandell notes—rightly, I think—that, at present, we don’t really have usable “data” for this sort of work in eighteenth-century studies because the plain text that we have is so poor (this part of the discussion appears at about 0:35). If this is a weakness of ECCO, it is an even greater one for Google and the Internet Archive, which don’t appear to have taken any special steps to compensate for the quirks of eighteenth-century print—ECCO at least offers different levels of “fuzzy search.” (I should note, though, that Google appears to be doing something on this front: I recently saw their site correctly interpret a long-s. Only one, mind you, but it’s something.) For the time being, then, Google Books and the Internet Archive—the two largest sources of freely-available facsimile texts I know of—seem like dubious textbases for digital humanities.

That being the case, it’s the more disappointing that they fall down so badly on the bibliographical finding aid front: if we can’t yet confidently use Google Books or the Internet Archive for “not reading” the eighteenth century, it would at least be nice to be able to use them to read particular eighteenth-century books. As others have already noted, though, the sites’ bibliographical shortcomings make that harder than you’d think (the Internet Archive seems to me to be a bit better than Google Books on this point, though that’s not saying a great deal).

It’s this question of bibliographical accuracy that Eighteenth-Century Book Tracker is trying to take on. The problem of “How Not to Read a Million Books,” as John Unsworth and his collaborators put it, is an important and fascinating one, and is already leading to some really interesting scholarship. There’s still a very large contingent of students and scholars, though, who don’t have a million books from which to choose, say, ten to read (much less to “not read”). Google Books, the Internet Archive, and any number of other, smaller, ad hoc digitization projects hold the promise of supplying that deficiency, if only one could find what one was looking for.

So, what Eighteenth-Century Book Tracker tries to do is to provide a clearinghouse for registering links to freely-available digital facsimiles of eighteenth-century texts, pooling its users’ discoveries and attaching them to bibliographically responsible entries. That way, someone who’s looking, for instance, for a copy of The Dunciad can know precisely what they’re looking at. (I was nonplussed, incidentally, by Paul Duguid’s account of being taken to task for “bibliographical fastidiousness” in the piece by Peter Brantley that David Mazella linked to recently. Is that a bad thing? Can one afford to be anything other than bibliographically fastidious when approaching a text like The Dunciad?) Someone who wants to see Henry Fielding’s Amelia can find all of the volumes in one place (with fair warning that those volumes are drawn from two different copies). And someone who wants to see Eliza Haywood’s The History of Miss Betsy Thoughtless can save themselves the aggravation of trying to find all the volumes of the 1751 edition (since they’re not there), while also seeing that the 1768 fourth edition is available in its entirety—that they’re not the same thing is precisely the point.
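
To make the idea concrete, here is a minimal sketch in Python of how such a clearinghouse record might be modeled. It is not the site's actual schema: the class names, fields, placeholder title, library labels, and example.org URLs are all hypothetical, chosen only to illustrate attaching user-submitted links to a bibliographic entry and then checking which volumes are present and whether they come from more than one copy.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FacsimileLink:
    """A link to one freely available scan of one physical volume (hypothetical model)."""
    url: str
    volume: int
    source_copy: str  # label for the physical copy the scan was made from


@dataclass
class EditionEntry:
    """A bibliographic entry to which users' links are attached (hypothetical model)."""
    title: str
    year: int
    edition_statement: str
    total_volumes: int
    links: List[FacsimileLink] = field(default_factory=list)

    def missing_volumes(self) -> List[int]:
        """Return the volume numbers for which no link has been registered."""
        have = {link.volume for link in self.links}
        return [v for v in range(1, self.total_volumes + 1) if v not in have]

    def is_complete(self) -> bool:
        """True when every volume of the edition has at least one link."""
        return not self.missing_volumes()

    def mixed_copies(self) -> bool:
        """True when the registered links are drawn from more than one physical copy."""
        return len({link.source_copy for link in self.links}) > 1


# Placeholder data only: a four-volume edition with two volumes found so far,
# each scanned from a different copy.
demo = EditionEntry("A Placeholder Novel", 1751, "first edition", total_volumes=4)
demo.links.append(FacsimileLink("https://example.org/scan/vol1", 1, "Library A copy"))
demo.links.append(FacsimileLink("https://example.org/scan/vol3", 3, "Library B copy"))

print(demo.missing_volumes())
print(demo.is_complete())
print(demo.mixed_copies())
```

The point of keeping `source_copy` on each link, rather than on the entry, is that a reader can be given fair warning when a "complete" set has been assembled from different copies.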

In a sense, this is primarily a bibliographical reclamation project, trying to reconstruct information we already had about these books before they were digitized, but that’s been lost in the translation into the digital medium. (That it’s even necessary to do so is unfortunate, but there we are.) I would cautiously submit, though, that there are some areas where the site can actually lead to new advances.

  1. First, there’s the fact that Eighteenth-Century Book Tracker is set up to accept links to texts in languages other than English (though I haven’t gathered very many yet). I’m not aware of resources comparable in size and scope to ECCO for other languages, so it’s my hope that people whose work is in languages other than English can take advantage of it as infrastructure for building up a library of facsimiles of texts in, say, French, Dutch, or Spanish. (The site’s set up to accept entries in seven modern European languages, as well as Greek and Latin, and I’m open to expanding that number, if there are users to do it.)
  2. Though digitization projects like Google’s (and like Microsoft’s—now defunct, but turned over to the Internet Archive) don’t seem to have been especially systematic in deciding what to scan, one still runs across some interesting finds. Google’s work at Oxford, to take only one instance, has drawn from books at the English Faculty Library and the Taylor Institution, neither of which collections seems to have been matched with records in the ESTC (this is to say nothing of the European libraries with which Google has partnerships). As of now, the site has records on around 75 copies that aren’t recorded by the ESTC, but which certainly could be. (Then, too, there are copies one finds that are listed as “unverified” in the ESTC. Some of those could be checked off this way, as well.)
  3. Having such convenient access to entire volumes turns up some anomalies that often aren’t evident from library catalogues. A copy of Conyers Middleton’s The Life of Marcus Tullius Cicero listed in the University of Michigan’s MIRLYN catalogue as the eighth edition (a copy, incidentally, not recorded by the ESTC) is a case in point. The title page of Volume III announces that that volume is from “The Sixth Edition” of 1761, which is not an edition noted in the ESTC. Now, I don’t know (and, at the moment, frankly don’t care to know) enough about Middleton’s bibliography to say whether this is a hitherto unrecorded edition or whether that’s just how Volume III of the eighth edition appeared. But in either case, it’s the sort of thing that might be worthy of a note in the ESTC. (Something similar happens in the case of a 1713 edition of Addison’s Cato, apparently scanned from a copy at Oxford’s English Faculty Library. The text proves to have been bound with four responses to Addison’s play, none of which are recorded by Oxford’s OLIS catalogue. These copies are, we could say, doubly lost to the ESTC: once, because no matching seems to have been done with that library’s collection, and twice, because a simple matching of catalogue records wouldn’t reveal the presence of the texts.)
  4. Again, because Google doesn’t appear to be making any effort to avoid duplication of editions scanned at different partner libraries, one is sometimes able to compare multiple copies of a work. In at least one case, the tenth volume of the 1745 edition of Swift’s Miscellanies, one can see that the copies are from two different printings, something I don’t believe is noted in the Teerink-Scouten bibliography (though, given the complexity of the entries for the various editions of the Miscellanies, I may simply be overlooking something). [UPDATE: Steve Karian informs me, via email, that this variant isn’t, in fact, noted in Teerink-Scouten, but is described in Leland D. Peterson, “A variant of the 1742-46 Swift-Pope Miscellanies,” Papers of the Bibliographical Society of America 66 (1972): 302-10.]

Thus, while Eighteenth-Century Book Tracker is meant primarily as a finding aid, and not as a bibliography, per se, it may still be able to yield some collateral bibliographical gains.

There are just two other things I’d like to mention briefly about the site. The first is that I think the site has potential as a teaching tool. I’ve had some success in getting undergraduates to identify eighteenth-century editions at Google Books using the ESTC. In a sense, precisely because Google Books is such a bibliographical mess, it provides a good occasion for directing students’ attention to matters of bibliography that I would never have attempted to explore with them otherwise. With the right coaching, undergraduates can actually do quite a lot in this line: in the course of two semesters, my students have brought me a little over 200 links (not all of which I’ve had time to enter into the database yet). So I think there are real opportunities for involving students in the creation of a valuable scholarly resource, even as we teach them about questions of print culture and book history that I don’t believe undergraduates are often exposed to.

The last thing I’ll say is that I think this bibliographical reclamation task, as I’ve called it, can play a role in the eventual development of more ambitious digital humanities projects. Right now, Eighteenth-Century Book Tracker is just trying to provide the missing finding aid that’s needed to read digital surrogates of eighteenth-century books that are available at sites like Google Books and the Internet Archive. But finding the books is a necessary first step for any subsequent work—you can’t “not read” the books effectively if you don’t know where they are. The recent announcement that 18thConnect will attempt to re-index page images provided by Gale using a bespoke, optimized OCR routine is, I think, a very positive development. Even those who don’t have institutional access to ECCO will, it appears, soon be able to search the texts using what promises to be a much cleaner textbase than is currently available. Once you know that a word or phrase appears on a particular page in a particular book, do you really care whether the page image comes from Gale or from Google? To take advantage of exciting developments like this one, though, and to ensure that the benefits of those developments extend to students and scholars who don’t have access to tools like ECCO, we have to have a way to find the books and know what we’ve found. That’s where I hope Eighteenth-Century Book Tracker will fit in.

I’ve gone on much longer than I intended, so I’ll simply close by saying that I welcome feedback about the site. It’s still a work in progress, and I would very much appreciate hearing thoughts about how the site could be developed so as to be most useful.

8 Responses to “Eighteenth-Century Book Tracker”

  1. Anna Battigelli Says:

    Thanks, Ben, for the lucid overview of Eighteenth-Century Book Tracker. Your project is a cherished resource, especially for those who do not have access to EEBO or ECCO.

    I would be interested in hearing more about how you prepare undergraduate students to identify the specific eighteenth-century editions that have been digitized. You make this sound like a fairly fluid process. Is this the case?

    AB

  2. Eleanor Shevlin Says:

    Ben,

    Yes, many thanks for this description of your project. Not only does your post integrate so effectively many ideas and essays discussed previously in EMOB posts, but it affords a much better understanding of the power of your project than I had obtained from poking around on the site on my own.

    Your findings are quite impressive. I’m especially struck by how your systematic approach has uncovered a number of items not recorded in ESTC (what surprises me is that so many unrecorded items are a part of GBS, not that there are unrecorded items). Frankly, I have virtually ignored GBS’s 18th-century holdings. Your work casts another light on this tool for me.

    That you designed your site to accommodate other languages is also noteworthy and shows foresight. (You might consider writing a piece about your project for SHARP News, the quarterly newsletter for the Society for the History of Authorship, Reading & Publishing; its membership is international and I suspect members would be interested in hearing about your work.)

    Like Anna, I would also like to hear more about your work with undergrads to prepare them for assisting in this project. The pedagogical aspect of the 18th-Century Book Tracker is yet another real strength of this work.

    On a somewhat related note, are you concerned about the proposed subscription plan for universities to have access to GBS that appears in the proposed GBS settlement?

    • Benjamin Pauley Says:

      My method—such as it is—for prepping students on using the ESTC hasn’t been especially advanced. I’ve set aside time at the beginning of the semester to walk students through ESTC entries (pay attention to statements about editions, as well as publication years, but also have a look at notes on signatures, page numbers, and characteristic variants or press errors), and have put together a few screencasts to reinforce the points (which I need to update and place on the Tutorials page at the site).

      A lot of the work has come from one-on-one work with students as they got ready for presentations. They’d email or stop by my office hours with a link and an idea about which ESTC entry it might be, then we’d have a look to see how they’d done. (I’ll also confess that there were some cases, such as a pirated edition of The Beggar’s Opera, where I just told the student, in effect, “Here be dragons… you’ll have an easier time looking for a different book to discuss.” Cowardly, perhaps, but some bibliographical questions are just too harrowing.)

      I had tried to post a reply about the question of Google’s licensing arrangements under the settlement in another thread, but WordPress ate my homework. From what I’ve been able to tell (and I’ve tried to keep an eye on this, for obvious reasons), the settlement won’t have any effect on out-of-copyright works, which will apparently continue to be freely available to all users. The really audacious coup for Google, as I gather, lies in being the sole licensee for “orphaned” works, that is, texts that are still in copyright but are out of print. This is going to be the meatiest part of the one-terminal-per-library arrangement, I suspect, as they may prove to be the only game in town for accessing such works (including, probably, lots of scholarly monographs that were only ever bought by a small handful of academic libraries).

  3. Anna Battigelli Says:

    One thing you’re demonstrating is how much can be done with freely available tools such as GBS and ESTC. Like Eleanor, I had not done much with eighteenth-century texts in GBS. But looking through your site, I see better the eighteenth-century texts GBS provides.

    I don’t think that shunning the case of the pirated Beggar’s Opera is cowardly at all. The last thing you want is a false identification. This raises the question of quality control. From what you have said, it seems to me that you review your students’ identifications before including them on the site. Is that correct? If so, that’s an enormous amount of work–and probably necessary.

  4. Eleanor Shevlin Says:

    I too think your handling of situations such as the pirated Beggar’s Opera is only sensible.

    As for the settlement, I do know that there will be full access to books in the public domain. A problem I have observed is that some new print-on-demand firms are taking out-of-copyright works and creating “new” editions–and thus placing them ostensibly under copyright again. Works published by Kessinger (one example of several such companies) that result in snippet or no previews are sometimes available as full-text in another edition—but their publications seem most worrisome for what they might portend for current public domain works.

    The quality of these works is often quite poor, as this Kessinger disclaimer for its A Descriptive Account Of The Literary Works Of John Britton From 1800 To 1849: Being A Second Part Of His Autobiography suggests:

    This scarce antiquarian book is included in our special Legacy Reprint Series. In the interest of creating a more extensive selection of rare historical book reprints, we have chosen to reproduce this title even though it may possibly have occasional imperfections such as missing and blurred pages, missing text, poor pictures, markings, dark backgrounds and other reproduction issues beyond our control. Because this work is culturally important, we have made it available as a part of our commitment to protecting, preserving and promoting the world’s literature.

    Moreover, as this Amazon customer review demonstrates, these firms at times turn to Google for their copies:

    If publisher has to apologize for quality of reprint I don’t need., April 30, 2008 By Stephen L. Powell

    Let it be known that Kessinger Publishing sourced this reprint from Google Books. Every page has Google printed at the bottom. Why they chose this particular edition to reprint eludes me. It isn’t the most complete or a first edition or the most beautiful or cleanest scanned or superlative in any way. It has all the faults of the original scan such as underlined words, plus some defects of their own printing process added. Also the binding appears to be just glued together.

    On Amazon this work is tied to the wrong edition–see GBS Works of Alexander Pope for the right one–and note its copyright date of 2007 (and price of $39.95 if one clicks on Amazon)!

    To see how similar works by Pope published by Kessinger appear on GBS, click here

    • Benjamin Pauley Says:

      This is very sleazy, indeed. I’m not sure if this is better or worse than the “editions” of eighteenth-century books for sale at Amazon that are just print-outs of texts downloaded from Project Gutenberg (several of them here, for instance). Must be to-die-for profit margins, though.

      Does anyone know the intellectual property status of these things? I seem to recall reading somewhere that you don’t create new intellectual property through the mere act of photographic reproduction (so Google’s scans, as I understand it, don’t give Google proprietorship in the texts). If that’s the case, Kessinger’s claims of copyright (on a printing of somebody else’s photographic reproduction) seem pretty thin.

      • Eleanor Shevlin Says:

        Yes, it does seem exceptionally sleazy. I’m sure some of these same firms are the ones printing copies from Project Gutenberg.

        As for copyright, you are right about GB scans. Also, to be fair, I am less certain whether Kessinger is actually claiming ownership, though the inability to view works or the limited view offered of such works and other aspects of their works suggest they are. While on one hand their claim would seem fairly tenuous, on the other hand there is Dover Books as a precedent.

        When one clicks on the copyright button for the Works of Pope title published by Kessinger on Amazon’s page, a disclaimer comes up explaining that the information being displayed is for an edition by BiblioLife (which reprints works–but keeps them in the public domain–profits go back to producing more open access works) and not the Kessinger edition…

  5. Digital Projects at SHARP 2015–Part I | Early Modern Online Bibliography Says:

    […] topic. As an aside, Benjamin Pauley’s Eighteenth-Century Book Tracker (see prior emob post, post, and post) is now being phased out, and its information being incorporated into the English Short […]
