A Debate over the Usefulness of Google Book Search

I don’t have a whole lot to say about this, but I thought it was interesting to see a number of “New Media” scholars debating the decisions Google made while putting together Google Book Search. When we talk about the proprietary control that a company like Google has over such an important chunk of our information environment, it seems important for scholars and librarians to be able to give some direction and feedback. So far, this does not seem to have been the case. DM

37 Responses to “A Debate over the Usefulness of Google Book Search”

  1. Anna Battigelli Says:

    Thanks, Dave, for posting this. Clearly tensions exist surrounding Google’s approach to GBS. I would have liked to have read a fuller account of the exchange.

    On the one hand, GBS can advertise itself as advancing democratic ideals about reading and accessibility to books. On the other hand, as some of the members of this debate make clear, GBS seems to be ignoring and thus erasing a century’s worth of good bibliographic work. Perhaps literary scholars need to make a stronger case to the general public for the value of the work they do, and particularly for the value of bibliographical work. . .

    In a related post, Enfilade’s “the eighteenth-century in the news” noted that Amazon.com and the University of Michigan “will make over 400,000 rare books available through softcover reprints, ranging in price from $10-45.”

    AB

    • Eleanor Shevlin Says:

      Anna,

      Thanks, Anna…this agreement between Amazon and UM is very interesting–and seems to be an effort by academic libraries to protect their holdings as a potential source of revenue (my gut reaction–perhaps I’m being too cynical!). I do wonder if copyright in these POD editions will be claimed… I imagine it would. More than a few firms have emerged that have published formerly out-of-copyright texts (sometimes using the full-text version available on Google Books!) and then established new rights to the work. These copies then show up on Google as only available as “snippet views”.

  2. Eleanor Shevlin Says:

    I’ve mentioned before that I see GB, in its present incarnation, as providing quite a different function from being an online library; to me, it’s a finding aid. In fact, GB seems to backtrack on this claim in its FAQs even as it often puts it forth:

    The aim of Google Book Search is to help you discover books and learn where to buy or borrow them, not read them online from start to finish. It’s like going to a bookstore and browsing – with a Google twist.

    Readers may be interested in the Google Exchange that took place two Augusts ago between Paul Duguid and Patrick Leary.

    Duguid’s article that inspired this exchange: Inheritance and loss? A brief survey of Google Books dealt with Sterne’s Tristram Shandy.

    Later I would like to discuss the implications of the proposed settlement that GB reached this past fall.

    Finally, one might be interested in the podcast of yesterday’s portion of NPR’s Talk of the Nation: Science Friday, Digital Life
    Who Really Owns Your Digital Data?

  3. Anna Battigelli Says:

    Thanks, Eleanor. Both the Duguid/Leary exchange and NPR’s Science Friday session are of interest.

    There seems to be an almost irreconcilable tension between those who are delirious about GB’s keyword search potential and those concerned by the lack of bibliographical tidiness of GB’s digitized texts.

    For an enthusiastic summary of the promise of GB keyword searches, see the youtube video by two UC Berkeley students, http://www.youtube.com/watch?v=m7s8q9nTQh4.

  4. Eleanor Shevlin Says:

    Thanks, Anna, for the YouTube piece. Google Book has posted this and similar user testimonials on its pages. While I definitely agree with those who wish to see greater attention to bibliographical matters on Google Book, this emphasis sometimes ignores the types of uses Google Book serves in its present incarnation.

    I am concerned about the effects of the Google Settlement and fully understand why Bob Darnton opted out of the agreement for Harvard. That academic libraries will need to subscribe and also supply detailed reports is just one aspect of what troubles me. I am including two summaries of the settlement, one from the ALA website, and another 10-page summary from attorney Joy Butler.

  5. Anna Battigelli Says:

    I would like to hear more about your concerns regarding the Google settlement. Are they similar to those expressed by Siva Vaidhyanathan in The Googlization of Everything?

    “Libraries at public universities all over this country (including the one that employs me) have spent many billions of dollars collecting these books,” he wrote. “Now they are just giving away access to one company that is cornering the market on on-line access. They did this without concern for user confidentiality, preservation, image quality, search prowess, metadata standards, or long-term sustainability. They chose the expedient way rather than the best way to build and extend their collections. I am sympathetic to the claim that something is better than nothing and sooner is better than later. But sympathy remains mere sympathy. These claims are not convincing when one considers just how great an alternative system could be, if everyone would just mount a long-term, global campaign for it rather than settle for the quick fix.”

  6. Eleanor Shevlin Says:

    Siva Vaidhyanathan’s main complaint is with the failure of libraries to take charge of their collections in terms of digitization–and I definitely think his concerns have merit. His position predates the Google settlement by quite a while.

    However, my concerns about Google stem from the settlement specifically.

    For example, one will no longer be able to search copyrighted material using Google Book from home/remote locations (though if one is part of an academic library one might be able to have access as we do with ECCO, etc.).

    One’s academic library will also need to subscribe to Google Book.

    The reporting that the settlement requires of libraries seems onerous.

    Public libraries will be allowed ONE designated terminal to search Google Book.

    And these are just some of my worries…

  7. Anna Battigelli Says:

    Subscribing to Google is bad enough; that returns us to the problems we have with EEBO, ECCO, and Burney, which are simply inaccessible to many scholars. But one designated terminal? This is not in the least bit practical. This would, I imagine, require timed sessions.

    This technology diminishes in value once its accessibility is restricted.

  8. mandellc Says:

    Dear All:

    I’m coming late to this discussion, and have many misgivings about the planned Google research platform, but I did want to mention the crucial work being done by Benjamin Pauley in adding ESTC numbers and other metadata to 18th-century materials available in Google Books. He wants us all to contribute, and here is the URL:

    http://nutmeg.easternct.edu/~pauleyb/c18booktracker/

    (forgive me if someone mentioned this already: I think it was advertised on C18.)

  9. Eleanor Shevlin Says:

    Laura,

    Many thanks for the information about Benjamin Pauley’s work. I had not heard of his initiative (I actually have C18 on sleep right now), but his site demonstrates that he is already making solid strides. I enjoyed browsing his library, and it gave me a better idea of what Google Books has in the way of 18th-century works.

    I actually do not use Google Books very much in terms of 18th-century texts–though I have found it helpful for the digital images of plates/illustrations found in 18th-century works; they tend to offer clearer images than ones found in ECCO.

    As mentioned in another comment, I also have serious worries about the new Google plans.

  10. Eleanor Shevlin Says:

    For those interested, I am posting a link to Charles Bailey’s most recent Google Book Search Bibliography.

  11. Anna Battigelli Says:

    I added Eighteenth-Century Book Tracker to our “Projects, Panels, and Societies” resources. It looks like a very useful site.

  12. Anna Battigelli Says:

    Robert Darnton’s New York Review of Books essay on the GB settlement was followed by responses. His arguments seem far more convincing to me than the responses: http://www.nybooks.com/articles/22281.

    It is unfortunate to miss the opportunity to use public money to fund a venture similar to GB, but one that would create a national, open-access database.

  13. Dave Mazella Says:

    Point taken, but public moneys can be as irregular and unreliable as university funding or corporate money. I’ve been following with interest this thread in Sharon’s EMN, which discusses the end of free access to the RHS Bibliography of British and Irish History:

    http://www.earlymodernweb.org.uk/emn/index.php/archives/2009/07/bad-news-for-british-and-irish-historians/

    The crucial point in the comments comes from Tim Hitchcock, who says this:

    “This outcome is a result of the failure of the funding bodies to address how to sustain some of the great sites they paid to create. There is no model, no mechanism, no clear thinking. What the RHS has done was actually an imaginative response to an impossible situation – the alternative was to let the bibliography die completely, or be mothballed – destroying ten years’ work by a whole team of people.

    Simply bemoaning the decision is not enough – we need to come up with a way of funding the sites the community think are worthwhile – in the full acknowledgement that even programmers, bibliographers and historians need to eat occasionally.”

    As far as I can tell, the only really viable option for sustainability would be some scheme of fund-raising and development that would give such a project independence from the budgetary ups and downs of hosting institutions or the feds. And no I don’t think it will be easy. But I think this question of sustainability will be the real test for the continued growth of digital humanities.

    DM

  14. Eleanor Shevlin Says:

    The discussion about funding reminds me of the remarks made by John Haeger, the Vice President for Programs and Planning, Research Libraries Group (RLG), at the one-day conference on the English Short Title Catalogue (ESTC) held at the New York Public Library (NYPL) on January 21, 1998. Here’s a summary of his remarks from a conference report that I wrote for the Intelligencer at the time:

    Foremost, Haeger argued that the system of professional review that characterizes governmental awarding of funds creates the only environment that may be hospitable to a project such as ESTC. The need to preserve this environment is one reason we should be concerned about maintaining and restoring government funding of the arts and education. Although government sources supplied only about one third of the roughly thirty-million dollars spent on the ESTC over the past twenty years, the project probably would not have been done, Haeger contended, if government grants were not forthcoming. Thus, Haeger urged that instead of spending time on raising funds and slashing budgets to compensate for loss of government aid, we should assume a far more pro-active stance: To restore NEH funding would take only a $0.38 contribution per taxpayer–and just another $0.10 per taxpayer to bring the NEH budget back to its 1980s levels.

    I remember being quite struck by the dollars-and-cents amount–just $0.48 per person (even in 1998) seemed something that we could do as a nation.

    On a different note, I have been wondering about how the U.K. has been handling access. I have seen that a body called JISC Collections, “established by the UK further and higher education funding councils in 2006 to negotiate with publishers and owners of digital content, has purchased the content on behalf of all UK higher education institutions”:

    The JISC has made a significant investment in licensing the content of this vast collection. The content is available to institutions free of charge as JISC has purchased the content in perpetuity on their behalf. However, institutions do need to pay an annual hosting fee to the publisher, if they wish to access the content via the Cengage Learning servers, taking advantage of the search interface developed for ECCO.

    I am unsure about how UK institutions would access this content if they are not subscribers. On one hand the JISC seems to be a national lobbying arm/agent that shares some traits with the regional US library collectives that I have commented on elsewhere in this blog. Yet, a key difference seems to be that JISC has purchased the content. Given the nature of the US entities, these bodies have worked on negotiating better fees for their member institutions–they would not be able to purchase content for their members even if desired. The US Congress does have an office that negotiates price and actually purchases digital databases for members of the House and Senate–but that’s a different matter.

  15. Ben Pauley Says:

    I’m away from home at the moment on vacation with my family, so I haven’t been able to read through all of the linked materials mentioned in this thread and others on the blog (but I’m really looking forward to doing so on Monday).

    I did want, though, to invite feedback on the site to which Laura Mandell linked. The site grew out of a project I started working on with students in my classes. My university doesn’t have (and won’t likely ever be able to afford) resources like EEBO, ECCO, Eighteenth-Century Journals, etc., so I turned to Google Books in search of materials for class. Once I started looking around, though, I was struck by the bibliographical disarray of the thing. (A state of affairs others have remarked, of course–I’m keen to read the exchange David Mazella linked to more closely, but a quick scan suggests that Paul Duguid’s search for volumes of Tristram Shandy parallels my own experience).

    I set my students to the task of finding eighteenth-century editions at Google Books and identifying what they’d found using the ESTC as a way to direct their attention to questions of book history, but also with a hope of getting a start on something like the site as it now stands. I consider the database as it stands now still a work in progress, though, and I’d be very interested to hear thoughts about where a project like this one might fit into a larger landscape to which I’m still orienting myself.

    • Eleanor Shevlin Says:

      Ben,

      Thanks so much for writing from the road, and we look forward to hearing more from you when you return from vacation.

      Although your project has been primarily a one-person operation, you have already managed to provide ESTC identification for an impressive number of texts–and I find your database easy to use.

      I haven’t used Google Book search that much to access eighteenth-century texts (Among other uses, I use GB as a finding aid to search for primary and secondary resources and historical information for people, places, ideas/events that are not always indexed in print sources). Yet, when I have come across an 18th-century text, I often then go to ESTC and WorldCat–and eliminating that extra step is only one example of the value of your work.

      I look forward to hearing more about your work and plans–and perhaps contributing to your efforts if I could be of help.

      Eleanor

  16. Anna Battigelli Says:

    Eleanor, I only now had a chance to look at Charles Bailey Jr.’s bibliography of articles on Google books and found it very helpful. Thanks.

    One thing this thread makes clear to me is the need to take students’ use of GBS into account in my teaching. I have been away from teaching during this sabbatical year and will be interested in seeing whether student research habits have changed significantly during my absence.

  17. Eleanor Shevlin Says:

    Anna,

    I would be very interested in hearing what you find. Please let us know.

    In my experience at WCU, undergraduate students use Google quite a bit, but they have not gravitated in large numbers to Google Books. This pattern may be changing, though, for I did find that one or two students tapped GB to “access” books offering secondary criticism that I had mentioned in class. Of course, GB typically will not offer the type of access that will truly allow the student to read chapters in their entirety. I ended up giving a tutorial on GB and explaining its uses–and the need to obtain a hard copy of the book and read more than just the snippet or limited page preview if citing to support their arguments or the like.

    I didn’t teach a graduate course last year, but I did give an hour-long workshop on GB to graduate students last fall. WCU grad students were using Google Scholar quite a bit, but they weren’t using GB really at all.

  18. Dave Mazella Says:

    My experience is that students will use the Google search engine, but for the most part they leave alone Google Scholar or Google Books. One of the issues is that our library is pretty small, and so GBS would entail Interlibrary Loan.

    My experience is that with a few exceptions, most of my students’ information literacy is pretty minimal prior to the intro to lit studies course, which is our intro to the major, and which (in my section) contains a lot of IL instruction. But I spend a lot of time teaching them about the scholarly sources we subscribe to, and not so much resources like GBS. It might be interesting to incorporate this into the capstone course I teach, though.

    DM

  19. Eleanor Shevlin Says:

    I have been meaning to mention the Internet Archive and its sub-project Open Library.

    The poetical works of J. Armstrong offers an example of an eighteenth-century work prepared by the Internet Archive/Open Library. You will see that you have various options for viewing the book (sometimes Internet Archive will take you to the digital copy in Google).

  20. Anna Battigelli Says:

    It looks as if anyone can edit an item’s entry from the Open Library. Is that a problem or a strength? Is Open Library’s catalogue better than searching Widener Library’s Hollis or WorldCat? I see that Open Library sometimes provides the digitized text of the book, which is great.

    I like the idea of an Open Library. But this seems very much a work-in-progress. Am I missing something?

  21. Eleanor Shevlin Says:

    You’re right, Anna–Open Library is still very much a work-in-progress, and in its early stages. I may be wrong, but I believe its impetus stemmed from a desire to provide an alternative to Google’s digitization and GB’s search engines and their control of information.

    Open Library does allow editing by all–which has drawbacks (issue of quality control, too many duplicates, etc.) as well as the strengths of being truly open access.

    I use WorldCat/OCLC daily–but I turn to Open Library when I want an electronic text of a physical work not readily accessible. Often these works are early 19th-century ones.

    Internet Archive is funded by Mellon and Open Library, partially, by California State libraries, and given California’s budget problems, I suspect that this project’s development will be slowed–although it is constructed through volunteer labor.

  22. Stephen Karian Says:

    I use the Internet Archive quite often, and in fact prefer their pdfs to those at Google Book Search. (I should note that distinguishing between digitized books on the Internet Archive and those on GBS is not that clear-cut since people can post links to GBS items on the Internet Archive, but not vice versa.)

    Two reasons I like the pdfs from Internet Archive better than those on GBS:

    1) image quality: books digitized by Google are in low resolution black and white, whereas those done for the Internet Archive are in higher resolution color (here I’m referring to the scanning done by Microsoft before Microsoft’s book-digitization project went under; Microsoft worked with the University of California system, University of Toronto, etc., and I believe that all their stuff is part of the Internet Archive, though the Internet Archive has other stuff as well). As a result, I find that the pdfs in the Internet Archive are easier to read.

    2) full-text searching: You can search full-text in Google only when online. That is, if you download the pdf, you cannot search the full text of that pdf. But Microsoft’s pdfs created for the Internet Archive contain the full-text metadata within the files, which means that you can search full-text using Adobe Acrobat Reader even when you’re not online.

    Because of both factors, the file sizes for items created by Microsoft are much bigger than those for items created by Google. But storage capacity is not as much of a concern as it used to be.

    Like Eleanor, I often use both Google Book Search and the Internet Archive for 19th-century and early 20th-century books that our library does not own (and sometimes when it does own them). I have saved a lot of time and hassle in recent years by making far fewer inter-library loan requests.

  23. Eleanor Shevlin Says:

    Steve is absolutely right about the advantages of PDFs/digital books in Internet Archive (and why–the scanning done in color, the project participants, etc), and I especially tend to seek out texts here if images/plates are involved.

    And, yes, IA has incorporated Google digitized texts, and in these cases, it will take you there. I *believe* IA started incorporating GB texts in late 2007–but I may be wrong about that date.

    As for size, one often has the option of downloading a smaller, B/W PDF if desired.

    While I tend to download Internet Archive texts, I rarely do so for GBs in part because of the inability to search off line. If I want the text (when available through full-text view/PDF option) in GB, I save the page(s).

  24. Anna Battigelli Says:

    I see Steve’s point about the clarity of the digitized pages on the Internet Archive. The use of color does make a difference. Diana Kichuk argues for the value of color over black and white in her article on our bibliography, and the examples she provides (which are not from the Internet archive) are convincing. [See her “Metamorphosis: Remediation in Early English Books Online (EEBO),” Literary and Linguistic Computing 22:3 (2007), 291-303.]

    I also like the IA’s page-turning feature, something the British Library uses well on its home page. Among the exhibits on the BL site is the manuscript of Jane Austen’s History of England. The digitized page-turning feature not only preserves the feel of a book (or in the case of Austen, the bound manuscript); it also preserves the book/manuscript. I don’t know that such a feature is necessary, but it’s helpful to be reminded how a book or manuscript operates.

  25. Eleanor Shevlin Says:

    The IA’s page-turning feature, its use of colored images, and similar effects all reinforce the notion of the digitized copy as a surrogate for the physical book (and these features also exhibit IA’s concern for and attention to the book as a material artifact).

    IA also provides different forms of/ways to access the electronic works. This diversity arguably acknowledges (as it facilitates) the different uses that electronic works can serve.

    GB Search, in contrast, seems to emphasize the “book” as data. The search results appear initially as window-boxes of selected texts (though the full page is often available if one wishes to click on the box). The “old” interface (pre-June 15th or so) would display the various occurrences of the search term surrounded by very limited text in a left-hand column. The snippet views were framed by a ragged border as if the text had been ripped from the book–and the scrap is of course all one can see.

  26. Reading with Machines « Early Modern Online Bibliography Says:

    […] note of particular interest to those who care deeply about bibliography. In an earlier post about Google Book Search (a service tellingly renamed from the original Google Books), there was some debate about whether […]

  27. Benjamin Pauley Says:

    I’ve tried my best to stay abreast of developments on the Google Books settlement (as you might expect) and, from what I can tell, the concerns about Google’s licensing of the service to libraries needn’t extend to eighteenth-century primary works (or, indeed, to anything not in copyright): those works will, they say, continue to be available for anyone to download in their entirety, as they currently are. (There’s plenty else about Google’s reach in digitization, of course, to give anyone the heebie jeebies, and Vaidhyanathan articulates those concerns forcefully—see also his blog, The Googlization of Everything, for a comprehensive, if frankly polemic, take—but at least the eighteenth-century texts should continue to be open and accessible.)

    I was excited when I first saw that scans at the Internet Archive were in color (mostly ones by Microsoft, I think), but my enthusiasm waned a bit when I started trying to zoom in: the scans aren't of sufficient quality to see the details of the paper, so it's not possible to see things like chainlines for purposes of puzzling out certain tricky matches with ESTC—just my particular hobbyhorse, of course. I don't have a strong opinion yet about legibility for purposes of onscreen reading, but it's something to consider. (Incidentally, for a really amazing example of an attempt to facilitate the study of the book-as-object through digital technology, have a look at Codex Sinaiticus. Probably not sustainable for a large-scale digitization project, but… wow.)

  28. Eleanor Shevlin Says:

    Thanks, Ben… the Codex Sinaiticus is amazing.

    Your points about the inadequacies of GBS, IA, ECCO, etc. are absolutely on target. Such shortcomings, however, speak to the purpose and vision of the projects–and the lack of involvement of bibliographers and book historians. Your Book Tracker is helping make GBS and other freely accessible, online electronic editions far more usable. (Though the images, etc., ensure that they cannot ultimately stand in for the physical book.)

    For comments about threats to out-of-copyright 18th-century (and beyond) works and GBS settlement, see my latest response to your Book Tracker post.

  29. Benjamin Pauley Says:

    I’m not sure exactly where this news fits into the discussion of Google Books, but they’ve just announced that authors (or other “rightsholders”) can distribute their works under Creative Commons licensing on Google Books.

    Whether this will have any practical payoff for people working in eighteenth-century studies isn’t yet clear to me. Eighteenth-century primary texts, of course, are already public domain. If university presses decided, out of the goodness of their hearts, to make their backlists available under Creative Commons, that would certainly be something: one could download out-of-print works in their entirety.

    I can’t see why any press would do that, though: it would seem that part of the promise of the Google Books settlement is to provide an outlet for monetizing their backlists (I hate that word, “monetize”), exploiting the so-called “long tail” by actually being able to sell “on demand” a few copies of texts for which there wouldn’t be sufficient demand to justify a print run.

  30. Eleanor Shevlin Says:

    The topics of distribution and marketing seem to be relevant to GBS. While Google has a number of testimonials on its site about the way GBS has helped increase sales–especially of backlisted titles–independent sources/publishers have also reported that GBS (and Amazon’s “look inside”) have helped increase sales.

    For example, a 2007 article in Book Business reports that

    Paul Manning, vice president of book publishing for Springer, attributes much of the recent growth of the company’s back catalog of older titles to its participation with the controversial program. With more than 30,000 titles available in Google Book Search, the publisher saw more than 1 million views in a one-month period, and 20 percent of its “buy this book” clicks on the search were for titles older than 10 years old, he says. Manning, 42, talks about the advantages of using Book Search and viral marketing to boost future book sales.

    I am not surprised that GBS has helped increase sales–I know that I have either purchased books or asked my library to do so because of works I found that I might not otherwise have known about. The ability to search the book convinced me that it was one that I needed. I would think that GBS could be very helpful in creating increased sales for back-listed titles from university presses.

    That said, at last year’s MLA I spoke to someone in MLA’s publishing division, and she was fairly suspicious about these claims.

    I’ve also heard that publishers are beefing up their offerings of online texts with limited search capabilities to help market their titles.

  31. Anna Battigelli Says:

    An ongoing and interesting discussion of the GB settlement, with Matthew Wilkens’ positive take on the settlement, can be found at Workproduct.

  32. Eleanor Shevlin Says:

    Anna’s notice about the ongoing discussion on the GBS settlement on Workproduct has prompted me to provide the link to one of several reports on the GBS conference that took place at Berkeley a few Fridays ago. I had planned to post this as a new posting, but I did not want to distract attention from our discussions of ECCO searching. As I’ve noted elsewhere, I find GBS, despite all its warts, an extremely useful tool for scholars–especially when approached with a good understanding of its limitations and capabilities.

    Google Books Settlement Con Is Google Book Search the last library?

    Geoff Nunberg, one of America’s leading linguistics researchers, laid this rather ominous tag on Google’s controversial book-scanning project amidst an amusingly-heated debate this afternoon on the campus of the University of California, Berkeley.

    “This is likely to be The Last Library,” Nunberg said during a University conference dedicated to Google Book Search and the company’s accompanying $125m settlement with US authors and publishers. “Nobody is very likely to scan these books again. The cost of scanning isn’t going to come down. There’s no Moore’s Law for scanning.

    “We don’t know who’s going to be running these files 100 years from now. It may be Google. It may be News Corp. It may be WalMart. But we can say with some certainty that 100 years from now, these are the very files scholars will be using.”

    The day-long conference program is available.

  33. Eleanor Shevlin Says:

    The Washington Post has an article on the DOJ’s recommendation yesterday that the Court reject the Google Book settlement because the agreement could violate copyright and anti-trust laws.

    Tom Krazit of CNET also has a piece on this news, DOJ: Google’s book settlement needs rewrite, which offers a PDF of the entire DOJ filing.
