Archive for the ‘Google Book Search’ Category

Digital Public Library of America (DPLA) to open April 2013

March 30, 2012

By April 2013, the Digital Public Library of America should be up and running.  With this announcement, Robert Darnton opened a recent talk about DPLA sponsored by Harvard Library Strategic Conversations.

Darnton reviewed DPLA’s brief history, including its origin at a meeting at Harvard’s Radcliffe Institute on 1 October 2010, its successful coalition of foundations committed to providing financial support, its appointment of a steering committee, and its selection of John Palfrey as the steering committee’s chair.  Six “workstreams” have been designed to arrive at consensus-driven plans in the following areas:

To join a workstream listserv, consult the appropriate web page.

Darnton insisted that DPLA was not simply a response to Google, though DPLA is open to working with Google and has extended invitations to that effect.  He provided an incisive history of Google Book Search’s legal troubles, and noted that DPLA has much to learn from that history.

Next, John Palfrey (chair of the DPLA steering committee, and author of Born Digital: Understanding the First Generation of Digital Natives) outlined some of DPLA’s goals, though he conceded that the exact nature of the DPLA was still to be determined:

  • constructing a creative and technologically sophisticated learning environment beyond that created by e-books.  This involves imaginative work by architects, programmers, catalogers, users, and just about anyone else prepared to think innovatively.
  • considering the following elements that will shape the still indistinct and ever-evolving nature of DPLA:
    • code will be free and open source
    • metadata will aggregate existing data and create additional data.  DPLA has already arrived at an agreement to network with Europeana, Europe’s digitized knowledge-sharing platform.
    • content will include all media types
    • tools and services will facilitate public innovation.  Palfrey provided as an example the use of a “scanabego,” a truck with scanning tools that would be driven across the country to local historical societies, offering to digitize their records in exchange for linking those records to DPLA.
    • DPLA’s community will be widespread and participatory.  According to the DPLA web site, “DPLA will actively support the community of users and developers that want to reuse and extend its content, data, and metadata.”

In the discussion that followed the presentation, one of the most interesting comments was Charles Nesson’s description of a Digital Registry Project to address the copyright issues that plagued Google Books.  The Registry seeks pro bono commitments from major law firms “to defend the copyright status determinations of major cultural institutions such as libraries and universities” (see the memo available on Charles Nesson’s web site).  According to the DPLA web site,

The objective of the Digital Registry Project is to create a comprehensive registry to undergird digital exploitation of intellectual property—for personal, educational, or commercial use. This vision encompasses all copyrighted works, all orphan works, and all works in the public domain. The Digital Registry seeks to kick start the registry process by beginning with those works that belong to no one and therefore belong to everyone: the public domain. This registry is intended to be a simple and unassailable starting point for all larger registries.

More information is available at the extensive and carefully designed DPLA web site and the DPLA blog, which is guaranteed to interest EMOB readers.


ASECS 2011 Sessions on Electronic Resources and Related Topics

February 16, 2011

Below are sessions related to the digital humanities, electronic resources, or book history at the upcoming annual meeting of the American Society for Eighteenth-Century Studies in Vancouver.  If you would like a session included in the list below, please let me know.

8-9:30 Thursday, March 17

9. “Media Technologies and Mediation in Intercultural Contact”

(Roundtable) Pavilion Ballroom D

Chair: Scarlet BOWEN, University of Colorado, Boulder

1. Mary Helen MCMURRAN, University of Western Ontario

2. Neil CHUDGAR, Macalester College

3. Jordan STEIN, University of Colorado, Boulder

9:45-11:15 Thursday, March 17

19. “Scholarship and Digital Humanities, Part I: Editing and

Publishing” (Roundtable) Grand Ballroom BC

Chair: Lorna CLYMER, California State University, Bakersfield

1. Timothy ERWIN, University of Nevada, Las Vegas

2. Christopher MOUNSEY, University of Winchester

3. Eleanor SHEVLIN, West Chester University

4. Christopher VILMAR, Salisbury University

23. “Britain 2.0: The New New British Studies?” (Roundtable) Cracked Ice Lounge

Chair: Leith DAVIS, Simon Fraser University

1. James MULHOLLAND, Wheaton College

2. Michael BROWN, Aberdeen University

3. Eoin MAGENNIS, Eighteenth-Century Ireland Society

26. “Eighteenth-Century Reception Studies” – I Port Hardy

Chair: Marta KVANDE, Texas Tech University

1. Alise JAMESON, Ghent University, “The Influence of Gerard Langbaine’s Seventeenth-Century Play Catalogues on Eighteenth-Century Criticism and Authorship Ideals”

2. Diana SOLOMON, Simon Fraser University, “Sex and Solidarity:

Restoration Actresses and Female Audiences”

3. Jennifer BATT, University of Oxford, “The Digital Miscellanies Index

and the Reception of Eighteenth-Century Poetry”

4. Michael EDSON, University of Delaware, “From Rural Retreat to Grub

Street: The Audiences of Retirement Poetry”

29. “Bodies, Affect, Reading” Parksville

Chair: David A. BREWER, The Ohio State University

1. Amelia WORSLEY, Princeton University, “Lonely Readers in the Long

Eighteenth Century”

2. Amit YAHAV, University of Haifa, “Rhythm, Sympathy, and Reading

Out Loud”

3. Wendy LEE, Yale University, “A Case for Impassivity”

11:30-1pm, Thursday, March 17

38. “Scholarship and Digital Humanities, Part II: Authoritative

Sources” (Roundtable) Grand Ballroom BC

Chair: Christopher VILMAR, Salisbury University

1. Katherine ELLISON, Illinois State University

2. Ben PAULEY, Eastern Connecticut State University

3. Adam ROUNCE, Manchester Metropolitan University

4. Brian GEIGER, University of California, Riverside

5. Lorna CLYMER, California State University, Bakersfield

2:30-4 Thursday, March 17

56. “Scholarship and Digital Humanities, Part III: Materials for

Research and Teaching” (Roundtable) Grand Ballroom BC

Chair: Bridget KEEGAN, Creighton University

1. Mark ALGEE-HEWITT, McGill University

2. Anna BATTIGELLI, State University of New York, Plattsburgh

3. Ingrid HORROCKS, Massey University

4. John O’BRIEN and Brad PASANEK, University of Virginia

59. “The Private Library” Pavilion Ballroom D

Chair: Stephen H. GREGG, Bath Spa University

1. Laura AURICCHIO, Parsons the New School for Design, “Lafayette’s

Library and Masculine Self-Fashioning”

2. Nancy B. DUPREE, University of Alabama, “The Life and Death of a

Library: The Collection of John Joachim Zubly”

3. Meghan PARKER, Texas A&M University, “Private Library, Public


4. Mark TOWSEY, University of Liverpool, “‘The Talent Hid in a

Napkin’: Borrowing Private Books in Eighteenth-Century Scotland”

66. “Editing the Eighteenth Century for the Twenty-First Century

Classroom” (Roundtable) Junior Ballroom B

Chair: Evan DAVIS, Hampden-Sydney College

1. Joseph BARTOLOMEO, University of Massachusetts, Amherst

2. Linda BREE, Cambridge University Press

3. Anna LOTT, University of North Alabama

4. Marjorie MATHER, Broadview Press

5. Laura RUNGE, University of South Florida

9:45-11:15 a.m., Friday, March 18

102. “The Eighteenth Century in the Twenty-First: The Impact of the Digital Humanities” (Digital Humanities Caucus) (Roundtable)

Grand Ballroom BC

Chair: George H. WILLIAMS, University of South Carolina, Upstate

1. Katherine ELLISON, Illinois State University

2. Michael SIMEONE, University of Illinois, Urbana-Champaign

3. Elizabeth Franklin LEWIS, University of Mary Washington

4. Kelley ROWLEY, Cayuga Community College

11:30-1 p.m. Friday, March 18

130. “Writing and Print: Uses, Interactions, Cohabitation” – II

(Society for the History of Authorship, Reading, and Publishing,

SHARP) Junior Ballroom D

Chair: Eleanor SHEVLIN, West Chester University

1. Shannon L. REED, Cornell College, “The Enactment of Theory:

Literary Commonplace Books in the Eighteenth Century”

2. Miranda YAGGI, Indiana University, “‘A Method So Entirely New’:

Female Literati and Hybrid Forms of Eighteenth-Century Novel


3. Shirley TUNG, University of California, Los Angeles, “Manuscripts

‘Mangled and Falsify’d’: Lady Mary Wortley Montagu’s ‘1736.

Address’d T –’ and The London Magazine”

4. A. Franklin PARKS, Frostburg State University, “Colonial

American Printers and the Transformation from Oral-Scribal to Print


132. “The Eighteenth Century on Film” Orca

(Northeast American Society for Eighteenth-Century Studies)

Chair: John H. O’NEILL, Hamilton College

1. Elizabeth KRAFT, University of Georgia, “The King on the Screen”

2. Natania MEEKER, University of Southern California, “Le Bonheur au

féminin: Passion and Illusion in Du Châtelet and Varda”

3. David RICHTER, Graduate Center, City University of New York,

“Writing Lives and Telling Stories: The Narrative Ethics of the

Jane Austen Biopics”

2:30-4 p.m., Friday, March 18

146. “New Media In the Eighteenth Century” (New Lights Forum:

Contemporary Perspectives on the Enlightenment) Port Alberni

Chair: Jennifer VANDERHEYDEN, Marquette University

1. Lisa MARUCA, Wayne State University, “From Body to Book: Media

Representations in Eighteenth-Century Education”

2. Caroline STONE, University of Florida, “Publick Occurences and the

Digital Divide: The Influence of Technological Borders on Emergent

Forms of Media”

3. George H. WILLIAMS, University of South Carolina, Upstate,

“Creating Our Own Tools? Leadership and Independence in

Eighteenth-Century Digital Scholarship”

8-9:30 a.m., Saturday, March 19

156. “The Circulating Library and the Novel in the Long Eighteenth

Century” Orca

Chair: Hannah DOHERTY, Stanford University

1. Lesley GOODMAN, Harvard University, “Under the Sign of the

Minerva: A Case of Literary Branding”

2. Natalie PHILLIPS, Stanford University, “Richardson’s Clarissa and the

Circulating Library”

3. Elizabeth NEIMAN, University of Maine, “Novels Begetting Novels—

and Novelists: Reading authority in (and into) Minerva Press Formulas”

9:45-11:15, Saturday, March 19

170. “Will Tomorrow’s University Be Able to Afford the Eighteenth

Century? If So, How and Why?” (Roundtable) (New Lights Forum:

Contemporary Perspectives on the Enlightenment) Parksville

Chair: Julie Candler HAYES, University of Massachusetts, Amherst

1. Downing A. THOMAS, University of Iowa

2. Daniel BREWER, University of Minnesota

3. Melissa MOWRY, St. John’s University

4. Albert J. RIVERO, Marquette University

173. “Colloquy with Matt Cohen on The Networked Wilderness” (Roundtable) Port Alberni

Chair: Dennis MOORE, Florida State University

1. Birgit Brander RASMUSSEN, Yale University

2. Bryce TRAISTER, University of Western Ontario

3. Cristobal SILVA, Columbia University

4. Jeffrey GLOVER, Loyola University, Chicago

5. Matt COHEN, University of Texas at Austin

6. Sarah RIVETT, Princeton University

177. “Crowd-sourcing and Collaboration: Community-Based

Projects in Eighteenth-Century Studies” Grand Ballroom D

Chair: Bridget DRAXLER, University of Iowa

1. Margaret WYE, Rockhurst University, “The Challenge and

Exhilaration of Collaboration: From Post Grad to Undergrad, It’s All

Research, All the Time”

2. Victoria Marrs FLADUNG, Rockhurst University, “Undergraduate

Research: How I Learned to Love Irony in Jane Austen’s Mansfield


3. Laura MANDELL, Miami University, “Crowd-sourcing the Archive:”

Respondent: Elizabeth GOODHUE, University of California, Los Angeles

2-3:30 p.m., Saturday, March 19

181. “Evaluating Digital Work: Projects, Programs and Peer Review”

(Digital Humanities Caucus) (Roundtable) Grand Ballroom BC

Chair: Lisa MARUCA, Wayne State University

1. Holly Faith NELSON, Trinity Western University

2. Bill BLAKE, University of Wisconsin, Madison

3. Allison MURI, University of Saskatchewan

4. Laura MCGRANE, Haverford College

5. Gaye ASHFORD, Dublin City University

6. Anne Marie HERRON, Dublin City University

184. “New Approaches to Teaching the Great (and not-so-great) Texts of

the Eighteenth Century” (Roundtable) (Graduate Student Caucus) Junior Ballroom B

Chair: Jarrod HURLBERT, Marquette University

1. Christian BEDNAR, North Shore Community College

2. Ann CAMPBELL, Boise State University

3. Christopher NAGLE, Western Michigan University

4. Peggy THOMPSON, Agnes Scott College

5. Deborah WEISS, University of Alabama

193. “Marketing and Selling Books in Eighteenth-Century France: People, Places and Practices” Orca

Chair: Reed BENHAMOU, Indiana University

1. Thierry RIGOGNE, Fordham University, “Marketing Literature and

Selling Books in the Parisian Café, 1680-1789”

2. Marie-Claude FELTON, Ecole des Hautes Etudes en Sciences Sociales,

Paris and Université du Québec à Montréal, “Cutting out the

Middlemen: Self-Publishing Authors and their Autonomous

Commercial Endeavors in the Parisian Literary Market, 1750-1791”

3. Paul BENHAMOU, Purdue University, “Le Commerce de la lecture à

Lyon dans la seconde moitié du 18ème siècle: Le cas du libraire-

imprimeur Reguilliat”

Google Books Award: ESTC Receives Digital Humanities Grant

July 21, 2010

Posted on behalf of Brian Geiger, University of California, Riverside.

Brian reports:

I’m pleased to announce that Ben Pauley and I have received one of twelve inaugural Google Digital Humanities grants to match pre-1801 items in Google Books to the ESTC. The official announcement was made last week. You can read more about the grant at Inside Higher Ed.

Our plan is to match as much as we can through computer matching, putting URLs for Google Books in appropriate ESTC records and providing Google with ESTC IDs and metadata. We don’t know for sure, but estimate that there will be between 100,000 and 200,000 ESTC-related items in Google Books. Based on matching that the Center for Bibliographical Studies and Research (CBSR) has done of records from electronic library catalogs, we should be able to computer match up to 50% of the Google records. This number could be lower than usual, however, given the truncated nature of much of the Google metadata.

The remaining 50% or so of the records we hope to put in a version of Ben’s Eighteenth-Century Book Tracker and make publicly accessible for users to help with the matching. For those of you teaching bibliography or bibliographically-minded courses next year, this could be a wonderful teaching tool, allowing your students to struggle with the complexities of early modern bibliography and learn first-hand its importance for understanding the history of the book.
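To make the division of labor between computer matching and human matching concrete, here is a minimal sketch in Python. It is not the CBSR's actual procedure, which the post does not describe. The record fields (`id`, `title`, `year`), the `EXAMPLE-` identifiers, the 0.85 threshold, and the `best_match` helper are all assumptions made for illustration. The idea is to compare a (possibly truncated) Google title against candidate titles from the same year, accept a match only above a similarity threshold, and leave everything else for human review.

```python
from difflib import SequenceMatcher
import re


def normalize(title):
    """Lowercase a title and drop punctuation so spelling is all that is compared."""
    return re.sub(r"[^a-z0-9 ]", "", title.lower()).strip()


def best_match(google_record, estc_records, threshold=0.85):
    """Return (estc_id, score) for the best candidate.

    estc_id is None when no candidate reaches the threshold, meaning the
    record should be queued for human matching instead.
    """
    g_title = normalize(google_record["title"])
    best_id, best_score = None, 0.0
    for estc in estc_records:
        if estc["year"] != google_record["year"]:
            continue  # cheap filter: only compare records from the same year
        # Google titles are often truncated, so compare against a prefix
        # of the fuller title that has the same length.
        e_title = normalize(estc["title"])[: len(g_title)]
        score = SequenceMatcher(None, g_title, e_title).ratio()
        if score > best_score:
            best_id, best_score = estc["id"], score
    return (best_id if best_score >= threshold else None), best_score


# Hypothetical records; the identifiers are placeholders, not real ESTC numbers.
estc = [
    {"id": "EXAMPLE-1", "year": 1719,
     "title": "The Life and Strange Surprizing Adventures of Robinson Crusoe, of York, Mariner"},
    {"id": "EXAMPLE-2", "year": 1719,
     "title": "The Farther Adventures of Robinson Crusoe"},
]
truncated = {"title": "The life and strange surprizing adventures of Robinson Crusoe", "year": 1719}
unmatched = {"title": "Poems on several occasions", "year": 1730}

matched_result = best_match(truncated, estc)
unmatched_result = best_match(unmatched, estc)
print(matched_result)
print(unmatched_result)
```

Records that come back with `None` are the ones that would go to a public tool for human matching. Raising or lowering the threshold trades missed matches against false ones.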

We’ll update this blog about our progress with the Google Books metadata and hope to have a version of the Eighteenth-Century Book Tracker ready for use by the end of the fall or early spring.

An update on Eighteenth-Century Book Tracker

July 20, 2010

[Edit: fixed a couple of broken links—my apologies. -bp]

I wanted to let readers of this blog know about a couple of updates at Eighteenth-Century Book Tracker that I hope will make the site a valuable adjunct for those who look for early modern books at Google Books and the Internet Archive. These changes should also make it easier for users to contribute links to the site.

For several months, between about November, 2009 and March, 2010, visitors to the site wouldn’t have seen a whole lot happening. During that period, rather than adding new links to the site, I was re-tooling the site’s data model in order to make things more flexible and robust—essentially, I was recreating all of the site’s content along new lines. This was not fun, but I think the results are worth it.

Digital Humanities at AHA

January 12, 2010

In an earlier post we covered MLA panels devoted to digital humanities, electronic archives, and electronic tools. Although the American Historical Association annual meeting has already concluded, we thought it would be useful to review the sessions held at that convention as well. When available, I have included links to papers or abstracts.

Humanities in the Digital Age, Part 1: Digital Poster Session
This session will provide participants with an overview of different digital tools and services and how historians are using them for research, teaching, and collaboration. After brief introductions to the various posters, participants will walk around the room spending time at the various stations, talking with the presenters and other participants. This will be followed in the afternoon by a hands-on workshop (session 73) where participants can learn more about how to use these specific tools. Co-sponsored by the National History Education Clearinghouse (NHEC):

  • Blogging, Jeremy Boggs, Center for History and New Media, George Mason University
  • Text Mining, Daniel J. Cohen, Center for History and New Media, George Mason University
  • Student Projects/Websites and Omeka, Jeffrey McClurken, University of Mary Washington
  • Zotero, Trevor Owens, Center for History and New Media, George Mason University
  • Teaching Tools, Kelly Schrum, Center for History and New Media, George Mason University
  • Web 2.0 – Flickr, YouTube/Video, Google Maps, Wikis, Jim Groom, University of Mary Washington

    Google Book Settlement Revised

    November 14, 2009

    As we continue to discuss Robert Darnton’s excellent The Case for Books, it may be important to include a simultaneous thread on the most recent developments in the Google Book Settlement.

    Google’s revised version of the Settlement was submitted on Friday. The New York Times article covering this development can be found by clicking here. It includes a number of interesting links, including links to Google’s own posting regarding the revisions and those of The Open Book Alliance.

    The Digital Revolution and the Scholar: Darnton’s View

    November 10, 2009

    To continue the discussion begun by our consideration of Ken Auletta’s Googled, we move to another recent work. Robert Darnton, who has opted out of the Google Book Settlement for Harvard, has faith that we can do better in terms of providing digital access. His The Case for Books: Past, Present, and Future presents his vision and recommendations. As he asserts in a recent article for Publishers Weekly:

    Today, however, we have the means to make that utopia a reality. In many societies, despite enormous inequalities, ordinary people not only read but have access to a huge quantity of reading matter through the Internet. I would not minimize the digital divide, which separates the computerized world from the rest, nor would I underestimate the importance of traditional books. But the future is digital. And I believe that if we can resolve the current challenges facing books in ways that favor ordinary citizens, we can create a digital republic of letters. Much of my book is devoted to this premise and can be summarized in two words: digitize and democratize.

    Because versions of the chapters in Darnton’s The Case for Books have appeared elsewhere, those who do not have a copy of his book might find the following list of sources helpful. (The first two chapters are the most recent.)

    Chapter One comes from “Google & the Future of Books” that appeared in The New York Review of Books, (February 12, 2009).

    Chapter Two comes from “The Library in the New Age,” New York Review of Books, (June 12, 2008).

    Chapter Four comes from “Lost and Found in Cyberspace,” Chronicle of Higher Education (March 12, 1999).

    Chapter Five comes from “The New Age of the Book,” New York Review of Books, (March 18, 1999).

    Chapter Eight comes from “The Great Book Massacre,” New York Review of Books, (April 26, 2001).

    Chapter Nine comes from “The Heresies of Bibliography,” New York Review of Books, (May 29, 2003).

    Chapter Ten comes from “Extraordinary Commonplaces,” New York Review of Books, (December 21, 2000).

    Chapter Eleven comes from “What Is the History of Books?” (widely reprinted), Daedalus (summer 1982): 65-83.

    Darnton has been interviewed by a number of sources about this book. Rebecca Rego Barry’s “Google v. Gutenberg: Robert Darnton’s new book on old books and e-books” appears in Fine Books & Collections.

    The Digital Revolution and the Scholar: Auletta’s View

    November 5, 2009

    Ken Auletta’s new book, Googled: The End of the World As We Know It, examines Google as the driving force behind the digital revolution, changing everything in its wake. In an effort to identify Google’s consequences for the scholarly world, we are using Auletta’s book as a point of departure.

    Readers are invited to contribute reactions to Auletta’s work.  Terry Gross’s recent interview with Auletta can be found by clicking here.

    Eighteenth-Century Book Tracker

    August 12, 2009

    Anna Battigelli and Eleanor Shevlin invited me to write a bit about the Eighteenth-Century Book Tracker project that Laura Mandell linked to last week, and I’m happy to do so.

    This is a project I began thinking about around a year ago, and to explain some of its premises, I’d best say a bit about the circumstances that gave rise to it. I teach at a mid-sized, primarily undergraduate public university that hasn’t purchased access to ECCO, EEBO, et al. and, realistically speaking, isn’t ever going to purchase access to them at their current prices. I’m really fortunate to be able to use ECCO and other resources at the University of Connecticut, just a few miles up the road, so my own research isn’t unduly hampered by not having them at my home institution. (What hampers my research is my 4/4 teaching load, but that’s another matter…) I can’t really take advantage of ECCO in my teaching, though, which led me to start exploring resources like Google Books and the Internet Archive. While you can’t beat the price, those sites—and, let’s recall, they’re functionally the only ones that people without institutional access to the big databases can leverage—leave a lot to be desired.

    There’s been a lot of good discussion here about the nature of Google Books and the Internet Archive—what they are and aren’t good for, how best to think about them, whether as catalogues/finding aids or as searchable textbases. I hope it won’t seem too contrary of me, then, to say that, at present, they aren’t especially good at being either of those things.


    Reading with Machines

    August 11, 2009

    A week or two ago, we had an interesting discussion here on EMOB about the uses and limits of computers as aids to literary scholarship. Below are some of my own summary thoughts on the subject, along with links to related work that I’ve found useful or thought-provoking.

    The first point to mention is that the things computers are good at are very different from the things humans are good at. The worthwhile work in digital humanities (“DH” for short, a synonym for computationally assisted humanities research) keeps this fact in mind. Computers are useful for doing quickly certain basic (that is, boring) tasks that humans do slowly. They’re really good at counting, for instance. But sometimes, happily, these kinds of quantitative improvements in speed produce qualitative changes in the kinds of questions we can pose about the objects that interest us. So we literary scholars don’t want to ask computers to do our close reading for us. We want them to help us work differently by expanding what we can read (or at least interpret) and how we can read it. And we want to keep in mind that reading itself is just one (extraordinarily useful) analytical technique when it comes to understanding literary or social-aesthetic objects.

    There are two main classes of literary problems that might immediately benefit from computational help. In the first, you’re looking for fresh insights into texts you already know (presumably because you’ve read them closely). In the second, you’d like to be able to say something about a large collection of texts you haven’t read (and probably can’t read, even in principle, because there are too many of them; think of the set of all novels written in English). In both cases, it would almost certainly be useful to classify or group the texts together according to various criteria, a process that is in fact at the heart of much computationally assisted literary work.

    In the first case, what you’re looking for are new ways to connect or distinguish known texts. Cluster analysis is one way to do this. You take a group of texts (Shakespeare’s plays, for instance), feed them through an algorithm that assesses their similarity or difference according to a set of known features or metrics (sentence length, character or lemma n-gram frequency, part of speech frequency, keyword frequency, etc.—the specific metrics need to be worked out by a combination of so-called “domain knowledge” and trial and error), and produce a set of clusters that rank the relative similarity of each work to the others. Typical output looks something like this figure from Matthew Jockers’ blog (click the image to see it full size in its original context):

    [Figure: Three dramatic genres in Shakespeare]

    Read this diagram from the top down; the lower the branch point between two items or groups, the more closely related they are.

    This may or may not be interesting. Note in particular that the cluster labels are supplied by the user, outside the computational process. In other words, the algorithm doesn’t know what the clusters mean, nor what the clustered works have in common. Still, why does Othello cluster with the comedies rather than the tragedies (or the histories, to which the tragedies are more closely related than the comedies)? The clustering process doesn’t answer that question, but I might never have thought to ask it if I hadn’t seen these results. Maybe I won’t have anything insightful to say in answer to it, but then that’s true of any other question I might ask, and at least now I have a new potential line of inquiry (which is perhaps no mean thing when it comes to Shakespeare).

    (As an aside, the extent to which I’m likely to explain the categorization of Othello as a simple error instead of as something that requires further thought and attention will depend on how well I think the clustering process works overall, which in turn will depend to at least some extent on how well it reproduces my existing expectations about generic groupings in Shakespeare. The most interesting case, probably, is the one in which almost all of my expectations are met and confirmed—thereby giving me faith in the accuracy of the overall clustering—but a small number of unexpected results remain, particularly if the anomalous results square in some way with my previously undeveloped intuitions.)
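The clustering procedure described above can be sketched in a few lines of Python. This is a toy illustration, not the method behind the Jockers figure: the four invented "plays", the eight-word vocabulary, the choice of cosine distance, and average-linkage merging are all assumptions made for the example. It shows the general shape of the process: turn each text into a vector of feature frequencies, then repeatedly merge the two most similar groups.

```python
from collections import Counter
from itertools import combinations
import math
import re


def features(text, vocab):
    """Relative frequency of each vocabulary word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1
    return [counts[w] / total for w in vocab]


def cosine_distance(a, b):
    """0.0 for identical feature profiles, 1.0 for profiles with nothing in common."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def cluster(texts, vocab):
    """Average-linkage agglomerative clustering.

    Returns the merges in the order they happen, each as
    (members_of_group_a, members_of_group_b, distance).
    """
    vectors = {name: features(text, vocab) for name, text in texts.items()}
    clusters = [(name,) for name in sorted(texts)]
    merges = []

    def linkage(c1, c2):
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(cosine_distance(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((c1, c2, linkage(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Invented stand-ins for real texts; the labels exist only for the reader,
# since the algorithm never sees what "comedy" or "tragedy" means.
texts = {
    "comedy_a": "love love marry jest love marry fool",
    "comedy_b": "marry love jest fool love jest",
    "tragedy_a": "death blood death revenge blood king",
    "tragedy_b": "blood death revenge death king king",
}
vocab = ["love", "marry", "jest", "fool", "death", "blood", "revenge", "king"]

merges = cluster(texts, vocab)
for group_a, group_b, distance in merges:
    print(group_a, "+", group_b, round(distance, 3))
```

The list of merges carries the same information as a dendrogram: early, low-distance merges are the closely related items, and the final high-distance merge is the top-level split. With real plays, the interesting part is choosing the features, which is where domain knowledge and trial and error come in.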

    Even more compelling to me, however, is the application of these and related techniques to bodies of text that would otherwise go simply unread and unanalyzed. If you’re working on any kind of large-scale literary-historical problems, you come up very quickly against the limits of your own reading capacity; you just can’t read most of the books written in any given period, much less over the course of centuries. And the problem only gets worse as you move forward in time, both because there’s more history to master and because authors keep churning out new material at ever-increasing rates. But if you can’t read it all, and if (as I said above) you can’t expect a computer to read it for you, what can you possibly do with all this stuff that currently, for your research purposes, may as well not exist?

    Well, you can try to extract data of some kind from it, then group and sort and classify it. This might do a few different things for you:

    1. It might allow you to test, support, or refine your large-scale claims about developments in literary and social history. If you think that allegory has changed in important and specific ways over the last three centuries, you might be able to test that hypothesis across a large portion of the period’s literary output. You’d do that by training an algorithm on a smallish set of known allegorical and non-allegorical works, then setting it loose on a large collection of novels. (This process is known as supervised classification or supervised learning, in contrast to the un- or semi-supervised clustering described briefly above. For more details, see the Jain article linked at the end of this post.) The algorithm will classify each work in the large collection according to its degree of “allegoricalness” based on the generally low-level differences gleaned from the training set. At that point, it’s up to you, the researcher, to make sense of the results. Are the fluctuations in allegorical occurrence important? How does the genre vary by date, national origin, gender, etc.? Why does it do so? In any case, what’s most exciting to me is the fact that you’re now in position to say something about these works, even if you won’t have particular insight into any one of them. Collectively, at least, you’ve retrieved them from irrelevance and opened up a new avenue for research.
    2. The same process might also draw your attention to a particular work or set of works that you’d otherwise not have known about or thought to study. If books by a single author or those written during a few years in the early nineteenth century score off the charts in allegoricalness, it might be worth your while to read them closely and to make them the objects of more conventional literary scholarship. Again, the idea is that this is something you’d have missed completely in the absence of computational methods.
    3. Finally, you might end up doing something like the Shakespearean clustering case above; maybe a book you do know and have always considered non-allegorical is ranked highly allegorical by the computer. Now, you’re probably right and the computer’s probably wrong about that specific book, but it might be interesting to try to figure out what it is about the book that produces the error, and to consider whether or not that fact is relevant to your interpretation of the text.
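The supervised workflow in the list above (train on a small labeled set, then score a large unread collection) might look roughly like the following Python sketch. It uses a very small naive Bayes word model, which is one common choice rather than anything the post prescribes. The training sentences, the `allegory` and `other` labels, and the `allegory_score` function are invented for illustration, and a real study would need far more training text and better features.

```python
from collections import Counter
import math
import re


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def train(labeled_texts):
    """Count word occurrences per label from (text, label) pairs."""
    counts = {"allegory": Counter(), "other": Counter()}
    docs = Counter()
    for text, label in labeled_texts:
        counts[label].update(tokenize(text))
        docs[label] += 1
    vocab = set(counts["allegory"]) | set(counts["other"])
    return counts, docs, vocab


def allegory_score(text, model):
    """Log-odds that a text is allegorical; positive means 'more allegorical'."""
    counts, docs, vocab = model
    score = math.log(docs["allegory"] / docs["other"])  # prior from label balance
    for label, sign in (("allegory", 1), ("other", -1)):
        total = sum(counts[label].values()) + len(vocab)  # add-one smoothing
        for word in tokenize(text):
            if word in vocab:  # words never seen in training carry no evidence
                score += sign * math.log((counts[label][word] + 1) / total)
    return score


# A deliberately tiny, invented training set.
training = [
    ("the pilgrim met virtue and vice on the road to the celestial city", "allegory"),
    ("faith and despair contended for the soul of the pilgrim", "allegory"),
    ("she poured the tea and asked after his sister in bath", "other"),
    ("the carriage arrived late and the dinner was cold", "other"),
]
model = train(training)

# The "large collection" of works nobody has read, here just two.
unread = {
    "novel_1": "the pilgrim followed virtue to the city",
    "novel_2": "his sister poured the tea in the carriage",
}
scores = {name: allegory_score(text, model) for name, text in unread.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

The scores are only a ranking by "allegoricalness" as the training set happens to define it. Deciding whether a high score marks a discovery (case 2) or an instructive error (case 3) remains the researcher's job.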

    One note of particular interest to those who care deeply about bibliography. In an earlier post about Google Book Search (a service tellingly renamed from the original Google Print), there was some debate about whether GBS is a catalog or a finding aid, and whether or not full-text search takes the place of human-supplied metadata. I think it’s obvious that both search and metadata are immensely useful and that neither can replace the other. One thing that text mining and classification might help with, though, is supplying metadata where none currently exists. Computationally derived subject headings almost certainly wouldn’t be as good as human-supplied ones, but they might be better than nothing if you have a mess of older records or very lightly curated holdings (as is true of much of the Internet Archive and GBS alike, for instance).

    Finally, some links to useful and/or interesting material: