Archive for the ‘ESTC’ Category

ASECS Session: “ECCO, EEBO, and the Burney Collection: Some ‘Noisy Feedback’” (Roundtable)

March 13, 2010

Thursday, March 18, 9:45 – 11:15 a.m.

“ECCO, EEBO, and the Burney Collection: Some ‘Noisy Feedback’” (Roundtable)    Alvarado E

Chair:    Anna BATTIGELLI, State University of New York, Plattsburgh

1.    Sayre GREENFIELD, University of Pittsburgh, Greensburg

2.    Stephen KARIAN, Marquette University

3.    James E. MAY, Pennsylvania State University, DuBois

4.    Eleanor F. SHEVLIN, West Chester University

5.    Michael F. SUAREZ, S.J., Rare Book School, University of Virginia

RESPONDENTS: Scott DAWSON, Gale/Cengage; Brian GEIGER, ESTC; Jo-Anne HOGAN, ProQuest


Collaboration, Costs, and Digital Resources

January 30, 2010

On February 19 and 20 Yale will host a graduate student symposium, The Past’s Digital Presence Conference: Database, Archive and Knowledge Work in the Humanities. A quick survey of the conference program and available abstracts indicates several topics that dovetail with issues or subjects that have engaged emob. Jessica Weare’s paper, “The Dark Tide: Digital Preservation, Interpretive Loss, and the Google Books Project”, for instance, examines the discarding of material evidence in the process of digitizing Vera Brittain’s The Dark Tide. Similarly, Scott Spillman and Julia Mansfield’s presentation, “Mapping Eighteenth-Century Intellectual Networks”, discusses their work on Benjamin Franklin’s letters and their relationship within the Republic of Letters. The conference’s purpose also addresses many of the questions we have been posing on this blog:

■ How is digital technology changing methods of scholarly research with pre-digital sources in the humanities?
■ If the “medium is the message,” then how does the message change when primary sources are translated into digital media?
■ What kinds of new research opportunities do databases unlock and what do they make obsolete?
■ What is the future of the rare book and manuscript library and its use?
■ What biases are inherent in the widespread use of digitized material? How can we correct for them?
■ Amidst numerous benefits in accessibility, cost, and convenience, what concerns have been overlooked?

Peter Stallybrass is offering the keynote, and Jacqueline Goldsby will be the colloquium speaker, while Willard McCartney, Rolena Adorno, and others will appear on the closing roundtable. Such a lineup points to the range of perspectives represented. The conference is free to all affiliated with a university.

Among the places this conference has been announced is the JISC Digitisation News section of the UK Digitisation Programme website, and its announcement emphasizes the participation of students “from around the globe.”

Collaboration as it occurs across boundaries is the implicit topic of this posting, and I wish to use reports from the JISC website both as a springboard and as a contrast in discussing the topic.

A 2008-2009 JISC report, Enriching Digital Resources 2008-2009, on the Enriching Digital Content program (a strand of the JISC Online Content Program) features a podcast with Ben Showers. Because of the national nature of JISC, the program described offers a unified, coherent approach to advancing digital resources for its institutions of higher education; it represents a collaborative agenda. In this podcast Showers explains the purpose of the program: rather than fund the creation of new resources, the program invested £1.8 million to enhance and enrich existing digital content while also developing a system for universities and colleges to vet and recognize this work. He then turns to explaining the following four key benefits of this program:
• “unlocking the hidden—making things that are hard to access easy” to obtain and preserve. To illustrate, he uses the CORRAL (UK Colonial Registers and Royal Navy Logbooks) project as an example of opening up primary data not only to make it much more available but also to preserve it.
• enhancing experiences of students. Here Showers cites the Enlightening Science project at Sussex, which offers students opportunities to watch video re-enactments of Newton’s experiments and read original texts by Newton and others.
• speeding up research—once a document has been digitized, there is no need to repeat the process. The document will now be available for all other researchers to use.
• widening participation—engaging broader audiences including not only faculty and students within Britain’s educational community but also participants globally.

Turning to the new goals for the 2009-2011 program cycle, Showers notes an emphasis on the “clustering” of content, that is, bringing various projects together and establishing, when appropriate, links among them. Another focus is further building skills and strategies within institutions to deliver digital content effectively. Finally, he mentions the strengthening of transatlantic partnerships, and here the US National Endowment for the Humanities (NEH) is given as an example. Of course, there is a long history of scholarly collaboration between the NEH and British institutions, perhaps most notably the English Short Title Catalogue (ESTC).

Indeed, through collaborative digital grants offered by JISC and NEH several transatlantic projects are underway or near completion, including the Shakespeare Quartos Archive, a collaborative effort involving Oxford University and the Folger Library, and the St Kitts-Nevis Digital Archaeology Initiative, undertaken by Southampton University and the Thomas Jefferson Foundation, Charlottesville, VA, to advance scholarship on slavery. There are several others as well.

Both the goals and benefits detailed by Showers are ones that would attract the support of diverse parties, and they do parallel many arguments being made on this side of the Atlantic for such work, including ones advanced by the NEH. Moreover, this and other JISC reports suggest that JISC has also helped broker mutually beneficial relationships between British universities and commercial vendors such as Cengage-Gale and ProQuest. Yet another JISC report, Value for Money, offers arguments that we need to be making and also points to the obstacles and divides affecting various types of collaboration in the United States.

After offering the following figures on the return on money invested in JISC,

• For each £1 spent by JISC on the provision of e-resources, the return to the community in value of time saved in information gathering is at least £18.

• For every £1 of the JISC services budget, the education and research community receives £9 of demonstrable value.

• For every £1 JISC spent on securing national agreements for e-resources, the saving to the community was more than £26.

the report summary offers the following remarks:

These are the figures revealed by a recently-published Value for Money report on JISC services. Although many countries have centrally provided research and education networks, and some have provided supplementary services, no other country has a comparable single body providing an integrated range of network services, content services, advice, support and development programmes.

The cost-effectiveness of JISC is again highlighted in two sidebars:

These figures suggest that for every £1 JISC spent on securing national agreements for e-resources, the saving to the community was more than £26
and
The added value, equivalent to more than £156m per year, suggests the community is gaining 1.4 million person/days, by using e-resources rather than paper-based information.
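As a rough check on the second sidebar, the two numbers quoted there imply a value for each person-day saved. The short Python sketch below simply divides one by the other; the function name is my own, and the result is only an illustration of the arithmetic, not a figure taken from the report.

```python
def value_per_person_day(total_value_gbp: float, person_days: float) -> float:
    """Return the implied monetary value of one person-day saved."""
    return total_value_gbp / person_days

# Figures quoted in the sidebar: more than £156m per year and 1.4 million person-days.
annual_value_gbp = 156_000_000
person_days_saved = 1_400_000

implied = value_per_person_day(annual_value_gbp, person_days_saved)
print(f"Implied value per person-day: about £{implied:.0f}")
```

Because the report says "more than" £156m, the result is best read as a lower bound on the implied value.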

The end of the summary further reinforces why investments in JISC benefit the UK as a whole:

The value of JISC activities extends beyond the benefits identified here. Education and research are high-value commodities that play an important role in the UK economy and underpin the UK’s global economic position.

The JISC’s “Value for Money” report contains the types of arguments and data that we in the US need to be making. While our system of higher education does not operate under the centralized system that characterizes that of the UK, the push for more transparent reporting on and assessment of what our various universities and colleges are delivering perhaps provides an opportunity for new forms of collaboration. Through national scholarly societies, the NEH, Mellon Foundation, ALA, and more, we need to supply some “noisy feedback” from a dollars-and-cents/sense perspective about what investing in digital resources means not just for our institutions of higher learning but also for our society.

CFP 2010: Digital Archives & the Field of Production

December 22, 2009

The following announcement appeared on the SHARP-L list and may interest readers of emob:

APPOSITIONS: Studies in Renaissance / Early Modern Literature and Culture
http://appositions.blogspot.com/

Call for Papers: APPOSITIONS: Studies in Renaissance / Early Modern Literature and Culture seeks new work addressing the theme of digital archives. How and why does electronic access to archival materials reconfigure the teaching and study of literary texts, related cultural documents, and methodologies for disciplinary or interdisciplinary research and interpretation? What are the benefits and/or limitations of such new media? What are the politics of the digital archive, or of electronic special collections? What is the significance of the original work—or of authorship, or scholarship—in the electronic age? How and why does the digitization of archival documents either celebrate or challenge the status of manuscripts, pamphlets, printed books, and the literary canon? Within that capacious scope, a variety of topics will be engaged.

APPOSITIONS is an electronic, international, annual conference for studies in Renaissance & early modern literature and culture hosted by APPOSITIONS: Studies in Renaissance / Early Modern Literature and Culture, ISSN: 1946-1992, http://appositions.blogspot.com/.

Abstracts (500 words): December 31, 2009.
E-Conference: February-March, 2010.

Electronic Submissions: Send submissions to showard@du.edu attached as a single .doc, .rtf, or .txt file. Visuals should be attached individually as .jpg, .gif, or .bmp files. Please include the words “Appositions Submission” in the subject line of your message.

Summary of EC/ASECS Roundtable: Bibliography, the ESTC, and 18th-Century Electronic Databases

October 24, 2009

Bibliography, the ESTC, and 18th-Century Electronic Databases:  A Roundtable

Chair: Eleanor F. Shevlin (West Chester University)   Participants: James E. May (Penn State University—DuBois), James Tierney (University of Missouri—St. Louis), David Vander Meulen (University of Virginia), Benjamin Pauley (Eastern Connecticut State University), Brian Geiger (ESTC, University of California, Riverside), and Scott Dawson (Gale/Cengage).

The following offers a summary of the roundtable that took place on Saturday, October 10, 2009, at the EC/ASECS 2009 conference hosted by Lehigh University and held in Bethlehem, Pennsylvania, October 8-11, 2009.

Jim May opened the roundtable, and his remarks highlighted and extended the discussion he offered in his essay, “Some Problems in ECCO (and ESTC),” in The Eighteenth-Century Intelligencer, 23.1 (Jan. 2009), the article that inspired this session and Anna Battigelli’s forthcoming roundtable at ASECS (March 18th, 9:45–11:15 am). Key issues Jim raised included the need to correct missing images, to address the “disappearance” of letters originally printed in red ink on title pages, and to bring the ESTC up to date. In addition, he noted that ECCO’s electronic index is not always representative of what is actually there digitally. Work is also needed on providing or revising information about subscription lists, textual history, and attributions in ESTC. While noting that he had already addressed problems with Burney in his article in The Eighteenth-Century Intelligencer, 23.2 (May 2009), and that Jim Tierney would be discussing this tool next, Jim commented on the usefulness of Burney, particularly to those working on the history of a publication.

Turning to the Burney collection, Jim Tierney drew attention to the potentially confusing name for this electronic collection because it is not by any means restricted to newspapers. Instead, it includes a good number of periodicals as well. Specifically, the collection consists of 237 newspapers and 161 periodicals, and, furthermore, some of the titles included are neither newspapers nor periodicals. That the Burney digitized collection follows the Anglo-American cataloguing procedure of creating a new entry every time a newspaper undergoes a title change results in the illusion of more titles than actually exist as well as confusion about the history of a given newspaper. Jim also provided a detailed handout (posted here as a page) listing the digitized periodicals (note: not newspapers) in Burney. The handout includes notes about missing issues, other locations where titles in Burney can be found, and a tentative list of Burney titles duplicated by other digitization projects. The two overarching points Jim made were the failure to involve scholars in the planning of Burney and other digitization projects and the need for far greater collaboration among the creators/purveyors of these databases, librarians, and scholars. That given titles in Burney often include only a few issues when other issues were available elsewhere and, if digitized, would have approached a more complete run, exemplifies the need for far better coordination and collaboration.
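Tierney's point about title changes can be illustrated with a small sketch. The Python below uses invented records (not actual Burney data) in which each catalogue entry carries a hypothetical "continues" link naming the entry it succeeds; following those links groups entries into publications, so the number of catalogue entries can be compared with the number of distinct publications.

```python
# Hypothetical catalogue entries: title -> the earlier title it continues (or None).
# These records are invented for illustration; they are not Burney data.
entries = {
    "Example Post": None,
    "Example Post and Advertiser": "Example Post",
    "New Example Post": "Example Post and Advertiser",
    "Sample Courier": None,
}

def root_title(title: str, links: dict) -> str:
    """Follow 'continues' links back to the earliest title of a publication."""
    seen = set()
    while links[title] is not None and title not in seen:
        seen.add(title)  # guard against accidental cycles in the links
        title = links[title]
    return title

publications = {root_title(t, entries) for t in entries}
print(len(entries), "catalogue entries;", len(publications), "distinct publications")
```

In this toy example four entries collapse into two publications, which is the kind of gap between entry counts and actual titles that the roundtable remarks describe.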

While David Vander Meulen serves on the ESTC board, his remarks for the roundtable were offered in his role as a researcher and user of these tools. He began by noting that ESTC is an evolving tool (a work in progress) and that ECCO follows ESTC. Moreover, even as it progresses, the ESTC is still “functional and valuable” even though it is incomplete. Nonetheless, “any addition to ESTC will change the context.” An important development occurred in 2006 when the British Library initiated free access to this tool. As for problems, the ESTC had made the decision to truncate titles and places. Yet ECCO generally offers the full titles, while expanded locations can occasionally be found by going to public library catalogues. To improve these resources, David explained, we need to have an easier way to convey corrections to the British Library or University of California Riverside (the North American home of the ESTC) and, equally important, an ongoing staff to process editorial changes and comments. In discussing this need for a means of processing updates, David also drew attention to whether the uncontrolled notes field should be visible. Unfortunately, agencies that have funded the ESTC, as he explained in his closing remarks, have decided the project is complete. Obviously, given ESTC’s status as a work-in-progress, such a decision presents additional problems for continued updating and correcting.

Ben Pauley spoke next about a project he has initiated. He began by noting the lack of access that many institutions (and thus their scholars and students) have to paid databases such as EEBO and ECCO. Both the Internet Archive and Google Books, however, have a number of eighteenth-century books in their freely accessible databases. Yet it is typically very hard to identify properly what text one has accessed. Viewing these freely available texts as an opportunity, Ben established The Eighteenth-Century Book Tracker, a project in which he is supplying the bibliographic data so sorely lacking in eighteenth-century texts found in Google Books. Doing so has compelled him to become a textual scholar or an “accidental bibliographer.” Thus far, he has recorded about 150 copies not appearing in ESTC. At present, the project features 480 texts and 4 periodicals. Ben has been asked to write an article on the Eighteenth-Century Book Tracker for The Eighteenth-Century Intelligencer that will detail much more about his undertaking.

Speaking as the Associate Director and Resident Manager of the Center (University of California Riverside), the North American home of the ESTC, Brian Geiger explained that the British Library’s ESTC role has focused on cataloguing its own collection and that the University of California Riverside has handled everything else. In addition to reiterating points about the problem with truncated titles, he also discussed the lack of subject headings as a shortcoming. Turning to the digital surrogates of early modern imprints, he explained that the ECCO and Adam Matthew collections are based on ESTC, but EEBO is not. Next Brian addressed the need to foster better communication between ESTC and scholars. While the channels of communication between ESTC and librarians have remained strong, that has not been the case with scholars. Like Ben, Brian will also be writing an article on the ESTC for The Eighteenth-Century Intelligencer.

Scott Dawson from Gale-Cengage concluded the presentations by roundtable panelists. He first supplied an historical overview of ECCO and Burney. In 1982 Research Publications began to microfilm the “Eighteenth Century” microform collection. By 2002 twenty-six million pages of eighteenth-century titles had been filmed. This microfilm collection is the basis for ECCO, but using the ESTC in conjunction with the microfilm has been overall a real plus for the project. ECCO II, released at the start of this year, features 50,000 additional titles. By mid-2010 ECCO II, representing holdings from fifteen libraries, will be completed (titles from the Harry Ransom Center are still being prepared). ECCO and ECCO II, combined, will have made 185,000 eighteenth-century titles available to subscribers. As for the digitization of Burney, that project was handled by the British Library and not Gale-Cengage. Scott also addressed some of the problems that can and cannot be corrected. When pages are blurred, for instance, the microfilm plays a key role in what can be done. If the microfilm is clear, then the page is re-filmed. Yet if the problem occurred because the page is blurred in the microfilm, then, from the perspective of Gale, nothing can be done. When duplications of a title are discovered, however, the duplications can be deleted.

After all six panelists had offered opening statements, the discussion was opened to the audience’s questions and comments. The point perhaps most stressed in the discussion with the audience was a need for far greater involvement by scholars in the creation and improvement of digital resources. In terms of updating or correcting resources, questions arose about how this might be done and what types of controls are needed. In subsequent discussions, the creation of advisory boards and (or) the involvement of a committee representing ASECS arose as possible avenues for communicating and addressing the scholar’s perspective more effectively. The establishment of an advisory board and/or ties with ASECS could play a vital role in future projects, and members of a board or ASECS committee could also devise potential solutions to some of the shortcomings with existing tools.  The resurrection of Factotum, the now defunct ESTC news publication of the British Library (ceased with issue no. 40 in 1995), or the initiation of a similar publication would be a way of establishing regular, ongoing communication with a broader base of scholars. (For those interested in the content of previous issues, see the index for Factotum.) Of course, an obstacle here is staffing and funding. Questions also arose about plans to make Burney more complete by digitizing issues not included for a particular newspaper or periodical title but available elsewhere. Yet that this digitization project had been undertaken by the British Library (see final report) and not Gale complicates the issue. Also, when asked about any plans for an ECCO III, Scott explained that the creation of ECCO II caused surprise among many libraries that had purchased ECCO because they believed that ECCO was complete at the time. When ECCO II was introduced for purchase, libraries were promised that there would not be any additional forms of ECCO.  
(Depending on the discovery of additional eighteenth-century titles, however, I see no reason that another collection could not be pursued; if enough material for another collection becomes available, then scholars need to insert and assert themselves in conversations with vendors and librarians and make the need and value of a third collection known.)

Another very real, pressing concern was the large number of scholars who do not have access to these databases and whose institutions are not likely to be able to afford these resources even in the future. The point was raised that all universities in the U.K. have access to ECCO and ECCO II for an annual hosting fee through the auspices of the Joint Information Systems Committee (JISC), “established by the UK further and higher education funding councils in 2006 to negotiate with publishers and owners of digital content.” Because the situation differs greatly in the U.S. (we have no higher education government council overseeing all our universities), we do not have such a prospect here. While Ben Pauley’s Eighteenth-Century Book Tracker promises to bring some order to the current anarchy that characterizes freely available eighteenth-century texts, his valuable project can’t and won’t solve the inequity of access in the United States.

Eighteenth-Century Book Tracker

August 12, 2009

Anna Battigelli and Eleanor Shevlin invited me to write a bit about the Eighteenth-Century Book Tracker project that Laura Mandell linked to last week, and I’m happy to do so.

This is a project I began thinking about around a year ago, and to explain some of its premises, I’d best say a bit about the circumstances that gave rise to it. I teach at a mid-sized, primarily undergraduate public university that hasn’t purchased access to ECCO, EEBO, et al. and, realistically speaking, isn’t ever going to purchase access to them at their current prices. I’m really fortunate to be able to use ECCO and other resources at the University of Connecticut, just a few miles up the road, so my own research isn’t unduly hampered by not having them at my home institution. (What hampers my research is my 4/4 teaching load, but that’s another matter…) I can’t really take advantage of ECCO in my teaching, though, which led me to start exploring resources like Google Books and the Internet Archive. While you can’t beat the price, those sites (and, let’s recall, they’re functionally the only ones that people without institutional access to the big databases can leverage) leave a lot to be desired.

There’s been a lot of good discussion here about the nature of Google Books and the Internet Archive—what they are and aren’t good for, how best to think about them, whether as catalogues/finding aids or as searchable textbases. I hope it won’t seem too contrary of me, then, to say that, at present, they aren’t especially good at being either of those things.


18thConnect

August 7, 2009

Hello to the Early Modern Online Bibliography blog: your discussions here are amazing, and rich with references.

Robert Markley at the University of Illinois and I started 18thConnect (we are co-directors) as a subsidiary organization to NINES (http://www.nines.org), which is incredibly supportive, both financially and in other ways as well. Basically, 18thConnect is an organization that will peer-review digital resources created by 18th-century scholars and then aggregate those resources along with commercial resources.

What does that mean?  When you come to the 18thConnect home page, you will be able to search for digital resources among free scholarly resources available on the web that have been judged high quality through peer review, AND commercial catalogs:  ECCO, Adam Matthew’s Eighteenth-Century Journals Portal, JSTOR, ProjectMuse, etc.  Our finding aid will deliver links to these resources — 18thConnect won’t house them in any way — and then, when you click on a link to an edition of Clarissa, say, proffered by ECCO, if your library subscribes to it and you are logged in at work, you will be sent directly to the resource.

Here is the news for those of you who already know about this initiative: at our summer meeting, July 15, in Dublin, Ireland, at the Royal Irish Academy, Gale consented to give us their page images.  We will attempt to machine-read them better, using our own home-made OCR program, in order to produce better plain text files, something closer to the keyed texts produced by the ECCO TCP.  Gale will allow us to index the texts that we produce to allow keyword searching on ECCO texts EVEN FOR THOSE PEOPLE WHO DON’T OWN the ECCO catalog.  In other words, you’ll be able to find the bibliographic data of the texts containing the keywords for which you search: if your library subscribes to ECCO, you can get the text directly, but if not, at least you now know which texts you’ll have to find through some other means (microfilm, interlibrary loan, visit to special collections).
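The behavior described here, where a keyword search returns bibliographic data to everyone but a direct link only to subscribers, might be sketched roughly as follows. This is not 18thConnect's code: the records, field names, and URLs are all invented for illustration, and the index is a deliberately minimal inverted index.

```python
from collections import defaultdict

# Invented records standing in for indexed texts; not real catalogue data.
records = [
    {"id": 1, "title": "Example Novel", "text": "virtue rewarded in letters",
     "url": "https://example.org/text/1"},
    {"id": 2, "title": "Example Sermon", "text": "virtue and charity",
     "url": "https://example.org/text/2"},
]

# Build a simple inverted index: keyword -> set of record ids.
index = defaultdict(set)
for rec in records:
    for word in rec["text"].split():
        index[word].add(rec["id"])

def search(keyword: str, subscriber: bool) -> list:
    """Return bibliographic data for matches; include the link only for subscribers."""
    by_id = {rec["id"]: rec for rec in records}
    results = []
    for rec_id in sorted(index.get(keyword, set())):
        rec = by_id[rec_id]
        hit = {"title": rec["title"]}
        if subscriber:
            hit["url"] = rec["url"]  # hypothetical link to the subscribed resource
        results.append(hit)
    return results

print(search("virtue", subscriber=False))
```

A non-subscriber searching for "virtue" sees both titles without links, which corresponds to knowing which texts to seek out through microfilm, interlibrary loan, or special collections.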

We are now negotiating with the British Library and ESTC to get that catalog in as well. The Digital Bibliography for English Literature (formerly the NCBEL) will be in soon. We don’t yet have the 18thConnect finding aid up and running: once we have the Gale (ECCO), Adam Matthew (18th-c Journals Portal), DBEL, and ESTC data ingested and running smoothly, we will launch, we hope in June 2010.

If you would like to contribute ideas to how this organization should work, you may wish to first take a look at online videos about NINES and 18thConnect available at:

http://unixgen.muohio.edu/~poetess/NINES

and

http://unixgen.muohio.edu/~poetess/NINES/18thConnect.html

(our temporary home)

The NINES interface has changed since I made these videos, but the principles of its operation have not.

Please contribute ideas here, as I will check frequently, but also feel free to email me: mandellc@muohio.edu

Collaborative Readings #2: James May’s “Some Problems in ECCO (and ESTC)”

July 28, 2009

Anyone working even briefly with archives learns immediately that cataloguing is an art, not a science, and that the successful use of archives demands familiarity with their cataloguers’ idiosyncrasies. In the first part of his meticulous “Some Problems in ECCO (and ESTC),” James May provides a hard look at cataloguing problems in both ECCO and ESTC. May’s article has inspired two forthcoming roundtable discussions at EC/ASECS and ASECS and this blog, so it makes sense to provide at least a cursory review of his arguments here. For May, bibliographical problems limit ECCO “as a set of digital facsimiles” (20). Below is a list of the topics he covers.

Holdings:  Identifying the copy digitized is problematic because ECCO lists only the library holding the source copy, not the shelfmark. 

  • If the library holds more than one copy of the digitized text, “readers can’t know what is digitized unless its identifiable from MS annotations on the copy digitized” (21). 
  • If the source library’s cataloguing is ambiguous, duplicate entries for the same copy can result. Among the examples cited is the case of the National Library of Scotland, which provides different shelfmarks for each volume of Edward Young’s 2-volume Poetical Works. ESTC and ECCO thus list two copies of Young’s Poetical Works in that library, though there is only one (2-vol.) copy.
  • There are “editions in the ESTC (some reproduced on ECCO) that are not separate editions but only reissues of earlier editions” (23).

Attribution errors:

  • Examples include the false attribution of An Account of the Two Brothers, Perseus and Demetrius, . . . Collected from the Grecian History, written by the author of Busiris, . . . the Universal Passion, Satires &c. to Edward Young. Additionally, both ESTC and ECCO “fail to list Young as the author of A Sea-Piece (Dodsley, 1755)”.

False claims regarding publishers and places of publication, and dates:

  • Faulty information taken from title pages is often absorbed uncritically into the ESTC and ECCO.  For May, “one problem with the ESTC and ECCO is that nobody surveys and edits its results” (25).  He suggests that “there ought to be a way for scholars to post notes tagged to ESTC and ECCO entries for other scholars to read—a suggestion Rob Hume made . . . two years ago” in The Eighteenth-Century Intelligencer (n.s. 21.1.16).

Format errors: The format of smaller books in particular is often incorrectly listed.  ECCO’s entry for The Works of Dr. Jonathan Swift, vols. 15-17 (1765), incorrectly calls those volumes 12mos rather than 18mos.

Incomplete or misrepresented works: “ECCO offers 16 of the 17 volumes of The Works of Jonathan Swift, D.D: D.S.P.D. With Notes . . . By J. Hawksworth (Dublin: Williams, 1767-1768)” (25). Sometimes frontispieces are missing.

Legibility of ECCO’s digital images: 

  • The images lack crispness.
  • They “fail to reproduce red-lettering on title-pages” (26).
  • Some pages are unreadable or incomplete.  See Vol. 3 of Smollett’s Continuation of the Complete History of England, 1762 (pages 167-68, 258, 321, or 328). 
  • Footnotes and marginalia are sometimes obscured.
  • Stains and gutter loss are problems, with the latter obscuring “the third on all versos between pp. 14 and 24 of Swift’s A Tale of a Tub 1711: ESTC N136369, T49839, and N13640.
  • “Images are sometimes distorted relative to their proportions in height and width” (26). 
  • Later editions are less frequently digitized, though they are often rarer and thus arguably in greater need of digitized preservation than earlier editions.  Many of these subsequent editions were published in Ireland or Scotland, and thus ECCO is correspondingly weaker for a study of Scottish and Irish books and booktrade. 
  • First and revised editions are sometimes neglected.  As May argues, “roughly half of the pre-1775 editions of Young are to be found in ECCO” (29).  Only five of the nine editions of The Force of Religion appear in ECCO. 

Failed Searches: 

  • As May points out, “one must know to exclude ‘not’ from a title search” (27).
  • Sometimes searches for titles and dates fail if the work has not been tagged with the proper date.
  • ECCO’s searches miss “a certain percentage of words” (27).

Selectivity: Like Adam Matthew, which digitized periodicals, ECCO did not use a team of scholars in the process of selecting what should and should not be digitized. May argues that “more scholarly rigor was no doubt needed when the filming by Gale and its predecessors was done to decide which copy should be filmed” (28).

May concludes that “scholars need to provide a little noisy feedback to corporate ventures like ECCO if future projects are to benefit from their expertise” (29).

When is a Book Not a Book?: “Pseudodoxia Bibliographica”

July 27, 2009

The following assertion from the Monk Project’s description (and quoted by Anna in her comments about this tool):

the scholarly use of digital texts must progress beyond treating them as book surrogates and move towards the exploration of the potential that emerges when you put many texts in a single environment that allows a variety of analytical routines to be executed across some or all of them

identifies an issue that has interested me for a while now and is behind my embryonic formulations of the differences between digital database collections that act as delivery systems (JSTOR, Project Muse, etc.) and those that proffer other functions such as serving as finding aids. The tendency to see digitized works such as those found in Google Books (in its present incarnation) as surrogates for physical books has frequently resulted in users’ frustration and disappointment in using these resources. This tendency led me to title a paper I gave at last year’s MLA “When is a Book Not a Book?: Using Google Book Search.” Thus, when collecting additional material in preparation for the EC/ASECS and ASECS sessions, I was understandably drawn to an article by Hugh Amory entitled, “Pseudodoxia Bibliographica, or When is a Book Not a Book? When It’s a Record” (The Scholar & the Database: Papers Presented on 4 November 1999 at the CERL Conference Hosted by the Royal Library, Brussels, 2 [2001]: 1-14).

Amory’s article is concerned with the distortions and misconceptions that can result when historians treat an imprint catalogue’s entries as books or titles. Amory uses the term “imprint catalogues” to refer to ESTC (incorporating Pollard and Redgrave and Wing) and the machine-readable form of Evans reshaped for the North American Imprints Program (NAIP) (2) and distinguishes these research tools from the original Evans and from European bibliographies. While those interested should read the article in its entirety, I offer the following extracts that I found especially noteworthy or interesting:

“[O]ur bibliographies do not form a coherent series, employing different measures and various categories of the book” (1).

“Indeed, the term ‘imprint’ itself is peculiarly English in its ambiguous complexity. It comprehends both a publisher’s imprint or marque d’éditeur and a printer’s imprint or achevé d’imprimer, as well as the editions in which these imprints occur—i.e. an imprimé—or even fail to occur. Only in English, I believe, is it possible for an imprint to have no imprint” (2).

“…any systematic, comprehensive access to places of publication is neglected…Unlike current national bibliographies, too, imprint bibliographies regularly include false and fictitious imprints” (3).

“Peculiar too to Anglophone bibliography is the inclusion of colonial and postcolonial printing in the retrospective national bibliographies of the mother country” (3).

These imprint catalogues

were never designed to answer the general questions posed by book history — to calibrate the relative size of metropolitan and colonial printing, for example, of religious and secular production, or the rise of the novel. The scope of imprint bibliographies is retroactive, imposing territorial and cultural inclusions and exclusions that were alien to their periods. Indeed, even the cataloguing of a database is retroactive, defined by the nature of the question. The numbers that pour forth in such profusion represent a certain number of ‘hits’, not entries, and the fuller the cataloguing, the higher the number of ‘hits’. (4)

“…it may be unfair to demand that imprint catalogues ‘represent’ anything, even imprints, for whose history they provide no more raw material. Nor are they really designed for the production of statistics on literary or intellectual history, where, especially in the form of union catalogues, they serve rather as inventories” (7).

“[ESTC] is neither English, Short-Title, nor a Catalogue, since the ‘cataloguing’ is only a response shaped by the system at the user’s request. One of its most useful features, keyword searching, is precisely an index, whose accuracy and exhaustiveness depend on the illogical whims of language” (8).

“The very accessibility of these catalogues distorts their numbers, and the exclusion or cataloguing of serials makes them even less representative of ‘the amount of printing performed’ than Evans” (10).

“Nor is there any agreement on where a book ends and a pamphlet begins; as the Oxford English Dictionary remarks, ‘No absolute definition of a ‘book’ in this sense can be given’. … Escarpit, who abandons material concerns altogether, and proposes that the nature of a book is defined by how it is read–only opens up another abyss” (10).

“To provide a more meaningful series of data, a number of minor technical devices might be proposed… At present, one may record alternative places of publication in what is technically known in the MARC format as the 752 field, but we need a third, distinctive field for false or fictitious places, and the 752 field is all-too-rarely-used. One would like to link editions with issues, and issues with states that affect the imprint such as misprinted or variant dates in a unitary record” (12).

“Again, the history of the book in the English-speaking realm needs a variety of new catalogues: an on-line catalogue of early periodicals that, at a minimum, would provide a count of the true number of issues, including those that have probably been lost; a catalogue of lost editions of monographs, or some standard for incorporating this information in imprint catalogues like ESTC; and finally, a census of books described in early libraries” (12).
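
Amory’s remark above that the numbers represent ‘hits’, not entries, and that fuller cataloguing yields more ‘hits’, can be made concrete with a toy example. What follows is only a sketch: the two entries are invented, the field names are hypothetical, and counting one hit per matching field is an assumption about how a search system might behave, not a description of how ESTC actually counts.

```python
# Invented catalogue entries for illustration only; not real ESTC records.
# The same book is described once briefly and once more fully.
brief = {"title": "A sermon preached at Boston"}
full = {
    "title": "A sermon preached at Boston",
    "imprint": "Boston: printed for the author",
    "notes": "Reissue of the Boston edition of the previous year",
}

def count_hits(entry, keyword):
    """Count the fields in one entry that contain the keyword."""
    kw = keyword.lower()
    return sum(kw in value.lower() for value in entry.values())

def count_entries(entries, keyword):
    """Count the entries that have at least one matching field."""
    return sum(count_hits(e, keyword) > 0 for e in entries)

catalogue = [brief, full]
total_hits = sum(count_hits(e, "Boston") for e in catalogue)
total_entries = count_entries(catalogue, "Boston")
print(total_hits, total_entries)
```

Under these assumptions the fuller entry contributes three hits to the brief entry’s one, although each describes a single book. The total of ‘hits’ therefore measures the depth of the cataloguing as much as the amount of printing.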

Collaborative Readings #1: Ian Gadd’s “The Use and Misuse of Early English Books Online”

July 7, 2009

We are launching a series of “Collaborative Readings,” borrowing the model popularized so successfully by David Mazella and Carrie Shanafelt on The Long Eighteenth, to discuss some of the items on our bibliography.  “Collaborative Readings” can run concurrently with other postings.

To begin this series, I’ll summarize Ian Gadd’s lucid “The Use and Misuse of Early English Books Online,” which argues that using EEBO properly requires an understanding of its evolution and of the evolution of the catalogues on which it relies.  Particularly crucial, Gadd argues, is an understanding of EEBO’s historical reliance on ESTC.

Gadd’s article falls into three parts.  Part 1 describes the three catalogues on which EEBO and ECCO are based: 

  • STC: Pollard and Redgrave’s Short-title Catalogue of Books Printed in England, Scotland, & Ireland, and of English Books Printed Abroad, 1475-1640
  • WING: Donald Wing’s Short-title Catalogue of Books Printed in England, Scotland, Ireland, Wales, and British America, and of English Books printed in other Countries, 1641-1700
  • ESTC: English Short Title Catalogue, which began its history as The Eighteenth Century Short Title Catalogue, but eventually incorporated material from the previous two catalogues to become The English Short Title Catalogue, retaining its acronym.

Each of these catalogues uses different cataloguing principles and different criteria of inclusion.  The first two differ in what they include, but both catalogue books that have been located (as opposed to copies known to have existed).  The ESTC, on the other hand, began as a computerized and comprehensive union catalogue, merging “together the existing catalogue records of other libraries.”  Because the ESTC includes items in the previous two catalogues, it is, as Gadd puts it,

a hybrid database consisting of three sets of catalogue records, each constructed on different principles.  Searching across these record sets, therefore, poses problems: the unsuspecting student, for example, interested in Stationers’ Company registrations of works might assume that registrations all but dried up after 1640 when in fact this is simply a consequence of information that STC recorded but Wing and ESTC routinely did not.
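
The pitfall Gadd describes can be illustrated with a toy example. This is a sketch with invented records, not real ESTC data; the field name `stationers_entry` and the `source` labels are assumptions made for the illustration. The point is that a field left unrecorded by one record set is easily mistaken for a registration that never happened.

```python
from collections import Counter

# Invented records for illustration only; not real ESTC data.
# "source" marks the catalogue a record derives from; "stationers_entry" is a
# hypothetical field for a Stationers' Register entrance. None means the
# catalogue did not record the information at all.
records = [
    {"year": 1620, "source": "STC", "stationers_entry": True},
    {"year": 1635, "source": "STC", "stationers_entry": True},
    {"year": 1638, "source": "STC", "stationers_entry": False},
    {"year": 1645, "source": "Wing", "stationers_entry": None},
    {"year": 1660, "source": "Wing", "stationers_entry": None},
    {"year": 1710, "source": "ESTC", "stationers_entry": None},
]

def naive_counts(recs):
    """Count registrations per source, treating 'not recorded' as 'not registered'."""
    return Counter(r["source"] for r in recs if r["stationers_entry"])

def coverage(recs):
    """For each source, report (records carrying the field, total records)."""
    recorded = Counter(r["source"] for r in recs if r["stationers_entry"] is not None)
    total = Counter(r["source"] for r in recs)
    return {src: (recorded[src], total[src]) for src in total}

print(naive_counts(records))
print(coverage(records))
```

The naive count finds registrations only among the STC-derived records, which looks like a collapse after 1640. The coverage check shows that the later records simply do not carry the field, so the silence is an artefact of cataloguing practice rather than evidence about the Stationers’ Company.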

Part 2 details the evolution of microfilm collections based on these catalogues and their eventual digitization.  Two companies oversaw this process, eventually producing first EEBO then ECCO.

  • UMI: University Microfilms used STC and Wing to produce two series of microfilm collections known as “Early English Books, 1475-1640” and “Early English Books, 1641-1700.”  In 1998, UMI (now ProQuest) digitized copies from these collections to produce EEBO.
  • Research Publications produced a rival microfilm set based on the ESTC.  In 2003, Thomson Gale (now Gale/Cengage) digitized copies from this collection to produce ECCO.

EEBO was permitted to use the bibliographical records of the ESTC, but

it did so for its own purposes: certain categories of data were removed (e.g. collations, Stationers’ Register entrances), some information was amended (e.g. subject headings), and some was added (e.g. microfilm-specific details).

Additionally, there was no formal mechanism for synchronizing the data between the two resources.  Consequently, divergent sets of holding records exist in EEBO’s and ESTC’s respective catalogues.

Gadd’s cautionary note pertains to the divergence between these two catalogues:

As both resources continue to amend and expand their bibliographical data for their own purposes, there is an increasing likelihood of significant discrepancy between the two resources. . . . there is no absolute one-to-one correspondence between the pre-1701 entries in ESTC and the materials on EEBO; there are—and will always be—items on ESTC not available on EEBO.

Because different copies in the same edition can vary, there is, Gadd explains,

a vital difference between any single bibliographical record on EEBO and the corresponding ‘image set’: the former describes the particular edition (or issue), the latter is taken from one copy from that particular edition. Moreover, unlike scholarly facsimile editions, the selection process for microfilming was often arbitrary.  Copies were selected primarily by reference to the copies listed in STC and WING, with particular preference for certain major collections; they were not selected because they were considered representative of a particular edition.

Gadd suggests that EEBO refer to itself as “a library of copies, rather than a catalogue of titles.”
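
Gadd’s distinction between the bibliographical record and the ‘image set’ can be pictured as a small data model. The sketch below is only an illustration: the class names, fields, and sample data are all hypothetical and do not reflect how EEBO actually stores its records or images.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Copy:
    """One physical copy of an edition, held by a particular library."""
    library: str
    shelfmark: str

@dataclass
class EditionRecord:
    """A bibliographical record: describes an edition (or issue), not a copy."""
    record_id: str
    title: str
    known_copies: List[Copy] = field(default_factory=list)

@dataclass
class ImageSet:
    """Page images: photographed from exactly one copy of the edition."""
    record: EditionRecord
    filmed_copy: Copy

    def unfilmed_copies(self) -> List[Copy]:
        """Copies of the same edition that the images cannot speak for."""
        return [c for c in self.record.known_copies if c != self.filmed_copy]

# Hypothetical data: two copies of one edition, only one of them filmed.
copy_a = Copy("Library A", "shelfmark-1")
copy_b = Copy("Library B", "shelfmark-2")
record = EditionRecord("hypothetical-0001", "An Example Title", [copy_a, copy_b])
images = ImageSet(record, copy_a)

print(images.unfilmed_copies())
```

In this model the record belongs to the edition while the images belong to a single copy, so any variant present only in the unfilmed copy is invisible to a reader of the image set. That is the sense in which the resource is “a library of copies.”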

Gadd commends ProQuest for its receptivity toward the scholarly community.  Part 3 briefly reviews ECCO, noting its “underlying text-transcription,” which allows for searches but is flawed by the inaccuracy of the OCR software it uses.