Collaborative Readings #2: James May’s “Some Problems in ECCO (and ESTC)”

Anyone working even briefly with archives learns immediately that cataloguing is an art, not a science, and that the successful use of archives demands familiarity with their cataloguers’ idiosyncrasies. In the first part of his meticulous “Some Problems in ECCO (and ESTC),” James May provides a hard look at cataloguing problems in both ECCO and ESTC. May’s article has inspired this blog and two forthcoming roundtable discussions, at EC/ASECS and ASECS, so it makes sense to provide at least a cursory review of his arguments here. For May, bibliographical problems limit ECCO “as a set of digital facsimiles” (20). Below is a list of the topics he covers.

Holdings:  Identifying the copy digitized is problematic because ECCO lists only the library holding the source copy, not the shelfmark. 

  • If the library holds more than one copy of the digitized text, “readers can’t know what is digitized unless its identifiable from MS annotations on the copy digitized” (21). 
  • If the source library’s cataloguing is ambiguous, duplicate entries for the same copy can result. Among the examples cited is the case of the National Library of Scotland, which provides different shelfmarks for each volume of Edward Young’s 2-volume Poetical Works. ESTC and ECCO thus list two copies of Young’s Poetical Works in that library, though there is only one (2-vol.) copy.
  • There are “editions in the ESTC (some reproduced on ECCO) that are not separate editions but only reissues of earlier editions” (23).

Attribution errors:

  • Examples include the false attribution of An Account of the Two Brothers, Perseus and Demetrius, . . . Collected from the Grecian History, written by the author of Busiris, . . . the Universal Passion, Satires & c. to Edward Young. Additionally, both ESTC and ECCO “fail to list Young as the author of A Sea-Piece (Dodsley, 1755).”

False claims regarding publishers, places of publication, and dates:

  • Faulty information taken from title pages is often absorbed uncritically into the ESTC and ECCO.  For May, “one problem with the ESTC and ECCO is that nobody surveys and edits its results” (25).  He suggests that “there ought to be a way for scholars to post notes tagged to ESTC and ECCO entries for other scholars to read—a suggestion Rob Hume made . . . two years ago” in The Eighteenth-Century Intelligencer (n.s. 21.1.16).

Format errors: The format of smaller books in particular is often incorrectly listed.  ECCO’s entry for The Works of Dr. Jonathan Swift, vols. 15-17 (1765), incorrectly calls those volumes 12mos rather than 18mos.

Incomplete or misrepresented works: “ECCO offers 16 of the 17 volumes of The Works of Jonathan Swift, D.D: D.S.P.D. With Notes . . . By J. Hawksworth (Dublin: Williams, 1767-1768)” (25). Sometimes frontispieces are missing.

Legibility of ECCO’s digital images: 

  • The images lack crispness.
  • They “fail to reproduce red-lettering on title-pages” (26).
  • Some pages are unreadable or incomplete.  See Vol. 3 of Smollett’s Continuation of the Complete History of England, 1762 (pages 167-68, 258, 321, or 328). 
  • Footnotes and marginalia are sometimes obscured.
  • Stains and gutter loss are problems, with the latter obscuring “the third on all versos between pp. 14 and 24 of Swift’s A Tale of a Tub 1711: ESTC N136369, T49839, and N13640.
  • “Images are sometimes distorted relative to their proportions in height and width” (26). 
  • Later editions are less frequently digitized, though they are often rarer and thus arguably in greater need of digital preservation than earlier editions. Many of these subsequent editions were published in Ireland or Scotland, and thus ECCO is correspondingly weaker for the study of Scottish and Irish books and the book trade.
  • First and revised editions are sometimes neglected.  As May argues, “roughly half of the pre-1775 editions of Young are to be found in ECCO” (29).  Only five of the nine editions of The Force of Religion appear in ECCO. 

Failed Searches: 

  • As May points out, “one must know to exclude ‘not’ from a title search” (27).
  • Sometimes searches for titles and dates fail if the work has not been tagged with the proper date.
  • ECCO’s searches miss “a certain percentage of words” (27).

Selectivity: Like Adam Matthew, which digitized periodicals, ECCO did not use a team of scholars in the process of selecting what should and should not be digitized. May argues that “more scholarly rigor was no doubt needed when the filming by Gale and its predecessors was done to decide which copy should be filmed” (28).

May concludes that “scholars need to provide a little noisy feedback to corporate ventures like ECCO if future projects are to benefit from their expertise” (29).

12 Responses to “Collaborative Readings #2: James May’s “Some Problems in ECCO (and ESTC)””

  1. Anna Battigelli Says:

    As I mentioned at the outset of my summary, James May’s article provides a hard look at ECCO, which clearly enriches scholarship. It’s the scholar’s responsibility to be aware of the text-base’s bibliographical weaknesses. That said, precisely because both ECCO and ESTC are so promising, there is certainly room for systematic improvement of their bibliographical entries.

    Particularly useful is May’s repetition of Robert Hume’s suggestion that there ought to be a way for ESTC and ECCO to allow scholars to post notes tagged to faulty entries. For this to be effective, there probably also needs to be an editorial committee charged with reviewing such notes and recommending changes. If the editors of ECCO and ESTC feel that notes tagged to entries interfere with the display of information, perhaps a blog could be set up where scholars could flag problematic entries. Neither of these suggestions is as systematic as what is required. Are there systematic ways to address the problems May identifies?

    • Eleanor Shevlin Says:

      The stumbling block to being able to flag or correct an entry by posting a note would probably not arise from concerns about disrupting the appearance of a page or a fear of cluttering it. A hyperlink/pop-up balloon, for instance, would easily address such worries. Rather, the problem in allowing access to general users would be a legitimate (in many ways) concern about quality control and vetting the information: Is the person posting the correction knowledgeable? Does the new information actually correct an error or does it lead to new ones?

      Establishing an accessible online form such as the ESTC’s current password-protected form for contributing libraries, Contributing to the ESTC: Web Matching, which includes instructions for entering minor bibliographic corrections, is one possibility. But that would also require ESTC to have sufficient staff to review these changes and verify the accuracy of the corrections. As the title of Jim May’s 2001 article asked, “Who Will Edit the ESTC? (And Have You Checked OCLC Lately?)” (Analytical and Enumerative Bibliography, n.s. 12 (2001), 288-304).

      As for ECCO, I am not sure if there’s really an editor–let alone multiple editors–for this database. There may well be, but I am under the impression that the staff for this specific tool consists of the database’s project manager (Scott Dawson), area marketing reps, tech support, and people who handle permissions and the like–but no purely editorial body. Yet I may well be wrong (and I hope I am). Because Gale created this database from Research Publications’ The Eighteenth Century microfilm collection and incorporated records from ESTC, establishing an editorial staff devoted to the collection once this database was up and on the market may not have seemed necessary at the time. While I did find an advisory board for the Burney Collection (Dr. Moira Goff, Early Printed Collections, British Library; Dr. Brycchan Carey, Reader in English Literature, Kingston University, London; and Markman Ellis, Professor of Eighteenth-Century Studies, Queen Mary University, London) on the Gale-Cengage website, I did not see a board listed for ECCO.

      • Anna Battigelli Says:

        It would be interesting to hear from editors at ESTC about whether a hyperlink option would be helpful. It could, of course, generate a lot of work and oversight without necessarily contributing a lot of aid. If a scholar or librarian discovered a bibliographical error on an ESTC record today, he or she could simply forward the information to an editor on the ESTC web site. But this isn’t a systematic solution to the problem.

        I would like a fuller understanding of the plans for development at both ESTC and ECCO.

  2. Eleanor Shevlin Says:

    Thanks, Anna, for this very fine overview of Jim May’s article. Some of the issues that Jim identifies here are relevant to Hugh Amory’s distinction between entries (or “hits”) and “books” in his article, “Pseudodoxia Bibliographica, or When is a Book Not a Book? When It’s a Record” (The Scholar & the Database: Papers Presented on 4 November 1999 at the CERL Conference Hosted by the Royal Library, Brussels, 2 [2001]: 1-14). One recommendation Amory offers is to improve the MARC fields:

    “To provide a more meaningful series of data, a number of minor technical devices might be proposed… At present, one may record alternative places of publication in what is technically known in the MARC format as the 752 field, but we need a third, distinctive field for false or fictitious places, and the 752 field is all-too-rarely-used. One would like to link editions with issues, and issues with states that affect the imprint such as misprinted or variant dates in a unitary record” (12).

    Robin Alston’s Review of Snyder’s “History of the ESTC” offers a richly detailed view of the conflicting opinions regarding the direction of the ESTC, decisions made, cataloguing issues, types of errors, and more. The piece shows clear divisions in approach and obviously represents Alston’s perspective on the development of the ESTC as a database and on missed opportunities. His comment that the project became shaped to serve librarians and not scholars perhaps explains some of the problems that Jim May identifies in his article and that other users have experienced. Alston’s remarks near the end of his article sketch the work he sees as still needed:

    The eighteenth century component in ESTC is, as indeed it should be, well on the way to a satisfactory conclusion once the doubts which lurk as informal notes in thousands of records are resolved; the huge number of unverified locations are transformed into verified ones; inconsistencies in headings removed; errors corrected. These are difficult to quantify; but my informed guess after having consulted over a five-year period some 50,000 records, is that at least 100,000 records require attention. For the records imported into the file from OCLC and other sources (excluding, of course, the splendid records contributed by NAIP) there is work a-plenty for another twenty five years. Is it realistic to imagine that the wells will continue to furnish resources for that long? The records for the STC period from 1475 to 1640 have one significant advantage over those for Wing: they have been the subject of detailed bibliographical scrutiny for over a century. The records for Wing will demand a huge effort to bring them up to the standards set by STC and the ESTC records created at the British Library. (161)

    The issue of correcting ESTC records by those who are not working at the BL and Riverside also raises concerns about quality control–and the problem of oversight of the corrections being made. I do know that in 1998 or so some contributing institutions (the Folger for one) were given access/authority to make adjustments, but I am not sure if that access extends now to all contributing institutions.

    On a different note, I will offer a very positive experience I had using an image of a title-page from the ECCO database to verify a holdings record. I had ordered a digital image of the title page for a particular edition of The Northern Atalantis from the Kansas Spencer Research Library to accompany an article I had written. When questions arose about whether the copy at the Spencer Library was indeed the one I wanted, I sent the ECCO image (which was taken from a copy in the BL) to the Spencer librarian to confirm that the Kansas copy was indeed the same edition I had seen at the BL (the ESTC indicated that it was). The image helped the librarian determine that the Spencer copy was not the same as the BL one and that there was an error in the Kansas catalogue (now happily corrected and sent to ESTC for correction, too).

  3. Anna Battigelli Says:

    Alston’s suggestion that it will take 25 years to address these issues seems reasonable to me–as is his concern regarding the likelihood of sustaining support for that period.

    It would be interesting to know whether the ESTC editors are considering how to strengthen MARC fields.

    I like your positive narrative about the value of using an ECCO image to correct a bibliographical entry in a Kansas catalogue.
    AB

  4. Anna Battigelli Says:

    It seems to me that Jim covers four categories of error within ECCO and ESTC that require attention:

    1. Holdings information (including attribution errors, false claims, and format errors)
    2. Legibility issues
    3. Searching mechanism
    4. Selectivity (including incomplete or misrepresented works)

    Of these, the least urgent may be “legibility,” which for the most part seems fairly good to me. In fact, sometimes it is easier to read the black and white digitized page than the original. (For a compelling disagreement, see Diana Kichuk’s excellent “Metamorphosis: Remediation in Early English Books Online (EEBO),” Literary and Linguistic Computing 22:3 (2007), 291-303.) Kichuk provides stunningly clear instances of poor visibility in EEBO’s collection and some of her criticisms would seem to apply to ECCO. And honestly, I would not want to only read digitized pages.

    Holdings, searching, and selectivity, however, seem to be both urgent and inextricably interrelated problems. Does ECCO have a scholarly editorial board charged with reviewing and correcting such problems?

  5. Eleanor Shevlin Says:

    As I mentioned above, I am under the impression that ECCO does not have a specific scholarly editorial board. While I view the searching mechanisms and interface as issues clearly under Gale-Cengage’s domain and control, problems with holdings information (including attribution errors, false claims, and format errors) seem to originate with ESTC information and thus were inherited by Gale. The speed and accessibility enabled by the online availability of these texts have now perhaps made such errors more noticeable–I don’t remember this type of discussion arising from the use of the Eighteenth Century Collection microfilmed by Research Publications. As for selectivity, it seems that decisions about what texts to include were really made by Research Publications when they started to microfilm eighteenth-century texts in 1982.

  6. Eleanor Shevlin Says:

    The following is a document prepared by Gale-Cengage that provides an overview of ECCO’s origins (and a brief history of ESTC, too):

    The origins of Eighteenth Century Collections Online (‘ECCO’)

    1. Introduction: the ESTC project
    The Enlightenment in England has its roots in a theory about man, society, education, and aspirations on the one hand, and a very practical activity, printing, on the other.
    Ideas, and their absorption into the fabric of society at every level, depend upon communication. The urgent need to build a more broadly-based economic, political, social, and educational basis for society after the sectarian crises of the seventeenth century was well understood by the leading thinkers of the period between the death of James II and the accession of Anne.
    The first crucial step in the process of ensuring the transition to the newer thinking, and the profound changes in social structure it would bring about, was the liberation of the press from the stranglehold which London, and the Stationers’ Company, had exerted on who could print, and where.
    • In 1695 printing in England was permitted only in London, the two university towns and York. By the middle of the eighteenth century it had spread to every county, and by the end of the century is documented in over a thousand towns and villages.
    The part that the printing press played in the shaping of all those features of English civilization which led to colonial expansion and the Empire has never been fully explained. Nor could it have been, until the sources were identified, described and made available through the ESTC project.

    2. Publication History and Title Selection policy
    1976: The Eighteenth Century Short Title Catalogue (the original ‘ESTC’) starts with a conference jointly sponsored by the American Society for Eighteenth-Century Studies and the British Library. The aim of the project was to create a machine-readable union catalogue of books, pamphlets and other ephemeral material printed in English-speaking countries from 1701 to 1800.

    1977: BL starts to re-catalogue its C18th book holdings, a project which has gradually expanded to cover the resources of over 2000 libraries worldwide. A decision was made not to add Subject headings or Subject index terms for each title.
    The catalogue as a whole “attempts to record all books, pamphlets, and single sheets printed in any language in the territories governed by Britain during the eighteenth century, and all items printed in English anywhere in the world between 1701 and 1800.”
    • ESTC does not include certain categories of material. Specifically excluded are: printed blank forms; trade-cards; labels; engraved prints, maps, music; playbills; tickets; book-plates; playing cards; puzzles, etc.
    • However ESTC does include a great quantity of printed items which are generally regarded as “ephemeral”, and which deal with every aspect of eighteenth-century life in the British Isles and British dependent territories, North America, the Caribbean and India.
    • ESTC includes French, Dutch, German, Spanish and other non-English language materials published in Britain or its colonies

    1982: Research Publications starts to publish The Eighteenth Century microfilm collection, based on the ESTC records and initially sourced from the holdings of the British Library. Guided by the interests of those studying the texts, items initially included were limited to first and significant editions of each title. Exceptions to this rule are the works of 28 major authors, all of whose editions are included where available:

    • Addison, Bentham, Bishop Berkeley, Boswell, Burke, Burns, Congreve, Defoe, Jonathan Edwards, Fielding, Franklin, Garrick, Gibbon, Goldsmith, Hume, Johnson, Paine, Pope, Reynolds, Richardson, Bolingbroke, Sheridan, Adam Smith, Smollett, Steele, Sterne, Swift and Wesley.

    • Note: The Eighteenth Century contains all qualifying C18th printings, not only titles published for the first time in the years 1701-1800. Thus new editions/printings of many standard authors of the C16th and C17th are included.

    1988: Given the growth of interest in the total output of C18th printing, and its impact on many aspects of social history, the selection criterion for The Eighteenth Century was expanded to include all distinct editions of a work except for such titles as the Bible and the Book of Common Prayer that were so frequently reprinted, and are so large, that it was thought inappropriate and impractical to apply the new criterion to them

    1989: The ESTC database starts to include records of the period from the beginning of printing in the British Isles (ca. 1472) to 1700.

    1996: The ESTC file changed its name to the ‘English Short Title Catalogue’, recording holdings in more than 2000 libraries world-wide including the British Library.

    1996: Research Publications (now PSM) is appointed the CD-ROM distributor outside the UK of the renamed ESTC, which is published by the British Library.

    2003: Gale releases ECCO, the online searchable facsimile edition of every title which had appeared in The Eighteenth Century microfilm collection up to the end of 2002, with the exception of non-English language texts using non-Roman alphabets. The MARC records are made available separately.
    – ECCO on its release contains over 155,000 volumes from 136,000 works, in over 26 million pages

    2006: The ESTC is made available free on the British Library website http://estc.bl.uk

    2009: Gale releases ECCO Part II, which adds 42,000 titles in 45,000 volumes and 7 million pages to ECCO, sourced from titles microfilmed from January 2003 through 2007.
    – Cross-searching with EEBO (Early English Books Online, 1475-1700) is introduced in ECCO. Authors whose output spans the C17th and C18th centuries can be retrieved where a library holds both databases. As a result, for example, 65 editions of Defoe in EEBO may be accessible in addition to the 805 titles available in ECCO alone.

    – In late 2009, the non-English language materials in non-Roman fonts will be included also. These will be searchable by title and other metadata, but not text-searchable
    – In 2009 also a Subject field will be added to the MARC records, obtained from the LoC (Library of Congress, Washington)

    Mark Holland

    Sources: Robin Alston, The Eighteenth Century, Research Publications, 1982; British Library website http://estc.bl.uk

    As this history details, MARC records for ECCO will soon include a “Subject Field” supplied by the Library of Congress. Gale is incorporating this information, but it should be remembered that Gale is not the entity responsible for what appears in the subject field.

  7. Anna Battigelli Says:

    This conversation and analysis of the impact of online bibliography on the profession could not occur if the ESTC had not become available for free on the BL website in 2006. Alston’s review of Snyder’s History makes clear the inevitable conflicts and struggles behind such an ambitious project. We owe all those involved in the ESTC’s history–past, present, and future–a great debt. That we want more in the way that James May suggests is an indication of the ESTC’s success.

    We may feel freer to criticize ECCO because its cost raises the question of its value. If, in fact, there is no editorial board at ECCO, is this acceptable?

  8. Eleanor Shevlin Says:

    I may well be wrong about ECCO not currently having a specific editorial board (I’ve written Scott Dawson to ask if this is indeed the case). Yet I also understand why, once ECCO (Part 1) was complete and ready as a product (i.e., the microfilm digitized, database prepared, search interface established, etc.), it might not have occurred to Cengage-Gale to establish an editor or editorial staff for this product. In other words, Cengage-Gale may well have initially viewed its role as primarily that of a delivery/access provider for a set of texts that had already been subject to complete editorial overview by Research Publications and ESTC. Cengage-Gale seems sincerely concerned about the quality of its product, so if it did not (does not) have an editorial person or persons, the need for one–even if only a person who would coordinate incorporating the revisions to ESTC (as well as information already in ESTC but omitted from ECCO)–would now seem more evident.

    When I approached ESTC and Gale about the EC/ASECS session, both were extremely responsive–and both were already familiar with Jim May’s article and had already been in discussion with one another about how to address some of the issues that Jim’s piece raised. As an outsider, I would think it would be important that efforts to improve ECCO be coordinated with ESTC editors and their editing efforts.

    The legibility issue, search capabilities, OCR, and the like would seem to be issues that fall under technological improvements. Holdings and selectivity are issues that would require scholarly/editorial expertise to improve.

    That many of these problems existed with the microfilm collection (e.g., which editions were selected to be filmed) but did not generate (to my knowledge) the discussions taking place about ECCO’s shortcomings is quite interesting. Search capability and accessibility (notwithstanding the limitations imposed by cost) have brought these texts to a wider group of scholars and graduate students–and the bibliographic issues may not be foremost in the minds of new users who are now accessing these texts. The opening portion of Stephen Tabor’s “ESTC and the Bibliographic Community” is especially relevant here.

  9. Eleanor Shevlin Says:

    Scott Dawson at Gale-Cengage confirmed that ECCO currently does not have an editorial board, for many of the reasons I gave above. However, he noted that G-C is considering the need to create one in light of the issues raised by Jim May’s article and others–and he envisions that this matter will be part of the discussions at the EC/ASECS and ASECS sessions this fall and spring.

  10. Anna Battigelli Says:

    This is a promising development. Thanks, Eleanor, for contacting Scott.
