English Broadside Ballad Archive (EBBA at UCSB)

February 25, 2013

This is the second of a two-part series on free digital archives featuring English ballads.  It follows Eleanor’s discussion of the JISC-funded Broadside Ballad Initiative at Oxford.

The University of California at Santa Barbara has created a free digital ballad collection called The English Broadside Ballad Archive (EBBA), which provides access to more than 8,000 seventeenth-century ballads.  The collection includes ballads from the Pepys Collection, the Roxburghe Collection, the Euing Collection, and the Huntington Library.  EBBA is directed by Patricia Fumerton at UCSB.  This project was supported by the N.E.H.

Individual entries provide links to sheet facsimiles, facsimile transcriptions, and often recordings.  These features facilitate introducing students both to ballads’ visual details (ornaments, woodcuts, columned verse) and to their tunes.

Cataloging is full and includes the following:

EBBA ID: An internal identifier. Each individual ballad in the archive has a unique EBBA ID.

Title: A diplomatic transcription of the ballad title as it appears on the ballad sheet. The title consists of all ballad text before the first lines of the ballad, including verse headers but excluding text recorded elsewhere under other catalogue headings (such as the license or author, date, publisher and printer imprints).

Date Published: The year—or, in most cases, range of years—during which EBBA believes the ballad to have been published. See Dates.

Author: The recognized author of the ballad in cases where an indication of authorship has been printed on the ballad or, in the case of Pepys ballads, when Weinstein has identified an author from external sources (e.g., Wing, Rollins).

Standard Tune: The standardized name for the melody (according to Claude M. Simpson or other reliable sources). Clicking the standard tune name will return all ballads with the same melody, including alternate tune titles.

Imprint: A diplomatic transcription of the printing, publishing, and/or location information as it appears on the ballad sheet.

License: A diplomatic transcription of the licensing or permission information as printed on the ballad.

Collection: The name of the collection to which the ballad belongs. In cases where the ballad is not part of a named collection, the name of the holding library plus “miscellaneous” will appear. For example, Huntington Library ballads that are not part of a collection are grouped as “HEH Miscellaneous.”

Sheet/Page: For ballads that are collected as independent sheets, the citation page displays the word “Sheet” and lists the sheet number given to it by its holding institution (usually part of its shelfmark). For ballads bound in a book, the citation page displays the word “Page” and lists the page number within the bound volume.

Location: The name of the holding institution.

Shelfmark: The shelfmark assigned by the holding institution.

ESTC ID: The Citation Number for the English Short Title Catalogue (ESTC). Use this number to find the full ESTC citation for any given ballad at

Keyword Categories: The keywords from EBBA’s standardized keyword list that relate to the ballad’s theme and content.

Notes: Clarify potential areas of confusion for users, such as ballads that have print on both sides of a sheet.

MARC Record: A link to our MARC-XML records

Additional Information: Information specific to each part of the ballad.

Title: Separate titles for multi-part ballads.

Tune Imprint: Tune title(s) as printed.

First Lines: A diplomatic transcription of the first two lines of the ballad text proper, below any heading information included in the title or elsewhere under other catalogue headings.

Refrain: Repeated lines at the end of or within ballad stanzas.

Condition: Description of ballad sheet damage and the current state of the sheet. (This information is from Weinstein and is currently for the Pepys collection only.)

Ornament: A list of decorations made of cast metal that appear on the ballad. Frequently used to fill empty spaces in the forme and/or to delimit parts of the ballad text, these ornaments include vertical rules, horizontal rules, and cast fleurons. (This information is from Weinstein and is currently for the Pepys collection only.)

Ballad scholars working with EEBO or ECCO will be familiar with the difficulty of finding ballads in those databases, which makes the English Broadside Ballad Archive and the Bodleian Library’s Broadside Ballads indispensable.

Together with new printed resources, such as Patricia Fumerton and Anita Guerrini’s Ballads and Broadsides in Britain, 1500-1800 (Ashgate 2010) and Angela McShane’s Political Broadside Ballads of Seventeenth-Century England: A Critical Bibliography (Pickering & Chatto 2011), these digital resources provide a robust and growing archive  for the systematic study of a format whose transiency may have discouraged such studies in the past.

British Newspaper Archive: Not Burney (yet), But Still Useful

July 6, 2012

Launched this past November, the British Newspaper Archive is a joint project of the British Library and brightsolid online publishing. Over the next decade, this partnership is slated to digitize over 40 million pages of the BL’s newspaper collection. The site anticipates a wide audience that includes not only scholars but also amateur historians, genealogists, and more.

While the project often digitizes the original paper copies, it has also digitized from the BL’s microfilm copies because the process is faster and enables more pages to be made available in a shorter amount of time. The quality of the pages, however, does suffer as the website admits; unfortunately, this emphasis on speed means that the accuracy of the search results is forever sacrificed. That said, one can view the OCR text and correct it:

When viewing an image, the OCR text can be viewed via the left nav All Articles option. You can select an individual article and then select Show Article text and the text. This addictive option can be accessed by simply clicking the list of sections displayed and applying your own corrections. By correcting the text, you will be adding to the quality of the data that can be searched by others. Please note that during the launch period updates to corrections will take longer to appear. (“Getting Started”)

The site’s descriptive information suggests that the collection dates primarily from the nineteenth century on, but there are 24 eighteenth-century provincial newspaper titles available in the current collection (full list appears below). As of yet, there are no eighteenth-century London papers. Like Burney, the British Newspaper Archive is a subscription database. Unlike Burney, though, provisions for individual subscriptions exist. The rates also seem quite reasonable, and an array of plans is offered (credits refer to the number of views; each view “costs” 5 credits; the view option enables you to download or print):

  • 12 Month Package (unlimited pages)
    Price: £79.95 GBP       Valid For: 365 days       Credits: Unlimited*
  • 30 Day Package (up to 600 pages)
    Price: £29.95 GBP       Valid For: 30 days      Credits: 3000
  • 7 Day Package (up to 120 pages)
    Price: £9.95 GBP       Valid For: 7 days       Credits: 600
  • 2 Day Package (up to 100 pages)
    Price: £6.95 GBP       Valid For: 2 days      Credits: 500
  • Potential users are also able to register using an email address and receive 15 free credits, a very limited trial of sorts.
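Because each view costs 5 credits, the page limits quoted for the capped packages follow directly from their credit totals. A minimal Python sketch of that arithmetic, using only the prices and credits listed above (the names `PACKAGES`, `pages_for`, and `pence_per_page` are illustrative and not part of the BNA site):

```python
# Each page view costs 5 credits, per the pricing notes above.
CREDITS_PER_VIEW = 5

# (price in pence, credits) for the three capped packages listed above.
PACKAGES = {
    "30 Day": (2995, 3000),
    "7 Day": (995, 600),
    "2 Day": (695, 500),
}


def pages_for(credits):
    """Number of page views a given credit balance will cover."""
    return credits // CREDITS_PER_VIEW


def pence_per_page(price_pence, credits):
    """Cost of one page view in pence if every credit is used."""
    return price_pence / pages_for(credits)


for name, (price, credits) in PACKAGES.items():
    print(f"{name}: {pages_for(credits)} pages, "
          f"{pence_per_page(price, credits):.2f}p per page")

# The 15 free credits given at registration cover this many views.
print("Free trial views:", pages_for(15))
```

By the same rule, the 15 free credits cover three page views, which is worth bearing in mind given that a view is charged even when the article turns out not to be on the page.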

    Searches can be either simple or advanced. The advanced search includes searching by “All of these words,” “Any of these words,” “Without these words,” and “Phrase.” You also have the option of applying filters such as dates, place of publication, publication title, or article type (advertisement, article, family notice, illustrated, miscellaneous). There is also the option of browsing by titles. Unfortunately, you cannot use the wildcard characters that the Burney search interface provides to help counteract poor OCR, the long “s,” and other typographical peculiarities. Nor does the BNA offer features similar to Burney’s search aids such as the “w” or “n” joined by a number to find two terms within a certain proximity of one another.
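For readers unfamiliar with this style of form, the four advanced-search fields can be thought of as filters combined on each article. The Python sketch below is a toy model only: how the BNA actually tokenizes and matches text is not documented here, so the function `matches` and its behaviour (case-insensitive, whole-word matching) are assumptions made for illustration.

```python
import re


def matches(text, all_words=(), any_words=(), without_words=(), phrase=""):
    """Toy model of the four advanced-search fields applied to one article.

    Assumed semantics (not taken from the BNA): matching is case-insensitive
    and word-based; an empty field places no restriction on the result.
    """
    lowered = text.lower()
    words = set(re.findall(r"[a-z]+", lowered))
    if any(w.lower() not in words for w in all_words):
        return False  # "All of these words": every term must appear
    if any_words and not any(w.lower() in words for w in any_words):
        return False  # "Any of these words": at least one term must appear
    if any(w.lower() in words for w in without_words):
        return False  # "Without these words": no excluded term may appear
    return phrase.lower() in lowered  # "Phrase": exact run of characters


article = "The company insure houses, buildings, and goods in Wickham-market"
print(matches(article, all_words=["company", "insure"]))
print(matches(article, phrase="insure houses", without_words=["goods"]))
```

Exact matching of this kind is precisely what poor OCR defeats: a term transcribed as “infure” will never satisfy a query for “insure,” which is why the absence of wildcards matters.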

    Search results can be ordered by relevance or by date (in either ascending or descending order), and a glimpse of the context in which each search term occurs is given. For example,

    Ipswich Journal
    Sat 10 Jan 1784 Suffolk, England
    5 U F F O L K. 1 0 be Ll’. TT, ant! enlered upon immediatrly, THAT olil*aeculbmed Public Honic,
    8343 Words
    “SfMON PATERNOSTER of Wickhairi-market, to be agent for the faitl company for the town of Wiek-, and parts adjacent. The company infure lioufeS, bufrdings, … ?

    As this example demonstrates, the context provides the OCR text with all its warts. Still, it helps the user decide if the article is worth viewing and assists in conserving the credits in one’s account.

    CAVEAT: During my two-day exploration of the BNA, I encountered several cases in which I clicked to view an article only to discover the article was not on that page. Five credits were still deducted from my account and continued to be deducted as I browsed other pages in the issue. Once or twice I was not able to find the result at all; other times it appeared on a different page within that issue.

    Here is a list of eighteenth-century titles currently available in BNA:

    • Aberdeen Journal (105)
    • Bath Chronicle and Weekly Gazette (1989)
    • Birmingham Gazette (8)
    • Bristol Mercury (1)
    • Caledonian Mercury (7309)
    • Chelmsford Chronicle (329)
    • Derby Mercury (2595)
    • Hampshire Chronicle (1115)
    • Hampshire Telegraph (11)
    • Hereford Journal (982)
    • Ipswich Journal (1203)
    • Ipswich Journal, The (1296)
    • Kentish Gazette (374)
    • Leeds Intelligencer (2352)
    • Manchester Mercury (223)
    • Newcastle Courant (2561)
    • Norfolk Chronicle (1109)
    • Northampton Mercury (1625)
    • Oxford Journal (2434)
    • Reading Mercury (570)
    • Salisbury and Winchester Journal (17)
    • Scots Magazine, The (611)
    • Sherborne Mercury (256)
    • Sussex Advertiser (60)

JISC’s Historic Books: Searching EEBO, ECCO for meaning

March 6, 2012

This past fall JISC announced a new venture, the JISC eCollections, “a new community-owned content service for UK HE and FE institutions.” What might interest EMOB readers most is its Historic Books. This digital collection contains over 300,000 books from before 1800 and also makes over 65,000 19th-century first editions from the British Library available for the first time online. The entire corpus is accessible through institutional subscription and, most welcome, searchable over a single platform.

The pre-1800 material in the JISC Historic Books eCollection consists solely of ProQuest’s Early English Books Online (EEBO) and Gale’s Eighteenth Century Collections Online (ECCO) textbases, so some might wonder what this collection offers that is new for those working in the early modern period. One does not need to be in eCollections, for instance, to conduct searches simultaneously across both databases. Yet the Help page for the eCollections indicates that more than just the convenience of a single interface and platform is being offered:

JISC Historic Books uses meaning-based searching rather than traditional keyword searching, which is why you will notice you get different results to searching EEBO and ECCO on the publishers sites. Meaning-based searching enables you to find conceptual and contexual [sic] links betweeen [sic] related documents which aren’t possible using traditional keyword searching.

Besides returning traditional results, JISC Historic Books also delivers “meaning-based” concepts deemed relevant to the search in the form of a Concept Cloud:

[Screenshot: Concept Cloud]

The more prominent the word, the more relevant it is deemed to the search, and as the screenshot indicates, items in the cloud can be manipulated to narrow one’s search further.

Over the past three or four years (and maybe longer) I have been consistently struck by the transformations that traditional searches of ECCO, Burney, EEBO, as well as Google Books have had on the ways I think about searching, construct searches, and view my results. More specifically, these keyword searches, described here as traditional, were already encouraging me to view results in a more networked, contextual way and, as a consequence, to devise additional searches aimed at teasing out new potential relationships. The meaning-based search enabled by JISC’s mimas platform, of course, is offering something quite different, but I wonder how its use might cause rethinking of what it means to search and research.

It would be interesting to hear from EEBO and ECCO users in the UK who have used JISC Historic Books, especially about the differences between results obtained from searching on the JISC platform and those obtained from searching on the original publishers’ platforms.


Text Creation Partnership makes 18th century texts freely available to the public

April 25, 2011

This announcement is making the rounds of listservs and the like, and it should be of interest to emob readers:

(Ann Arbor, MI—April 25, 2011) — The University of Michigan Library announced the opening to the public of 2,229 searchable keyed-text editions of books from Eighteenth Century Collections Online (ECCO). ECCO is an important research database that includes every significant English-language and foreign-language title printed in the United Kingdom during the 18th century, along with thousands of important works from the Americas. ECCO contains more than 32 million pages of text and over 205,000 individual volumes, all fully searchable. ECCO is published by Gale, part of Cengage Learning.

The Text Creation Partnership (TCP) produced the 2,229 keyed texts in collaboration with Gale, which provided page images for keying and is permitting the release of the keyed texts in support of the Library’s commitment to the creation of open access cultural heritage archives. Gale has been a generous partner, according to Maria Bonn, Associate University Librarian for Publishing. “Gale’s support for the TCP’s ECCO project will enhance the research experience for 18th century scholars and students around the world.”

Laura Mandell, Professor of English and Digital Humanities at Miami University of Ohio, says, “The 2,229 ECCO texts that have been typed by the Text Creation Partnership, from Pope’s Essay on Man to a ‘Discourse addressed to an Infidel Mathematician,’ are gems.”

Mandell, a key collaborator on 18thConnect, an online resource initiative in 18th century studies, says that the TCP is “a groundbreaking partnership that is creating the highest quality 18th century scholarship in digital form.”

This announcement marks another milestone in the work of the TCP, a partnership between the University of Michigan and Oxford University, which since 1999 has collaborated with scholars, commercial publishers, and university libraries to produce scholar-ready (that is, TEI-compliant, SGML/XML enhanced) text editions of works from digital image collections, including ECCO, Early English Books Online (EEBO) from ProQuest, and Evans Early American Imprint from Readex.

The TCP has also just published 4,180 texts from the second phase of its EEBO project, having already converted 25,355 books in its first phase, leaving 39,000 yet to be keyed and encoded. According to Ari Friedlander, TCP Outreach Coordinator, the EEBO-TCP project is much larger than ECCO-TCP because pre-1700 works are more difficult to capture with optical character recognition (OCR) than ECCO’s 18th-century texts, and therefore depend entirely on the TCP’s manual conversion for the creation of fully searchable editions.

Friedlander explains that, for a limited period, the EEBO-TCP digital editions are available only to subscribers—ten years from their initial release—as per TCP’s agreement with the publisher. Eventually all TCP-created titles will be freely available to scholars, researchers, and readers everywhere under the Creative Commons Public Domain Mark (PDM).

Paul Courant, University Librarian and Dean of Libraries, says that large projects such as those undertaken by the TCP are only possible when the full range of library, scholarly, and publishing resources are brought together. “The TCP illustrates the dynamic role played by today’s academic research library in encouraging library collaboration, forging public/private partnerships, and ensuring open access to our shared cultural and scholarly record.”

More than 125 libraries participate in the TCP, as does the Joint Information Systems Committee (JISC), which represents many British libraries and educational institutions.

To learn more about the Text Creation Partnership, visit To learn more about ECCO, visit

ASECS Summary of “Some Noisy Feedback” Roundtable, Albuquerque 3/18/10

March 27, 2010

ECCO, EEBO, and the Burney Collection: Some “Noisy Feedback” Roundtable

Chair: Anna Battigelli (SUNY Plattsburgh)   Panelists: Sayre Greenfield (University of Pittsburgh, Greensburg), Stephen Karian (Marquette University), James E. May (Penn State University—DuBois), Eleanor Shevlin (West Chester University), Michael Suarez (Rare Book School, University of Virginia).  Respondents: Jo-Anne Hogan (ProQuest), Brian Geiger (ESTC, University of California, Riverside), and Scott Dawson (Gale/Cengage).

The following offers a summary of the roundtable that took place on Thursday, March 18, 2010, at the ASECS 2010 conference in Albuquerque, N.M.  This session was the second part of a two-part series, the first part having been a roundtable discussion chaired by Eleanor Shevlin at the EC/ASECS meeting in Bethlehem, Pa., in October 2009.  Copies of Eleanor’s summary of the EC/ASECS session (published in the Eighteenth-Century Intelligencer and also on this blog) were distributed at the outset of this session.  Many thanks to the members of the audience who so cheerfully presented themselves at an early hour on the conference’s first day.

Sayre Greenfield opened discussion with detailed working solutions to problems caused by ECCO’s OCR (optical character recognition) software.  He recommended that Gale provide an ECCO OCR troubleshooting page on their web site and noted that blogs like this one would be sure to start that process (see below).  Aided by Deidre Stuffer, he found ways to correct for errors stemming from the following letter combinations that OCR typically mistranslates: s, ss, and ct.  Using the word, fishmonger, he substituted for the s every other letter, then substituted numbers, and finally the wildcard question mark.  Advice from his search results, including how best to use the question mark as a wildcard, can be found on the ECCO OCR Troubleshooting Page on the “Pages” section of this blog.  He warned that using the question mark for any medial or initial s is problematic if one is using variables elsewhere, adding that ECCO does not allow wildcards for the first letter of a word.  Additionally, letters surrounding the s seem to affect how the OCR reads the s.  The double ss, for example, frequently morphs into fl, transforming passion into paflion. Word searching within a text also proved problematic.  Though he found 32 instances of passion or passions when he read John Tottie’s A View of Reason and Passion, his electronic search using passion* yielded only half of these.  Turning to ct, he found that OCR often reads ct as t, so that objection becomes objetion.  These results suggest that ECCO would help users by strengthening its web site, which currently recommends fuzzy searches to address OCR problems.  Fuzzy searches create too many false positive results.  Including a more robust help page on this issue is necessary.  (For now, see Sayre’s ECCO OCR Troubleshooting Page on this blog.)
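The substitutions Sayre describes lend themselves to a small script that lists the spellings worth trying for a given search term. The Python sketch below is an illustration under stated assumptions and not his method: the “ss” to “fl” and “ct” to “t” rules come from the summary above, while treating a single non-final s as “f” is an added assumption, and the names `SUBSTITUTIONS` and `ocr_variants` are hypothetical.

```python
from itertools import product

# "ss" read as "fl" and "ct" read as "t" are reported in the summary above.
# Reading a single long s as "f" is an ASSUMPTION added for illustration.
SUBSTITUTIONS = [("ss", ["ss", "fl"]), ("ct", ["ct", "t"]), ("s", ["s", "f"])]


def ocr_variants(word):
    """Return a sorted list of spellings OCR might have produced for word."""
    pieces = []  # each entry lists the possible renderings of one chunk
    i = 0
    while i < len(word):
        for source, options in SUBSTITUTIONS:
            # The long s was not used at the end of a word, so a final "s"
            # is left alone.
            is_final_s = source == "s" and i == len(word) - 1
            if word.startswith(source, i) and not is_final_s:
                pieces.append(options)
                i += len(source)
                break
        else:
            # No rule applies at this position: keep the letter as it is.
            pieces.append([word[i]])
            i += 1
    return sorted("".join(combo) for combo in product(*pieces))


for term in ("passion", "objection", "fishmonger"):
    print(term, "->", ocr_variants(term))
```

Running each variant as a separate exact search is one alternative to a fuzzy search, at the cost of more queries; the sketch ignores many other misreadings and makes no claim about how ECCO’s OCR actually behaves.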

Steve Karian began by acknowledging the indispensability of ESTC for bibliometrics, but he also identified four problems that need to be addressed if the ESTC is to become the powerful tool it can be for the twenty-first century.  The first is the ESTC’s unit of measurement: the ESTC record.  Users often equate an ESTC record with an imprint, title, edition, or an issue.  Because of variations in the correlation of record to item, one cannot simply assume that two parallel sets of search “hits” can be compared reliably.  As he puts it, “one is constantly comparing apples to oranges.”  Additionally, field records vary, limiting or complicating the kinds of searches that can be done.  These need to be standardized if searching is to become reliable.  The two ESTCs—one at UC-Riverside, the other at the British Library—use the same data but different interfaces.  Dates are complicated because they appear in two MARC (Machine-Readable Cataloguing) fields.  Steve recommended deleting the MARC record entirely and replacing it with a new database structure, one designed to expand and grow.  He called for a new stage of innovation, allowing the ESTC to transform itself from a bibliographical catalogue into a bibliographical database.  Only through such a transformation will the ESTC become the powerful tool it promises to be.

Jim May discussed the Burney Collection, which he argued should be called the Burney Collection of Newspapers, Periodicals, and Other Printed Matter.  Its material was first collected by Charles Burney, subsequently increased by the British Library, and eventually microfilmed before being turned over to Gale/Cengage.  It includes material dating back to the 1620s and beyond 1800 and material printed in Barbados, India, Ireland, and North America.  Citing James Tierney’s comments at the Bethlehem meeting, Jim noted that the collection includes 237 newspapers and 161 periodicals, 60 of which are partially available in Adam Matthew’s Eighteenth-Century Journals series or ProQuest’s British Periodicals.  Burney allows one to read an entire issue or study issues by year or month, and it offers searching, though this is problematic.  According to Jim’s results, searching sometimes yields only 10% of the relevant items.  Searching for “Tatler” between 1708 and 1712 yields 80 hits.  Though he has found hundreds of advertisements of Smollett’s Continuation of the Complete History of England, only a few of these can be found through an electronic search.  Similarly, only a third or fewer of The London Evening Posts published 1760-61 turn up when you search for “London Evening”.  Robert Hume and Ashley Marshall have an essay forthcoming in Papers of the Bibliographical Society of America discussing Burney and noting, among other problems, how definite and indefinite articles interfere with searches.  Jim also cited Simon Tanner’s article in D-Lib Magazine (July/August 2009), which found the following accuracy rates for Burney: character 75%, word 65%, significant word 48.4%, capitalized word 47.4%, and number 59.3%.  The magnification feature enlarges pages by 100% and would be more useful if it magnified by 33%.  Spread dates are misrepresented, due to the lack of editorial apparatus explaining when newspapers were actually issued.
Burney’s lack of editorial apparatus, cross references, comments, and so forth is a deficit.  Having a scholarly editor (perhaps through a graduate-student or postdoctoral internship) would improve its utility.  Also needed is a review of the entire database.  A page dedicated to errors encountered by users would help, something EEBO is now working on with its “EEBO Interactions, A Social Network.”
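One rough way to see why multi-word searches fare so badly is to multiply the word-level accuracy rates cited from Tanner. The Python sketch below assumes, purely for illustration, that OCR errors on neighbouring words are independent of one another; real scans almost certainly violate this, so the figures are a back-of-envelope guide and not a finding from Tanner’s article. The function name `phrase_hit_rate` is hypothetical.

```python
# Accuracy rates for Burney as cited above from Simon Tanner's article.
WORD_ACCURACY = 0.65
CAPITALIZED_WORD_ACCURACY = 0.474


def phrase_hit_rate(per_word_accuracy, n_words):
    """Chance that every word in an n-word phrase was read correctly.

    ASSUMPTION: errors on each word are independent, a simplification
    made only for this illustration.
    """
    return per_word_accuracy ** n_words


# A two-word title made of capitalized words, such as "London Evening".
print(f"capitalized, 2 words: {phrase_hit_rate(CAPITALIZED_WORD_ACCURACY, 2):.3f}")
# A two-word phrase of ordinary words.
print(f"ordinary, 2 words:    {phrase_hit_rate(WORD_ACCURACY, 2):.3f}")
```

Under that assumption a two-word capitalized title would be matched a little over a fifth of the time, which is at least in the same range as the “third or fewer” Jim reports for “London Evening”.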

Eleanor Shevlin identified three pressing needs: 1) fostering greater awareness of the context of texts; 2) encouraging collaboration among users; and 3) cultivating greater access to these electronic resources.  She pointed to the need for bibliographical training in order to use these resources accurately and called for an examination of the cognitive effects these tools have on research processes.  Specifically, she wondered how EEBO’s TCP transcriptions or ECCO’s searching mechanism affects research methodology.  Noting that these tools provide opportunities to correct bibliographical inaccuracies, she urged the need for a more standardized process through which corrections could be forwarded to the ESTC or to commercial databases.  She also cited examples of productive collaboration among members of the bibliographic community, including her own experience correcting an error in Kansas’s Spencer Research Library, a correction made possible by sending ECCO’s image of the British Library’s copy of a text to Kansas.  Finally, she noted that access continues to be a problem.  Scholars in the U.S. work at a notable disadvantage compared to scholars in the U.K., who typically have access to ECCO and ECCO II through the Joint Information Systems Committee (JISC).  ASECS President Peter Reill’s recent calls for feedback regarding access suggest that the issue is at least on the radar of those who can help, either through negotiations for large-scale access or individual subscriptions.

Michael Suarez warned against the illusion of comprehensiveness in database searches.  Users are frequently unaware of what is missing in these databases, and the databases’ selectivity impoverishes word searches as tools for analysis.  Turning to the task of text-mining, he expressed skepticism regarding the mentalities of mining.  Where sustained engagement with individual texts allows for work linking texts to their culture and to other texts, textual extraction can produce radically decontextualized results.  Because these database tools are easy to use, we are, he warned, insufficiently uneasy with what they actually accomplish.  Suarez insisted that textual analysis demands an effort to fuse horizons between text and reader, a fusion that involves a reader’s deep engagement with a text’s historical context and with a text’s relationship to other texts.  Such contextualization, as James Boyd White would agree, is essential to a functional and robust literary hermeneutics.  Additionally, text-mining tools encourage scholars to work in even greater isolation, away from libraries and other scholars.  Precisely because the digital future will change the way we think, Suarez called for a greater bibliographical literacy in order to make these promising tools work properly.

Panelists’ Responses:

Jo-Anne Hogan (ProQuest) agreed with Michael’s concern regarding the impact of these digitization projects.  She added that EEBO routinely receives emails pointing out errors, asking for missing items, and making recommendations, and that it works to incorporate these suggestions.  But she also noted a growing digital divide: concerns voiced at conferences like ASECS differed from those at conferences on the digital humanities.  At the latter, attendees ask EEBO to produce more tools for text-mining.  It is sometimes difficult to reconcile the competing requests received.  Money matters in these issues, and will always be a factor.  She agreed that more could be done to align the bibliographic data in EEBO with that in the ESTC and pointed out that efforts are under way to make that happen.  She also introduced the prospect of a social networking site for EEBO intended to facilitate communication between scholars and users so corrections can be reported and more contextual information can be made available.  We hope to hear more from her about this on this blog in the near future.  Access, she concluded, continues to be a concern; she agreed with Eleanor that it is unfortunate not to have a model for broad access in the U.S.  Personal subscriptions seem unlikely because such subscriptions cannot cover costs, at least not at subscription rates individuals are willing to pay.  She hoped there might be a point in the future when ProQuest can provide broader access, but she could not guarantee such a thing.  More promising is the prospect that about half of the books in EEBO will soon be available for purchase at reasonable rates via Print on Demand.

Scott Dawson (Gale) agreed with Sayre’s suggestion that a Help screen dedicated to OCR problems is an idea to consider seriously.  He added that Gale would look into post-OCR checks that might correct results.  18thConnect will help by testing new OCR software on ECCO page images, and that might solve problems.  Turning to Steve’s comments about ESTC, Scott noted that ECCO depends on ESTC for metadata, and that Gale is working with ESTC to add a link within the ECCO Full Citation to report problems with a given record.  He agreed with Jim May that Burney presents additional obstacles to getting accurate OCR results.  Gale has been working with the British Library to resolve the issue of spread dates and hopes to have an update in the next few months.  On the issue of access raised by Eleanor, Scott mentioned that ECCO is concerned about the issue, but that by providing access to more than 500 institutions globally, it has helped make early modern printed material more accessible than is possible through hard copy or microfilm.  Tiered pricing and consortia-designed contracts help non-ARL institutions find ways to subscribe to ECCO.  He agreed with Michael Suarez that ECCO is incomplete, even with the 50,000 titles added through ECCO II.  Gale is not planning an ECCO III.  But the possibility of linking missing titles to ECCO is being considered.

Brian Geiger (ESTC) outlined two main areas of work at the Center for Bibliographical Studies and Research (CBSR), which manages the North American branch of the ESTC.  First, they continue to upgrade and add records to the ESTC.  They are processing OPAC extracts from libraries, and recently began on an extract from Oxford University that resulted in some 200,000 records that will be matched against the file.  These OPAC extracts provide shelf marks (or call numbers) for existing items, and have turned up tens of thousands of new copies and hundreds of entirely new items.  They are adding urls from online collections.  EEBO, ECCO and TCP are matched, though not yet displayed by the public version at the British Library.  Brian has requested urls from Google and will do the same from Internet Archive.  They are digitizing title pages from paper reports submitted over the last two decades and will attach those images to the appropriate records, allowing users to compare a title page to its MARC record.  They hope to have many of the title pages in the ESTC by 2011.  And they have enhanced some 180,000 MARC records from title pages in ECCO.  Second, the ESTC has started to assess how to transform the project from an online catalog to a flexible and interactive database-driven research tool.  Brian corroborated Steve Karian’s assessment that this new resource should be built on relational databases, and noted with appreciation the value of the kind of collaborative thinking Steve offered about the project’s future.  Brian emphasized that a number of partner projects and institutions should be involved in the redesign, to ensure that the new project meets a variety of user needs and to try to plan for the sharing of information across platforms.  He mentioned some of the features that he thought should be included, among them user editing of bibliographic data and metadata and tools to send information to users about updates or changes to records.  
He ended by pointing out that development of the database will require resources and the next stage of the ESTC’s evolution will be contingent on funding.  The ESTC is currently engaged in grant development.  It will be in a better position to discuss specific solutions once funding is secured.

Collaboration, Costs, and Digital Resources

January 30, 2010

On February 19 and 20 Yale will host a graduate student symposium, The Past’s Digital Presence Conference: Database, Archive and Knowledge Work in the Humanities. A quick survey of the conference program and available abstracts indicates several topics that dovetail with issues or subjects that have engaged emob. Jessica Weare’s paper, “The Dark Tide: Digital Preservation, Interpretive Loss, and the Google Books Project”, for instance, examines the discarding of material evidence in the process of digitizing Vera Brittain’s The Dark Tide. Similarly, Scott Spillman and Julia Mansfield’s presentation, “Mapping Eighteenth-Century Intellectual Networks”, discusses their work on Benjamin Franklin’s letters and their relationship within the Republic of Letters. The conference’s purpose also addresses many of the questions we have been posing on this blog:

■ How is digital technology changing methods of scholarly research with pre-digital sources in the humanities?
■ If the “medium is the message,” then how does the message change when primary sources are translated into digital media?
■ What kinds of new research opportunities do databases unlock and what do they make obsolete?
■ What is the future of the rare book and manuscript library and its use?
■ What biases are inherent in the widespread use of digitized material? How can we correct for them?
■ Amidst numerous benefits in accessibility, cost, and convenience, what concerns have been overlooked?

Peter Stallybrass is offering the keynote, and Jacqueline Goldsby will be the colloquium speaker, while Willard McCarty, Rolena Adorno, and others will appear on the closing roundtable. Such a lineup points to the range of perspectives represented. The conference is free to all affiliated with a university.

Among the places this conference has been announced is the JISC Digitisation News section of the UK Digitisation Programme website, and its announcement emphasizes the participation of students “from around the globe.”

Collaboration as it occurs across boundaries is the implicit topic of this posting, and I wish to use reports from the JISC website both as a springboard and as a contrast in discussing the topic.

A 2008-2009 JISC report on the Enriching Digital Content program (a strand of the JISC Online Content Program), titled Enriching Digital Resources 2008-2009, features a podcast with Ben Showers. Because JISC is a national body, the program described offers a unified, coherent approach to advancing digital resources for the UK’s institutions of higher education; it represents a collaborative agenda. In this podcast Showers explains the purpose of the program: rather than fund the creation of new resources, the program invested £1.8 million to enhance and enrich existing digital content while also developing a system for universities and colleges to vet and recognize this work. He then explains the following four key benefits of the program:
• “unlocking the hidden: making things that are hard to access easy” to obtain and preserve. To illustrate, he points to the CORRAL (UK Colonial Registers and Royal Navy Logbooks) project, which opens up primary data, making it both far more available and better preserved.
• enhancing the experiences of students. Here Showers cites the Enlightening Science project at Sussex, which offers students opportunities to watch video re-enactments of Newton’s experiments and read original texts by Newton and others.
• speeding up research—once a document has been digitized, there is no need to repeat the process. The document will now be available for all other researchers to use.
• widening participation—engaging broader audiences including not only faculty and students within Britain’s educational community but also participants globally.

Turning to the new goals for the 2009-2011 program cycle, Showers notes an emphasis on the “clustering” of content, that is, bringing various projects together and establishing, when appropriate, links among them. Another focus is further building skills and strategies within institutions to deliver digital content effectively. Finally, he mentions the strengthening of transatlantic partnerships, and here the US National Endowment for the Humanities (NEH) is given as an example. Of course, there is a long history of scholarly collaboration between the NEH and British institutions, perhaps most notably on the English Short Title Catalogue (ESTC).

Indeed, through collaborative digital grants offered by JISC and the NEH, several transatlantic projects are underway or near completion, including the Shakespeare Quartos Archive, a collaborative effort involving Oxford University and the Folger Library, and the St Kitts-Nevis Digital Archaeology Initiative, undertaken by Southampton University and the Thomas Jefferson Foundation, Charlottesville, VA, to advance scholarship on slavery. There are several others as well.

Both the goals and benefits detailed by Showers are ones that would attract the support of diverse parties, and they do parallel many arguments being made on this side of the Atlantic for such work, including ones advanced by the NEH. Moreover, this and other JISC reports suggest that JISC has also helped broker mutually beneficial relationships between British universities and commercial vendors such as Cengage-Gale and ProQuest. Yet another JISC report, Value for Money, offers arguments that we need to be making and also points to the obstacles and divides affecting various types of collaboration in the United States.

After offering the following figures on the return on money invested in JISC,

• For each £1 spent by JISC on the provision of e-resources, the return to the community in value of time saved in information gathering is at least £18.

• For every £1 of the JISC services budget, the education and research community receives £9 of demonstrable value.

• For every £1 JISC spent on securing national agreements for e-resources, the saving to the community was more than £26.

the report summary offers the following remarks:

These are the figures revealed by a recently-published Value for Money report on JISC services. Although many countries have centrally provided research and education networks, and some have provided supplementary services, no other country has a comparable single body providing an integrated range of network services, content services, advice, support and development programmes.

The cost-effectiveness of JISC is again highlighted in two sidebars:

These figures suggest that for every £1 JISC spent on securing national agreements for e-resources, the saving to the community was more than £26.
The added value, equivalent to more than £156m per year, suggests the community is gaining 1.4 million person/days, by using e-resources rather than paper-based information.
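
For readers who want to check the arithmetic behind these figures, here is a small sketch. The constants are the numbers quoted above; the function names are my own, and the per-day figure it derives is an inference from the two quoted numbers, not something the report states. Because the report says “at least” and “more than,” each result should be read as a floor rather than an exact value.

```python
# Figures quoted from the report summary and sidebars above.
ADDED_VALUE_GBP_PER_YEAR = 156_000_000  # "more than £156m per year"
PERSON_DAYS_GAINED = 1_400_000          # "1.4 million person/days"

# Return per £1 spent, as quoted; all are lower bounds.
RETURN_PER_POUND = {
    "time saved in information gathering": 18,
    "services budget": 9,
    "national agreements for e-resources": 26,
}


def implied_value_per_person_day(added_value, person_days):
    """Value implied for one person-day (an inference, not a reported figure)."""
    return added_value / person_days


def community_return(spend_gbp, ratio):
    """Minimum return to the community for a given spend at a quoted ratio."""
    return spend_gbp * ratio


if __name__ == "__main__":
    per_day = implied_value_per_person_day(ADDED_VALUE_GBP_PER_YEAR, PERSON_DAYS_GAINED)
    print(f"Implied value of one person-day: about £{per_day:.2f}")
    for label, ratio in RETURN_PER_POUND.items():
        print(f"£1,000 spent on {label}: at least £{community_return(1000, ratio):,}")
```

On these numbers, each person-day gained is valued at roughly £111, which gives a sense of the scale behind the headline total.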

The end of the summary further reinforces why investments in JISC benefit the UK as a whole:

The value of JISC activities extends beyond the benefits identified here. Education and research are high-value commodities that play an important role in the UK economy and underpin the UK’s global economic position.

JISC’s “Value for Money” report contains the types of arguments and data that we in the US need to be putting forward. While higher education in the US is not centralized as it is in the UK, the push for more transparent reporting on and assessment of what our various universities and colleges are delivering perhaps provides an opportunity for new forms of collaboration. Through national scholarly societies, the NEH, Mellon Foundation, ALA, and more, we need to supply some “noisy feedback” from a dollars-and-cents/sense perspective about what investing in digital resources means not just for our institutions of higher learning but also for our society.


