Archive for the ‘Full-text searching’ Category

Folger Digital Texts Now Online (and Other March Announcements)

March 15, 2014

This month has already seen a number of news items of potential interest to EMOB readers, including Gale-Cengage’s announcement that it will offer STEM e-books from Springer and Elsevier (a potentially potent nexus of publishing forces in the subscription database world) as part of its Gale Virtual Reference Library (GVRL) and that it is launching a Proprietary Monograph Publishing Program; free access in March to Orlando: Women’s Writing Online that Anna announced here a few days ago; and a note from Dr. Ian Christie-Miller about digital imaging resources he has been developing and the interest they have received in the UK.

Just this week the Folger announced that all 38 of its digital texts of Shakespeare’s plays are now available, free of charge, online. As the homepage’s title Timeless Texts, Cutting-Edge Code suggests, a key feature of these texts is the robust coding that one can freely download. Besides the meticulously executed TEI-compliant XML structure of these plays, the texts are also attractively designed for reading as this opening of All’s Well That Ends Well illustrates. This page also displays the useful digital paratexts accompanying each work. Barbara Mowat and Paul Werstine offer a brief Textual Introduction to the site.
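For readers curious about what working with the downloadable source might involve, here is a minimal sketch in Python. It parses a tiny, made-up TEI-style fragment and counts verse lines per speaker. The sample XML, the speaker IDs, and the `lines_per_speaker` function are all assumptions for illustration; the Folger's actual encoding may use different elements and attributes, so treat this as a starting point rather than a description of their files.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical fragment using common TEI drama elements (<sp>, <speaker>, <l>).
# It is NOT copied from the Folger files.
SAMPLE = f"""
<TEI xmlns="{TEI_NS}">
  <text><body>
    <sp who="#COUNTESS"><speaker>COUNTESS</speaker>
      <l>First sample line.</l>
      <l>Second sample line.</l>
    </sp>
    <sp who="#BERTRAM"><speaker>BERTRAM</speaker>
      <l>Another sample line.</l>
    </sp>
  </body></text>
</TEI>
"""

def lines_per_speaker(xml_text: str) -> Counter:
    """Count <l> children of each <sp>, keyed by the speech's who attribute."""
    root = ET.fromstring(xml_text)
    ns = {"tei": TEI_NS}
    counts = Counter()
    for sp in root.iterfind(".//tei:sp", ns):
        who = sp.get("who", "unknown").lstrip("#")
        counts[who] += len(sp.findall("tei:l", ns))
    return counts

if __name__ == "__main__":
    print(lines_per_speaker(SAMPLE))
```

The same pattern (find an element, read an attribute, tally) would extend to other questions one might ask of an encoded play, provided the query is adjusted to the elements the files actually contain.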

We would like to hear from others about how they are using this new resource–both in terms of its texts and the source code.

British Newspaper Archive: Not Burney (yet), But Still Useful

July 6, 2012

Launched this past November, the British Newspaper Archive is a joint project of the British Library and brightsolid online publishing. Over the next decade, this partnership is slated to digitize over 40 million pages of the BL’s newspaper collection. The site anticipates a wide audience that includes not only scholars but amateur historians, genealogists and more.

While the project often digitizes the original paper copies, it has also digitized from the BL’s microfilm copies because the process is faster and enables more pages to be made available in a shorter amount of time. The quality of the pages, however, does suffer as the website admits; unfortunately, this emphasis on speed means that the accuracy of the search results is forever sacrificed. That said, one can view the OCR text and correct it:

When viewing an image, the OCR text can be viewed via the left nav All Articles option. You can select an individual article and then select Show Article text and the text. This addictive option can be accessed by simply clicking the list of sections displayed and applying your own corrections. By correcting the text, you will be adding to the quality of the data that can be searched by others. Please note that during the launch period updates to corrections will take longer to appear. (“Getting Started”)

The site’s descriptive information suggests that the collection dates primarily from the nineteenth century on, but there are 24 eighteenth-century provincial newspaper titles available in the current collection (full list appears below). As of yet, there are no eighteenth-century London papers. Like Burney, the British Newspaper Archive is a subscription database. Unlike Burney, though, provisions for individual subscriptions exist. The rates also seem quite reasonable and offer an array of plans (credits refer to the number of views; each view “costs” 5 credits; the view option enables you to download or print):

  • 12 Month Package (unlimited pages)
    Price: £79.95 GBP       Valid For: 365 days       Credits: Unlimited*
  • 30 Day Package (up to 600 pages)
    Price: £29.95 GBP       Valid For: 30 days       Credits: 3000
  • 7 Day Package (up to 120 pages)
    Price: £9.95 GBP       Valid For: 7 days       Credits: 600
  • 2 Day Package (up to 100 pages)
    Price: £6.95 GBP       Valid For: 2 days       Credits: 500
  • Potential users are also able to register using an email address and receive 15 free credits (enough for three views), a very limited trial of sorts.
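The per-page cost implied by the capped packages follows directly from the figures above (credits divided by 5 credits per view). The short Python sketch below works it out; the function and variable names are mine, and the prices are simply those quoted in this post, which may since have changed.

```python
CREDITS_PER_VIEW = 5  # each page view "costs" 5 credits, as noted above

# (name, price in GBP, credits) for the three capped packages quoted in this post
PACKAGES = [
    ("30 Day", 29.95, 3000),
    ("7 Day", 9.95, 600),
    ("2 Day", 6.95, 500),
]

def pages_and_cost(price_gbp, credits):
    """Return (viewable pages, approximate price per page in pence)."""
    pages = credits // CREDITS_PER_VIEW
    pence_per_page = round(price_gbp * 100 / pages, 2)
    return pages, pence_per_page

if __name__ == "__main__":
    for name, price, credits in PACKAGES:
        pages, pence = pages_and_cost(price, credits)
        print(f"{name}: {pages} pages, about {pence}p per page")
```

On these figures the 30-day package is the cheapest per page among the capped plans, which matters if, as described in the caveat further on, credits are sometimes spent on pages that turn out not to contain the article sought.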

    Searches can be conducted either as simple or advanced. The advanced search includes searching by “All of these words,” “Any of these words,” “Without these words,” and “Phrase.” You also have the option of applying filters such as dates, place of publication, publication title, or article type (advertisement, article, family notice, illustrated, miscellaneous). There is also the option of browsing by titles. Unfortunately, you cannot use wildcard characters of the kind the Burney search interface provides to help counteract the poor OCR, the long “s” (routinely misread as “f”), or other typographical peculiarities. Nor does the BNA offer features similar to Burney’s search aids such as the “w” or “n” joined by a number to find two terms within a certain proximity of one another.

    Search results can be ordered by relevance or by date (in either ascending or descending order), and a glimpse of the context in which the search terms occur is given. For example,

    Ipswich Journal
    Sat 10 Jan 1784 Suffolk, England
    5 U F F O L K. 1 0 be Ll’. TT, ant! enlered upon immediatrly, THAT olil*aeculbmed Public Honic,
    8343 Words
    “SfMON PATERNOSTER of Wickhairi-market, to be agent for the faitl company for the town of Wiek- ham.market, and parts adjacent. The company infure lioufeS, bufrdings, … ?

    As this example demonstrates, the context provides the OCR text with all its warts. Still, it helps the user decide if the article is worth viewing and assists in conserving the credits in one’s account.
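Since the BNA interface offers no wildcards, one workaround, assuming you have the OCR text in hand (for instance, copied from the article text view), is to search it locally with a pattern that tolerates the long-s confusion. The Python sketch below is only an illustration of that idea; `long_s_pattern` is a name I have made up, and the snippet is taken from the OCR sample above.

```python
import re

def long_s_pattern(word):
    """Build a regex in which every 's' also matches 'f' or the long s itself."""
    parts = ["[sfſ]" if ch.lower() == "s" else re.escape(ch) for ch in word]
    return re.compile(r"\b" + "".join(parts) + r"\b", re.IGNORECASE)

# Fragment of the OCR text quoted in the Ipswich Journal example above
SNIPPET = "The company infure lioufeS, bufrdings"

if __name__ == "__main__":
    # Finds the misread form of "insure"
    print(long_s_pattern("insure").findall(SNIPPET))
    # Finds nothing: "houses" was mangled beyond the s/f swap ("lioufeS")
    print(long_s_pattern("houses").findall(SNIPPET))
```

The second query is a reminder of the limits of this approach: it handles one predictable substitution and does nothing for the less systematic errors that the sample also contains.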

    CAVEAT: During my two-day exploration of the BNA, I encountered several cases in which I clicked to view an article only to discover the article was not on that page. Five credits were still deducted from my account and continued to be deducted as I browsed other pages in the issue. Once or twice I was not able to find the result at all; other times it appeared on a different page within that issue.

    Here is a list of eighteenth-century titles currently available in BNA:

    • Aberdeen Journal (105)
    • Bath Chronicle and Weekly Gazette (1989)
    • Birmingham Gazette (8)
    • Bristol Mercury (1)
    • Caledonian Mercury (7309)
    • Chelmsford Chronicle (329)
    • Derby Mercury (2595)
    • Hampshire Chronicle (1115)
    • Hampshire Telegraph (11)
    • Hereford Journal (982)
    • Ipswich Journal (1203)
    • Ipswich Journal, The (1296)
    • Kentish Gazette (374)
    • Leeds Intelligencer (2352)
    • Manchester Mercury (223)
    • Newcastle Courant (2561)
    • Norfolk Chronicle (1109)
    • Northampton Mercury (1625)
    • Oxford Journal (2434)
    • Reading Mercury (570)
    • Salisbury and Winchester Journal (17)
    • Scots Magazine, The (611)
    • Sherborne Mercury (256)
    • Sussex Advertiser (60)

Digital Humanities and Archives II: ‘Archival Effects’ of Digitization

April 29, 2012

In an earlier EMOB post, “Digital Humanities and the Archives I: Economics and Sustainability”, we discussed the varied connotations that the term “sustainability” evokes. Yet the concept of “archives” also engenders a multiplicity of meanings as does the word “database.” In some circles “archive” and “database” are used interchangeably, while for others the terms signal distinctions between the past and the present. As Marlene Manoff has observed,

When scholars outside library and archival science use the word “archive” or when those outside information technology fields use the word “database,” they almost always mean something broader and more ambiguous than experts in these fields using those same words. The disciplinary boundaries within which these terms have been contained are eroding. Scholars use the terms metaphorically, appropriating them from the professional experts. (Manoff, “Archive and Database as Metaphor: Theorizing the Historical Record.” portal: Libraries and the Academy, 10.4 [2010], 385)

The submissions for the “Digital Humanities and the Archives” roundtable at ASECS 2012 attest to the varied meanings scholars ascribe to “archive” as a digital entity. While some proposals viewed commercial textbases such as ECCO or EEBO as archives, others considered non-commercial digital projects (some of which were designed to perform additional roles beyond being a repository) as falling under the “archival” designation. Still others proposed topics that were not tied to specific digital collections or projects. Reflecting this diversity, the selected presentations featured two papers on the nature of searching within digital environments (Randall Cream, West Chester Univ., and Bill Blake, New York Univ.), another on the coding issues encountered in building a performance history database (Mike Gavin, Rice University; University of South Carolina, Fall 2012), a fourth on the potential evidence that can be derived from negative results (Sayre Greenfield, Univ. of Pittsburgh, Greensburg), and the last on a digital archive aimed at facilitating exchange between scholars and those outside the academy (Jessica Richard, Wake Forest Univ.). In his post on the many Digital Humanities sessions at ASECS, Stephen Gregg offers a fine overview of this roundtable, so the following comments supplement his summary. In addition, they serve as a springboard for discussing digitization’s broader “archival effects,” a term coined by Marlene Manoff to “suggest the ways in which digital media bring the past into the present” (386).

Contrasting the old and the new, Randall Cream noted that unlike traditional archives whose contents are not always fully known, digital archives and databases afford more certainty because their creation involves detailing and defining (an encyclopedic naming of) their various parts. For Cream, this difference has also meant that searching the digital archives lacks the serendipitous discovery that scholars often experience when working in brick-and-mortar archives. He suggested concept-linked searching as a possible means of fostering chance discoveries within digital environments, a suggestion that provided a fitting segue to Bill Blake’s talk on crafting more effective digital searches. Blake argued for thinking beyond topical keyword searches aimed solely at retrieval. Instead, he called for adopting higher-quality, conceptually based searches that will yield better results; such searches will counter the drift and spread that occur when the aim of retrieval replaces the goal of discovery. (Given earlier EMOB discussions of semantic- or meaning-based searches, it should be noted that Blake was referring to the ways users select and fashion search terms and not to the new search platforms that enable semantic or meaning-based searching such as Mimas used in JISC’s Historic Books collection.)

Cream’s and Blake’s remarks point to what could be termed a remediation of research practices as print and digital interact, and both their talks highlighted searching as perhaps one of the most significant reconfigured practices. And indeed the concept of searching has undergone major reformulations in the digital environment. While accessibility and quickness of obtaining results are often seen as digital archives’ main advantage over print, a key benefit of digital collections resides in their enabling users to traverse immense areas of texts multi-directionally. Put another way, what seems radically different about searching in the digital world is not merely unprecedented access and speed, but rather the ways one can alter search strategies instantaneously, shifting not only the search terms employed at a moment’s notice but also the temporal and spatial coordinates in which those terms are placed. This capability expands the ways we are approaching the search as a strategy, opening up new conceptualizations even as we retain the habits and training we acquired working with print. As Wired magazine’s Kevin Kelly has observed: “What search uncovers is not just keywords but also the inherent value of connection…Search opens up creations. …As a song, movie, novel or poem is searched, the potential connections it radiates seep into society in a much deeper way than the simple publication of a duplicated copy ever could” (Kevin Kelly, “Scan this Book!” New York Times, 14 May 2006).

The searching enabled within digital archives reorients our thinking about what constitutes relevant information and exposes the kinds of connectivity that we would likely miss or overlook working with print and manuscript in traditional environments. This reorientation, moreover, possesses its own opportunities for serendipity. While serendipitous discoveries made when working in a traditional archive or even browsing in the stacks typically occur within a bounded space and a pre-selected range of call numbers, digital archives and databases enable virtual movement throughout their holdings to uncover relevant but unforeseen connections not bounded by categories of expectations. In short, capable of serving as far more than text delivery systems and repositories, these digital archives and databases function as “discovery aids.” Fostering a culture of connectivity, these intellectual laboratories of sorts can provide access not only to individual titles but also to a larger, dynamic field of textual and sociocultural activity.

Sayre Greenfield’s paper demonstrated the kind of discoveries that this rethinking of relevant information can yield. Noting that assessing negative findings requires caution, Greenfield explored the ways in which a lack of search results—negative evidence—can translate into meaningful information and concluded that “absences are most useful when measured against positive results found elsewhere, in different genres or different periods.” In offering examples of the different hits obtained from performing the same search in ECCO and Burney, he drew attention to the importance of knowing the scope of a given database and the value of working across databases.

Mike Gavin’s paper also underscored the importance of understanding the operation of digital archives and the rethinking that such understanding can prompt. As Gavin recounted, creating a digital archive of dramatic works that incorporates their performance history has necessitated adapting TEI coding to facilitate searching. While his comments reflect the perspective of those constructing the archive, they also hold significance for users of digital archives. The tagging examples he provided illustrate the significant intellectual labor that goes into the creation of digital databases and archives; encoding a document, after all, is an interpretive practice requiring careful thought and subject expertise. His illustrations are a cogent reminder that archives–whether traditional or digital–are never neutral but are always rooted in the views and principles of their creators. In the case of digital archives or databases, users benefit from being cognizant of their “constructedness.” Having an awareness of a digital archive’s creators, the circumstances surrounding its creation, the quality of its metadata, and the idiosyncrasies of its search engine will almost certainly enhance a user’s search process and, in some cases, even his or her analysis of results. Unfortunately, it is not always possible to uncover such details about digital archives and databases. Moreover, even when there is transparency and one can familiarize oneself with a digital archive’s encoding principles and information architecture, the tagging can still limit what results searches return. On a different note, it seems worth mentioning that the tasks of coding and organizing the contents of a traditional archive will, in turn, often enrich knowledge of its physical material. And this physical material remains important, for the digital and the material are not one and the same.

Unlike the first four papers that focused on either existing archives or ones nearing completion, Jessica Richard’s paper dealt with the early planning stages of a digital project. The impetus for the project was a desire to foster exchange between eighteenth-century science studies scholars and a non-academic readership; a web-based site seems an ideal medium for the public-humanities thrust of this project. Notwithstanding its differences from the other talks, Richard’s topic very much reflects how the digital is transforming our traditional conceptions of archives. The project’s rethinking of audience, attention to wide access, and desire to translate scholarship for an interested general public all exemplify aspects of this transformation.

As these five talks illustrated, digital media are transforming our theoretical conceptions of “archives”; creating new paradigms and inspiring shifts in existing models as the digital and traditional archival cultures interact; and shaping the kinds of archival projects being undertaken, the methodologies used, and the types of research questions posed. Early in her essay Manoff suggests that “our current moment reflects the convergence of two phenomena–new technical capacities and an age-old impulse to gather and preserve. The ease of capturing digital data is an incitement to archive” (386). In light of the linguistic history of “archive,” connections between new technical capacities and the desire to collect and preserve have perhaps an even longer history. The word “archive” does not appear until after the invention of hand-press printing. While its use as a noun to denote either a historical document that is preserved or the place in which such documents are kept dates from the late 1630s/early 1640s, its verbal form–to archive–does not enter the lexicon until the twentieth century. Whether coincidence or not, this verb does not gain wide currency until the 1980s, a timing that corresponds with the growth in the use of computers and related technologies. In the past two decades the extensive adoption of digital technologies has dramatically spurred efforts to assemble large-scale collections of visual, verbal, and even oral materials and make them virtually available, either freely or commercially.

For Manoff, metaphorical appropriations of “archive” are useful not only for theorizing the ever-increasing growth of these collections but also for theorizing the digital in terms of its archival effects on our conceptions of history and the cultural record (385-6). As Manoff observes at the close of her essay, “archive” especially lends itself to such theorizing because the concept “carries within it both the ideal of preserving collective memory and the reality of its impossibility” (396). The musings about traditional and digital archives presented here touch upon only a few of the archival effects that digital transformations are exercising on our research practices and broader relationships with history and knowledge. I hope others will add their thoughts about these changes and the explanatory power of “archive” to address our cultural moment.

JISC’s Historic Books: Searching EEBO, ECCO for meaning

March 6, 2012

This past fall JISC announced a new venture, the JISC eCollections, “a new community-owned content service for UK HE and FE institutions.” What might interest EMOB readers most is its Historic Books. This digital collection contains over 300,000 books from before 1800 and also makes over 65,000 19th-century first editions from the British Library available for the first time online. The entire corpus is accessible through institutional subscription and, most welcome, searchable over a single platform.

The pre-1800 material in the JISC Historic Books eCollection consists solely of ProQuest’s Early English Books Online (EEBO) and Gale’s Eighteenth Century Collections Online (ECCO) textbases, so some might wonder what this collection offers that is new for those working in the early modern period. One does not need to be in eCollections, for instance, to conduct searches simultaneously across both databases. Yet the Help page for the eCollections indicates that more than just the convenience of a single interface and platform is being offered:

JISC Historic Books uses meaning-based searching rather than traditional keyword searching, which is why you will notice you get different results to searching EEBO and ECCO on the publishers sites. Meaning-based searching enables you to find conceptual and contexual [sic] links betweeen [sic] related documents which aren’t possible using traditional keyword searching.

Besides returning traditional results, JISC Historic Books also delivers “meaning-based” concepts deemed relevant to the search in the form of a Concept Cloud:

[Screenshot: Concept Cloud]

The more prominent the word, the more relevant it is deemed to the search, and as the screenshot indicates, items in the cloud can be manipulated to narrow one’s search further.

Over the past three or four years (and maybe longer) I have been consistently struck by the ways that traditional searches of ECCO, Burney, EEBO, and Google Books have transformed how I think about searching, construct searches, and view my results. More specifically, these keyword searches, described here as traditional, were already encouraging me to view results in a more networked, contextual way and, as a consequence, to devise additional searches aimed at teasing out new potential relationships. The meaning-based search enabled by JISC’s Mimas platform, of course, is offering something quite different, but I wonder how its use might prompt a rethinking of what it means to search and research.

It would be interesting to hear from EEBO and ECCO users in the UK who have used JISC Historic Books, especially about the differences between results obtained from searching using the JISC platform and those obtained by searching using the original publishers’ platforms.

Google Books Award: ESTC Receives Digital Humanities Grant

July 21, 2010

Posted on behalf of Brian Geiger, University of California, Riverside.

Brian reports:

I’m pleased to announce that Ben Pauley and I have received one of twelve inaugural Google Digital Humanities grants to match pre-1801 items in Google Books to the ESTC. The official announcement was made last week. You can read more about the grant at Inside HigherEd.

Our plan is to match as much as we can through computer matching, putting URLs for Google Books in appropriate ESTC records and providing Google with ESTC IDs and metadata. We don’t know for sure, but estimate that there will be between 100,000 and 200,000 ESTC-related items in Google Books. Based on matching that the Center for Bibliographical Studies and Research (CBSR) has done of records from electronic library catalogs, we should be able to computer match up to 50% of the Google records. This number could be lower than usual, however, given the truncated nature of much of the Google metadata.

The remaining 50% or so of the records we hope to put in a version of Ben’s Eighteenth-Century Book Tracker and make publicly accessible for users to help with the matching. For those of you teaching bibliography or bibliographically-minded courses next year, this could be a wonderful teaching tool, allowing your students to struggle with the complexities of early modern bibliography and learn first-hand its importance for understanding the history of the book.

We’ll update this blog about our progress with the Google Books metadata and hope to have a version of the Eighteenth-Century Book Tracker ready for use by the end of the fall or early spring.
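To give readers a feel for what “computer matching” of catalog records can involve, here is a minimal Python sketch. It is emphatically not the CBSR’s method, which the report does not describe; it simply compares normalized titles with the standard library’s `difflib` and requires the imprint year to agree. The records, IDs, titles, the 0.85 threshold, and the trick of comparing only as many characters as the shorter title has (a crude allowance for truncated metadata) are all hypothetical.

```python
from difflib import SequenceMatcher

def normalize(title):
    """Lowercase and keep only letters, digits and single spaces."""
    kept = "".join(ch if ch.isalnum() else " " for ch in title.lower())
    return " ".join(kept.split())

def best_match(candidate, catalog, threshold=0.85):
    """Return the id of the closest same-year catalog record, or None.

    Titles are compared over the length of the shorter one, so a truncated
    title can still match the start of a full one.
    """
    cand_title = normalize(candidate["title"])
    best_id, best_score = None, 0.0
    for rec in catalog:
        if rec["year"] != candidate["year"]:
            continue
        rec_title = normalize(rec["title"])
        n = min(len(cand_title), len(rec_title))
        if n == 0:
            continue  # nothing to compare
        score = SequenceMatcher(None, cand_title[:n], rec_title[:n]).ratio()
        if score > best_score:
            best_id, best_score = rec["id"], score
    return best_id if best_score >= threshold else None

# Hypothetical records: made-up IDs and titles, not real ESTC or Google data
CATALOG = [
    {"id": "X1", "year": 1710,
     "title": "A Sermon Preached Before the Honourable House of Commons"},
    {"id": "X2", "year": 1710, "title": "An Essay on the Nature of Trade"},
]
CANDIDATE = {"year": 1710, "title": "A sermon preached before the honourable house"}

if __name__ == "__main__":
    print(best_match(CANDIDATE, CATALOG))
```

Records that fall below the threshold are exactly the kind that would be left for human matchers, which is where a public tool such as the one described above comes in.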

Collaborative Reading: “The Joys, Possibilities, and Perils of the British Library’s Digital Burney Newspapers Collection”

May 13, 2010

Ashley Marshall and Rob Hume, “The Joys, Possibilities, and Perils of the British Library’s Digital Burney Newspapers Collection.” PBSA, 104:1 (2010): 5-52.

At forty-seven pages Ashley Marshall and Rob Hume’s article offers a substantive assessment of this relatively recent electronic resource for early modern studies. Early on the authors argue that “[d]igital Burney is amazing, but exploiting it fully is going to demand some serious rethinking and reorientation in both our research and our teaching” (6-7). Their claim that this tool “will change the way we conduct our business” (7) possesses much merit; fulfilling digital Burney’s promise, however, will depend on far broader scholarly access than currently exists. Equally important, scholars need to acquire a firm understanding of its possible uses, search capabilities, and limitations. While Marshall and Hume’s piece cannot assist in matters of accessibility (though it could serve as support for the tool’s purchase), their essay does advance our knowledge of how this tool might be employed and how its features and limitations can best be navigated.

The article is usefully divided into five sections. The first considers the difficulties surrounding the use of newspapers for literary research. The next two parts detail various scholarly and pedagogical uses of newspapers afforded by digital Burney. The fourth section, making up nineteen of the article’s total pages and accompanied by five reproduced screen shots, identifies the external and internal shortcomings of the resource. The final part offers conclusions.

I. Conceptual Barriers to the Utilization of Newspapers

Noting that newspapers make a rare appearance in scholarship and teaching, this section examines the basis for such neglect.

  • A key reason stems from the simple fact that newspapers were virtually unavailable in the US until 1978 when the Early English Newspapers microfilm series made its debut. Even then, however, the series did little to bolster the already scant interest in historical newspapers among scholars. (7)
  • The reign of New Criticism and the subsequent heyday of Theory strongly discouraged the use of material drawn from newspaper content. If newspapers were consulted, the information sought was typically confined to obituaries, book and play reviews, and advertisements for books and cultural performances. (8)
  • That early newspapers either lack organized sections, including headlines, or feature very basic divisions often proves initially daunting to users. Especially in papers published before the 1760s, the lack of source information, the unacknowledged lifting and repetition of content across titles, sparseness of details, and partisan leanings also have made these newspapers seem strange and have done little to encourage their use (8-9).
  • Often scholars do not possess the knowledge needed to extract and draw conclusions about the values contained in many of these papers. Scant information about the circulation and readership of newspapers hinders a scholar’s ability to “analyze their implied readership, ideology, or socio-political agendas” (10). A broad gap exists between the literature we study and teach and the information found in these newspapers (11).

II. Research Uses

The authors supply three extended examples of possible ways that digital Burney can assist researchers.

  • Book Prices: Newspaper advertisements afford us a rich opportunity to compile prices for books not otherwise available (11-12). To illustrate, the authors supply prices derived from digital Burney for satire and then offer various insights this list affords. For one, the list reveals that prices for this genre ranged widely from low to high; the affordability and greater number of lower priced titles intimate that “[t]hese works were intended to reach and influence readers” (16). Additional examples of the price information newspapers can offer include:

    Collected works were considerably more expensive to buy than the individual titles had been when initially published.

    Newspapers “can turn up major fluctuations in price over time” for a given title (16).

    Information in newspapers can enable us to reconstruct marketing strategies; for example, some advertisements reveal attempts to reach multiple markets by offering several formats at different prices (16-17).

    As the authors assert, knowledge about book prices matters because “[i]f we are going to understand the works we study and the world in which they were produced and read, then the clearer we can be on price and what it implies about audience, the better” (17).

  • Reception and Reputation: Noting that dissemination contributes to our understanding of the reception and reputation of writers and their works, Marshall and Hume also caution that information drawn from digital Burney searches for prices, reprintings, marketing strategies, commentary or allusions to authors, and the like has its limitations. For one, newspapers until the late eighteenth century offer little in the way of cultural commentary; second, searching for authors’ names can be problematic for numerous reasons ranging from false hits (e.g., “Pope” yields a huge number of results, but many do not refer to the author) to OCR problems that keep searches from returning anywhere near the actual number of occurrences (18). Still, such searches can provide interesting information and, in turn, raise questions about the rise and diminishing of an author’s visibility in the papers, the geographic parameters of that visibility, and the contemporary existence of associations or groupings of authors (19-20).
  • Study of Individuals: The Case of John Rich: In this example the authors illustrate ways in which Burney can augment and shift our understanding of understudied individuals through an examination of theatre owner and manager, John Rich. In addition to discussing how Burney yielded fresh information about Rich, Marshall and Hume also discuss briefly the specific, various searches performed to yield hits for John Rich; they close this case study with a cautionary example of how newspapers, while often providing new facts and leads, can also on occasion provide false or erroneous information.

III. Teaching Uses

The authors divide their discussion of how digital Burney might be used in the classroom into two sections, one dealing with eighteenth-century economics and the other with the century’s Weltanschauung. Marshall and Hume preface their two pedagogical uses with a warning that students will need much prior preparation before attempting to use the resource. This preparation includes assistance not only with the intricacies and peculiarities of searching digital Burney but also with working with historical primary sources, especially sources such as newspapers (24).

  • Economic Issues and the Value of Money: While the research section focused on book prices and dissemination, here the focus is broadened to using Burney to show “students … how things looked to eighteenth-century people” in terms of money–“a much neglected subject” (24). While we can simply tell students today’s monetary equivalents for sums of money mentioned in eighteenth-century literary works, the authors make the salient point that “hearing is not the same as comprehending” (26). What the authors recommend is having students search the prices of everyday items found in newspaper advertisements and calculate their modern monetary equivalents. As they note, the students’ findings can radically shift our understanding of the economic references found in the literature being studied and, in turn, carry implications that extend beyond the works.
  • Seeing the World through Eighteenth-Century Eyes: Near the end of this section, Marshall and Hume underscore that what they have been proposing means fundamentally “altering the way we teach” rather than merely supplementing our current methods (30). The crux of this shift entails replacing secondary with primary sources as the means by which students learn to “see[ ] the world through eighteenth-century eyes.” Among the suggested assignments is a rhetorical or ideological critique of a newspaper title during a set time or a comparative variation in which several titles are examined (27). Using ECCO as well as Burney, another possible assignment would have students explore an event or topical reference; commentary on Dr. Sacheverell’s trial, the 1745 Jacobite invasion, the 1730 trial of Colonel Francis Charteris for rape, the American war (as opposed to “Revolution”), or reviews of theatre performances represent just a few of the examples they offer (27-29). Yet another use involves investigating the reception of works based on newspaper commentary (29). Noting that the nature of the course (a survey will differ considerably from an honors seminar) will affect the assignment(s) used, the authors stress that the benefit of such exercises lies not in enhancing the interpretation of specific works but rather in “helping bring the works we study to life, in making real to twenty-first-century undergraduates the commitments and passions of eighteenth-century writers and readers” (29).
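For the price exercise described under “Economic Issues and the Value of Money,” the arithmetic students need is simple but unfamiliar: pre-decimal sums were reckoned in pounds, shillings, and pence (12 pence to the shilling, 20 shillings to the pound). The Python sketch below shows one way to do the conversion. The function names are mine, and the multiplier is deliberately left for the instructor to supply; the value used in the usage lines is a placeholder for illustration only, not a figure taken from Marshall and Hume.

```python
PENCE_PER_SHILLING = 12
SHILLINGS_PER_POUND = 20
PENCE_PER_POUND = PENCE_PER_SHILLING * SHILLINGS_PER_POUND  # 240

def to_pence(pounds=0, shillings=0, pence=0):
    """Convert a pounds/shillings/pence sum to pence."""
    return (pounds * SHILLINGS_PER_POUND + shillings) * PENCE_PER_SHILLING + pence

def modern_equivalent(pounds=0, shillings=0, pence=0, *, multiplier):
    """Rough modern value in pounds, given a multiplier chosen by the user.

    The multiplier is an assumption the caller must justify; this function
    only does the arithmetic.
    """
    old_pounds = to_pence(pounds, shillings, pence) / PENCE_PER_POUND
    return old_pounds * multiplier

if __name__ == "__main__":
    # A hypothetical advertised price of 2s. 6d.
    print(to_pence(shillings=2, pence=6), "pence")
    # multiplier=100 is a PLACEHOLDER for illustration only
    print(modern_equivalent(shillings=2, pence=6, multiplier=100), "modern pounds")
```

Having students try several multipliers and argue for one is itself a useful exercise, since no single figure captures changes in the relative cost of goods and labor.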

    IV. External and Internal Problems

    Before addressing particular kinds of problems, Marshall and Hume review the basic and advanced search capabilities of digital Burney. As the authors rightly note, these two search types will already be familiar to ECCO users. Proximity searches–searches in which one uses a “W” to find occurrences of a term that follows another within a certain number of words (e.g., “Hogg w5 Giltspur” will uncover Hogg within five words of “Giltspur”) or an “N” to find occurrences of a term preceded or followed by another (e.g., “Hogg N20 Giltspur” will return cases of Hogg appearing either before or after “Giltspur” within twenty words of each other)–can be done using either the basic or advanced search. Both kinds of searches can be limited by date and publication titles; both handle wildcard searches (! represents either a blank or any single character; * represents multiple characters; and ? represents any single character); and both accommodate “fuzzy” searches (31-34). This discussion offers even more detailed advice, including remarks about potential outcomes from various search methods.
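The operator semantics summarized above can be made concrete with a small sketch. It is not Gale’s search engine, and every function name in it is hypothetical. It assumes that “W” means the second term follows the first within n words, that “N” allows either order, and that “*” may also match zero characters; the sample sentence is invented for illustration.

```python
import re

# Invented sample text, not a transcription from Burney.
SAMPLE = ("Mr. Hogg, bookseller, lately removed to Giltspur Street; "
          "at Giltspur Street Mr. Hogg sells")


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def proximity_hits(text, first, second, distance, ordered):
    """Return (i, j) token positions of `first` and `second` within `distance` words.

    ordered=True  sketches "first Wn second" (second must follow first).
    ordered=False sketches "first Nn second" (either order).
    """
    tokens = tokenize(text)
    first, second = first.lower(), second.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok != first:
            continue
        for j, other in enumerate(tokens):
            if other != second or i == j:
                continue
            gap = j - i
            if ordered and 0 < gap <= distance:
                hits.append((i, j))
            elif not ordered and abs(gap) <= distance:
                hits.append((i, j))
    return hits


def wildcard_to_regex(pattern):
    """Translate the three wildcards into a compiled regular expression."""
    mapping = {
        "*": r"\w*",  # multiple characters (assumed: zero or more)
        "?": r"\w",   # exactly one character
        "!": r"\w?",  # a blank or one character
    }
    body = "".join(mapping.get(ch, re.escape(ch)) for ch in pattern)
    return re.compile(body, re.IGNORECASE)


if __name__ == "__main__":
    print(proximity_hits(SAMPLE, "Hogg", "Giltspur", 5, ordered=True))
    print(proximity_hits(SAMPLE, "Hogg", "Giltspur", 20, ordered=False))
    print(bool(wildcard_to_regex("colo!r").fullmatch("colour")))
```

On the invented sample, the ordered search finds only the first “Hogg … Giltspur” pairing, while the unordered search also picks up the later “Giltspur … Hogg” occurrences, which is the practical difference between the two operators.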

  • The first set of problems falls under the rubric “External Issues.” While issues such as incomplete runs have emerged in previous emob discussions and the EC/ASECS and ASECS round-tables on these research tools, the approach taken here differs in some respects from points raised in these forums. In addition to incomplete runs (the authors are rightfully thankful for their inclusion and also offer suggestions for locating copies not in the collection), Marshall and Hume discuss the difficulties encountered when searching for material referenced in published works due to the high error rates of citations for eighteenth-century newspapers (35-36). In doing so they also suggest ways to navigate these false citations.
  • Spread-Date Papers and Other Problems with the Documentation and Search Results:
    A serious problem, one with disastrous potential for being reproduced exponentially, involves the dates digital Burney currently provides for individual issues of titles not published daily. For newspapers published weekly or twice or three times a week,

    [i]f the search engine is used to go directly to a news item or advertisement, the only date the user will see is the wrong one. The correct one has to be found by taking a multi-click detour to bring up the first page of the issue and then resize it to read the printed date on the original paper–if the user realizes that this may be a spread-date [a title whose issues each cover a spread of days between publications] newspaper and knows to check. [Footnote 50 indicates that Gale is in the process of rectifying this problem; “Scott Dawson of Gale informs us that they have identified some 70,000 instances of the problem” as of July 2009 (my emphasis)]. (37)

    Duplication is yet another problem and comes in several forms. The Burney collection contains duplicate copies of a given issue as well as duplicate runs of a given title, which at times will result in the appearance of more hits than actually occur (37-38). Another kind of “duplication” results from the habit of newspapers publishing copy identical to that found in other papers (38).

    Acknowledging the problems stemming from OCR technology and the erratic search results these problems generate, Marshall and Hume briefly mention some of the issues already raised in previous emob postings. In terms of false negatives, they usefully remind us of the role played by the Burney search engine’s design. For example, if one’s search term appears across two pages, then that occurrence will be omitted from the results (41). Citing Jim May’s recent article, “Accessing the Inclusiveness of Searches in the Online Burney Newspapers Collection” (The Eighteenth-Century Intelligencer N.S. 23:2 [May 2009]: 28-34), the authors ruefully report that their experiences with search results correspond to May’s claim “that anything from 20 to 50 percent (or more) of what can be found by manually eyeballing the full texts of newspapers will not show up in the list of results” (41).
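The page-boundary false negative mentioned above can be illustrated with a toy example. This is not a description of how the Burney search engine is built; it only assumes, for the sake of illustration, a search that examines each page’s text separately. The two “pages” and both function names are invented.

```python
def search_per_page(pages, phrase):
    """Return 1-based numbers of pages whose own text contains the phrase."""
    phrase = phrase.lower()
    return [n for n, text in enumerate(pages, start=1) if phrase in text.lower()]


def search_across_pages(pages, phrase):
    """Return True if the phrase occurs once the pages are joined in order."""
    joined = " ".join(page.strip() for page in pages)
    return phrase.lower() in joined.lower()


# Invented two-page issue in which the phrase straddles the page break.
PAGES = [
    "To be sold by J. Hogg in Giltspur",
    "Street, near Newgate, a new edition",
]

if __name__ == "__main__":
    print(search_per_page(PAGES, "Giltspur Street"))      # page-by-page search
    print(search_across_pages(PAGES, "Giltspur Street"))  # search over joined text
```

The page-by-page search returns no hits even though a reader turning the page would see the phrase, which is the kind of omission the authors describe.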

    Marshall and Hume offer three serious cases of false negatives, most stemming from the poor condition of the original. Yet they close this discussion with an example of “a dire problem in Burney’s presentation of Steele’s Tatler (1709-1711)” that arises from problems with the source material made available to Gale (42). In this case, “the first nine months’ worth of one of the foremost early eighteenth-century English periodicals has functionally been erased” because the source used mixed original Tatler issues with the front matter and other material from later book reprints (43-44). Rather than appear in digital Burney under the title “Tatler,” these pre-1710 issues instead appear under the title Lucubrations of Isaac Bickerstaff. While the authors note that this problem could be lessened via “simple relabeling and cross-referencing” (44), the problem also underscores the importance of hands-on scholarly involvement in the preparation and execution of such digitization projects.

  • Some Interface Issues: Under this heading the authors detail “nine of our pet peeves” with the current interface (44).

    1. While one can search or view results according to particular categories of publication such as “Classified Ads” or “Commercial News,” these sections are fairly meaningless, and an advertisement can easily appear under news or vice versa (44).

    2. The inability to perform case sensitive searches (45).

    3. The inability to control the elimination of “stop” words such as “the,” “a,” or “be” when one is seeking hits for a specific phrase or string of words (45).

    4. The numerous clicks one must endure to confirm the paper, date, day; the best solution to this problem would be for Gale to offer the title and spread date on each and every display page (45).

    5. Related to (4), “that title and date would appear with whatever one printed from page to page.” As the authors note, the need to record this information manually on a printed copy of a given page encourages errors, many of which will be multiplied as erroneous citations in future publications (45).

    6. The Browse Publication Title inefficiently results in “a set of links to what are reported as “[X number of] issues” chopped into [X–often in the thousands] chunks of News Advertisements, Business News, etc.” and consequently requires the user to guess where “the desired date might fall.” While using the “Publication Search” is a better approach, this search is not without its problems (46).

    7. The inability to search efficiently for “Other papers for the same date.” Currently, without such a dedicated search feature for this option, one must conduct an “Advanced Search” using “Publication Date”; if multiple dates are sought, one must repeat the process for each date desired (47).

    8. The confusion between the “Previous/Next Article” (“article” here is a misnomer) and “Previous/Next Page”; the first navigates results found, while the second, which appears directly above the newspaper’s text, will take the user to the next page in the issue being viewed (47).

    9. Although one has three options for searching for particular issues of a given title, the three processes differ in their operations, primarily in whether or not they accept an opening article (“the”) in a newspaper’s title (47, 49).

  • Following the “pet peeves” list, the authors offer useful information and advice about the intricacies of printing one’s results. Such information is particularly valuable, for as the authors also note, digital Burney’s “printing facility is neither self-evident nor at present particularly well explained” (50). Especially vexing is the failure of several print options to include title and date details.

    V. Observations and Conclusions

    Admitting that hindsight makes for easy criticism, Marshall and Hume nonetheless correctly claim that many of the problems identified in Burney might have been avoided if scholars with appropriate expertise had been closely consulted in the preparatory stages of this significant tool (50). Similarly, if the interface and search features had been tested by actual, potential users, many of the snags in searching might have been eliminated in advance of the tool’s official release. They also draw attention to the commercial nature of the enterprise. Although they do not mention affordable access here or elsewhere, they do stress the high expense and the subsequent expectation among purchasers that “when significant problems emerge … they need to be seriously addressed” (51). The efforts underway to correct the dating errors in spread-date newspapers are no doubt an example of a serious problem that is receiving attention.

    Despite existing problems Marshall and Hume celebrate the wondrous possibilities that digital Burney does afford. While they clearly view research and scholarship as the realms in which digital Burney’s transformative effects will first be felt, they also reiterate the radical alterations it will eventually bring to teaching and classroom practices (52).

    ASECS Session: “ECCO, EEBO, and the Burney Collection: Some ‘Noisy Feedback’” (Roundtable)

    March 13, 2010

    Thursday, March 18,  9:45 – 11:15 a.m.

    “ECCO, EEBO, and the Burney Collection: Some ‘Noisy Feedback’” (Roundtable)    Alvarado E

    Chair:    Anna BATTIGELLI, State University of New York, Plattsburgh

    1.    Sayre GREENFIELD, University of Pittsburgh, Greensburg

    2.    Stephen KARIAN, Marquette University

    3.    James E. MAY, Pennsylvania State University, DuBois

    4.    Eleanor F. SHEVLIN, West Chester University

    5.    Michael F. SUAREZ, S.J., Rare Book School, University of Virginia

    RESPONDENTS: Scott DAWSON, Gale/Cengage; Brian GEIGER, ESTC; Jo-Anne HOGAN, ProQuest

    Collaborative Reading: Elizabeth Scott-Baumann and Ben Burton’s “Encoding form: A proposed database of poetic form”

    March 8, 2010

    Elizabeth Scott-Baumann and Ben Burton’s recent paper, “Encoding form: A proposed database of poetic form”, for APPOSITIONS: Studies in Renaissance / Early Modern Literature and Culture’s E-Conference: February-March, 2010, is suggestive of how new digital resources can be developed to augment the capabilities of existing tools such as EEBO and ECCO. Responding many years later to Heather Dubrow’s 1979 call for “new methodology in early modern studies,” Scott-Baumann and Burton are constructing a database devoted to poetic form. Their project will afford a means of studying poetic form, historically and formally, by enabling queries about form and generic transformations that resemble those we can now pose about words, thanks to electronic databases such as EEBO and ECCO:

  • What is the origin (or origins) of a given form?
  • How does its structure, use, and meaning change over time?
  • Are there variations in use and meaning in different regions, or among different groups?
  • How does a given form relate to others, and how does this relationship change over time?

    Concentrating on sixteenth- and seventeenth-century poetry, Scott-Baumann and Burton will use existing EEBO-TCP texts and enhance them with additional mark-up that builds upon Text Encoding Initiative (TEI) tags. As those familiar with TEI documentation will recall, its tags include ones designed for encoding verse: “stanza divisions, caesurae, enjambment, rhyme scheme, and metrical information, as well as a special purpose rhyme element to support the simple analysis of rhyming words.” Because encoding capabilities extend beyond merely marking general formal conventions and can also entail encoding that represents interpretive judgments, Scott-Baumann and Burton will experiment with both possibilities. The inevitably time-consuming nature of their task will probably result in building the database in stages.
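To give a sense of what such verse mark-up makes queryable, here is a minimal sketch using Python’s standard XML parser. The `lg`, `l`, and `rhyme` elements and the `rhyme`, `met`, and `label` attributes come from the TEI verse module, but the sample itself is an assumption made for illustration: it omits the TEI namespace for brevity, the metrical notation is just one possible convention, and the quatrain (the opening of Shakespeare’s Sonnet 18) is not drawn from Scott-Baumann and Burton’s pilot texts or their actual encoding scheme.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only: the TEI namespace is omitted and the
# metrical notation in @met is an assumed convention.
SAMPLE = """
<lg type="quatrain" rhyme="abab" met="-+|-+|-+|-+|-+">
  <l>Shall I compare thee to a summer's <rhyme label="a">day</rhyme>?</l>
  <l>Thou art more lovely and more <rhyme label="b">temperate</rhyme>:</l>
  <l>Rough winds do shake the darling buds of <rhyme label="a">May</rhyme>,</l>
  <l>And summer's lease hath all too short a <rhyme label="b">date</rhyme>:</l>
</lg>
"""


def rhyme_words(lg):
    """Group the tagged rhyme words of a line group by their rhyme label."""
    groups = {}
    for rhyme in lg.iter("rhyme"):
        groups.setdefault(rhyme.get("label"), []).append(rhyme.text)
    return groups


def scheme_from_lines(lg):
    """Rebuild the rhyme scheme from the label on each line's rhyme word."""
    return "".join(line.find("rhyme").get("label") for line in lg.findall("l"))


lg = ET.fromstring(SAMPLE.strip())

if __name__ == "__main__":
    print(lg.get("type"), lg.get("rhyme"), lg.get("met"))
    print(scheme_from_lines(lg))
    print(rhyme_words(lg))
```

Once stanzas carry this kind of mark-up, questions such as “which forms use a given rhyme scheme in a given decade?” become filters over attributes rather than matters of reading every poem by hand.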

    As for publication plans for the database, its creators “aim to negotiate with EEBO and Chadwyck-Healey to find a form of publication which both respects intellectual property and commercial interests, while also making this rich new material accessible to the widest possible audience.” Scott-Baumann and Burton have clearly thought hard about issues of access and how to maximize this database’s availability for users. They present four different possible options, formulated with an eye to those lacking access to EEBO. As they note though, much will depend on what arrangements they are able to make with EEBO/Chadwyck-Healey.

    Noting that their database, once built, could be expanded beyond its present focus on the 1500s and 1600s to cover all periods of poetry, they then devote a section of their paper to its potential scholarly and pedagogical uses. Most obvious perhaps is the usefulness this planned tool could have in advancing work in historical formalism, an emerging approach that revisits “poetic form as historically specific, historically determined, and historically efficacious.” The ability to conduct specific searches across a significant number of poetic texts enables the quick capture of evidence to support or disprove what are currently only hypothetical propositions based on a small textual sample. Rightly claiming that this database “would change the way in which scholarship on poetic form is conducted,” Scott-Baumann and Burton detail a wealth of possible questions and issues it could serve. This section also offers a range of pedagogical uses for this tool and addresses a range of audiences from the undergraduate to the secondary student.

    Before a brief conclusion, the paper then turns to discussing the two-stage pilot project for the database:

    1. A small database containing information on the metrical structures and rhyme schemes of all verse in the first edition of 10 texts published between 1590 and 1599.

    2. A larger database containing information on the metrical structures and rhyme schemes of all verse in first editions of texts published during this period.

    Scott-Baumann and Burton’s database plans present another way of thinking about EEBO and how to augment its value. That they have proposed to build their database using EEBO-TCP seems essentially a wise plan, notwithstanding unsettled questions about access.* For one, linking one’s project to an already well-established resource should ensure its visibility. Too often very worthy projects are launched but remain unknown to many who would benefit from them. In addition, such a tie-in helps ensure continuity among resources. This augmentation of EEBO’s capabilities and the efforts to provide continuity are similar to what NINES and 18thConnect are offering for later periods.

    *One of the access options does offer “[o]pen access to database and texts but not with mark up. …if we are not able to make the XML-encoded texts freely available, we would display the texts in their entirety [as users request them], but with the encoding invisible. … and display the verse with, for example, its stresses marked with accents, or its rhyme scheme colour-coded, rather than with visible tags.”

    Digital Humanities at AHA

    January 12, 2010

    In an earlier post we covered MLA panels devoted to digital humanities, electronic archives, and electronic tools. Thus, although the American Historical Association annual meeting has recently concluded, we still thought it would be useful to review the sessions held at this convention. When available, I have included links to papers or abstracts.

    Humanities in the Digital Age, Part 1: Digital Poster Session
    This session will provide participants with an overview of different digital tools and services and how historians are using them for research, teaching, and collaboration. After brief introductions to the various posters, participants would walk around the room spending time at the various stations, talking with the presenters and other participants. This will be followed in the afternoon by a hands-on workshop (session 73) where participants can learn more about how to use these specific tools. Co-sponsored by the National History Education Clearinghouse (NHEC):

  • Blogging, Jeremy Boggs, Center for History and New Media, George Mason University
  • Text Mining, Daniel J. Cohen, Center for History and New Media, George Mason University
  • Student Projects/Websites and Omeka, Jeffrey McClurken, University of Mary Washington
  • Zotero, Trevor Owens, Center for History and New Media, George Mason University
  • Teaching Tools, Kelly Schrum, Center for History and New Media, George Mason University
  • Web 2.0 – Flickr, YouTube/Video, Google Maps, Wikis, Jim Groom, University of Mary Washington

    Hot Off the Press! The Eighteenth-Century Intelligencer’s Special Topics Issue: “Teaching with ECCO”

    October 2, 2009

    James May has generously forwarded a copy of the recent Eighteenth-Century Intelligencer, a special topics issue devoted in part to “Teaching with ECCO.”  It contains excellent essays by Nancy Mace, Eleanor Shevlin, Sayre Greenfield, and Brian Glover on how ECCO enriches the classroom.  As Linda Troost explains in a brief but useful introduction, the essays both “offer ideas and provide warnings.”   Access to this issue should contribute richly to our discussions of classroom uses of ECCO.  To read the issue in its entirety, click ECI_F09[1][1].