Archive for the ‘Keyword search’ Category

British Newspaper Archive: Not Burney (yet), But Still Useful

July 6, 2012

Launched this past November, the British Newspaper Archive is a joint project of the British Library and brightsolid online publishing. Over the next decade, this partnership is slated to digitize over 40 million pages of the BL’s newspaper collection. The site anticipates a wide audience that includes not only scholars but also amateur historians, genealogists and more.

While the project often digitizes the original paper copies, it has also digitized from the BL’s microfilm copies because the process is faster and enables more pages to be made available in a shorter amount of time. The quality of the pages, however, does suffer as the website admits; unfortunately, this emphasis on speed means that the accuracy of the search results is forever sacrificed. That said, one can view the OCR text and correct it:

When viewing an image, the OCR text can be viewed via the left nav All Articles option. You can select an individual article and then select Show Article text and the text. This addictive option can be accessed by simply clicking the list of sections displayed and applying your own corrections. By correcting the text, you will be adding to the quality of the data that can be searched by others. Please note that during the launch period updates to corrections will take longer to appear. (“Getting Started”)

The site’s descriptive information suggests that the collection dates primarily from the nineteenth century on, but there are 24 eighteenth-century provincial newspaper titles available in the current collection (full list appears below). As of yet, there are no eighteenth-century London papers. Like Burney, the British Newspaper Archive is a subscription database. Unlike Burney, though, provisions for individual subscriptions exist. The rates also seem quite reasonable and offer an array of plans (credits refer to the number of views; each view “costs” 5 credits; the view option enables you to download or print):

  • 12 Month Package (unlimited pages)
    Price: £79.95 GBP       Valid For: 365 days     Credits: Unlimited*
  • 30 Day Package (up to 600 pages)
    Price: £29.95 GBP       Valid For: 30 days      Credits: 3000
  • 7 Day Package (up to 120 pages)
    Price: £9.95 GBP        Valid For: 7 days       Credits: 600
  • 2 Day Package (up to 100 pages)
    Price: £6.95 GBP        Valid For: 2 days       Credits: 500
  • Potential users are also able to register using an email address and receive 15 free credits, a very limited trial of sorts.
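The arithmetic behind these plans can be checked with a short sketch. It uses only the prices and credit totals listed above and the stated cost of 5 credits per view; the names `plans` and `pages_and_cost` are illustrative, not anything the BNA site provides.

```python
# Figures come from the plan list above; all names here are illustrative.
CREDITS_PER_VIEW = 5

plans = {
    "30 Day": (29.95, 3000),
    "7 Day": (9.95, 600),
    "2 Day": (6.95, 500),
}

def pages_and_cost(price_gbp, credits):
    """Return (viewable pages, price per page in GBP) for a credit-based plan."""
    pages = credits // CREDITS_PER_VIEW
    return pages, price_gbp / pages

for name, (price, credits) in plans.items():
    pages, per_page = pages_and_cost(price, credits)
    print(f"{name}: {pages} pages, about £{per_page:.3f} per page")
```

On these figures the 30 Day Package works out cheapest per page among the credit-limited plans, and the 15 free credits amount to three views.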

    Searches can be either simple or advanced. The advanced search includes searching by “All of these words,” “Any of these words,” “Without these words,” and “Phrase.” You also have the option of applying filters such as dates, place of publication, publication title, or article type (advertisement, article, family notice, illustrated, miscellaneous). There is also the option of browsing by titles. Unfortunately, unlike the Burney search interface, the BNA does not support wildcard characters, which help counteract poor OCR, the long “s” (often misread as “f”), and other typographical peculiarities. Nor does the BNA offer features similar to Burney’s search aids such as the “w” or “n” joined by a number to find two terms within a certain proximity of one another.
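Without wildcards, one possible workaround is to list likely OCR spellings of a term by hand and enter them together in the “Any of these words” field. The following is a minimal sketch of that idea, assuming that the OCR reads the long “s” as “f” and that a word-final “s” is left alone; `long_s_variants` is a hypothetical helper, not a BNA feature, and whether the extra spellings improve results on the site is untested here.

```python
from itertools import product

def long_s_variants(term):
    """Return spellings of `term` in which any non-final 's' may appear as 'f'."""
    last = len(term) - 1
    options = []
    for i, ch in enumerate(term):
        if ch == "s" and i != last:
            # A long s in the original print may have been OCR'd as "f".
            options.append(("s", "f"))
        else:
            options.append((ch,))
    return ["".join(combo) for combo in product(*options)]

print(long_s_variants("insure"))  # ['insure', 'infure']
```

The number of variants doubles with every non-final “s,” so this is practical only for short terms.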

    Search results can be ordered by relevance or by date (in either ascending or descending order), and a glimpse of the context in which the search terms occur is given. For example,

    Ipswich Journal
    Sat 10 Jan 1784 Suffolk, England
    5 U F F O L K. 1 0 be Ll’. TT, ant! enlered upon immediatrly, THAT olil*aeculbmed Public Honic,
    8343 Words
    “SfMON PATERNOSTER of Wickhairi-market, to be agent for the faitl company for the town of Wiek- ham.market, and parts adjacent. The company infure lioufeS, bufrdings, … ?

    As this example demonstrates, the context provides the OCR text with all its warts. Still, it helps the user decide if the article is worth viewing and assists in conserving the credits in one’s account.

    CAVEAT: During my two-day exploration of the BNA, I encountered several cases in which I clicked to view an article only to discover the article was not on that page. Five credits were still deducted from my account and continued to be deducted as I browsed other pages in the issue. Once or twice I was not able to find the result at all; other times it appeared on a different page within that issue.

    Here is a list of eighteenth-century titles currently available in BNA:

    • Aberdeen Journal (105)
    • Bath Chronicle and Weekly Gazette (1989)
    • Birmingham Gazette (8)
    • Bristol Mercury (1)
    • Caledonian Mercury (7309)
    • Chelmsford Chronicle (329)
    • Derby Mercury (2595)
    • Hampshire Chronicle (1115)
    • Hampshire Telegraph (11)
    • Hereford Journal (982)
    • Ipswich Journal (1203)
    • Ipswich Journal, The (1296)
    • Kentish Gazette (374)
    • Leeds Intelligencer (2352)
    • Manchester Mercury (223)
    • Newcastle Courant (2561)
    • Norfolk Chronicle (1109)
    • Northampton Mercury (1625)
    • Oxford Journal (2434)
    • Reading Mercury (570)
    • Salisbury and Winchester Journal (17)
    • Scots Magazine, The (611)
    • Sherborne Mercury (256)
    • Sussex Advertiser (60)

Digital Humanities and Archives II: ‘Archival Effects’ of Digitization

April 29, 2012

In an earlier EMOB post, “Digital Humanities and the Archives I: Economics and Sustainability”, we discussed the varied connotations that the term “sustainability” evokes. Yet the concept of “archives” also engenders a multiplicity of meanings as does the word “database.” In some circles “archive” and “database” are used interchangeably, while for others the terms signal distinctions between the past and the present. As Marlene Manoff has observed,

When scholars outside library and archival science use the word “archive” or when those outside information technology fields use the word “database,” they almost always mean something broader and more ambiguous than experts in these fields using those same words. The disciplinary boundaries within which these terms have been contained are eroding. Scholars use the terms metaphorically, appropriating them from the professional experts. (Manoff, “Archive and Database as Metaphor: Theorizing the Historical Record.” portal: Libraries and the Academy, 10.4 [2010], 385)

The submissions for the “Digital Humanities and the Archives” roundtable at ASECS 2012 attest to the varied meanings scholars ascribe to “archive” as a digital entity. While some proposals viewed commercial textbases such as ECCO or EEBO as archives, others considered non-commercial digital projects (some of which were designed to perform additional roles beyond being a repository) as falling under the “archival” designation. Still others proposed topics that were not tied to specific digital collections or projects. Reflecting this diversity, the selected presentations featured two papers on the nature of searching within digital environments (Randall Cream, West Chester Univ., and Bill Blake, New York Univ.), another on the coding issues encountered in building a performance history database (Mike Gavin, Rice University; University of South Carolina, Fall 2012), a fourth on the potential evidence that can be derived from negative results (Sayre Greenfield, Univ. of Pittsburgh, Greensburg), and the last on a digital archive aimed at facilitating exchange between scholars and those outside the academy (Jessica Richard, Wake Forest Univ.). In his post on the many Digital Humanities sessions at ASECS, Stephen Gregg offers a fine overview of this roundtable, so the following comments supplement his summary. In addition, they serve as a springboard for discussing digitization’s broader “archival effects,” a term coined by Marlene Manoff to “suggest the ways in which digital media bring the past into the present” (386).

Contrasting the old and the new, Randall Cream noted that unlike traditional archives whose contents are not always fully known, digital archives and databases afford more certainty because their creation involves detailing and defining, an encyclopedic naming of their various parts. For Cream, this difference has also meant that searching the digital archives lacks the serendipitous discovery that scholars often experience when working in brick-and-mortar archives. He suggested concept-linked searching as a possible means of fostering chance discoveries within digital environments, a suggestion that provided a fitting segue to Bill Blake’s talk on crafting more effective digital searches. Blake argued for thinking beyond topical keyword searches aimed solely at retrieval. Instead, he called for adopting higher-quality, conceptually based searches that will yield better results; such searches will counter the drift and spread that occur when the aim of retrieval replaces the goal of discovery. (Given earlier EMOB discussions of semantic- or meaning-based searches, it should be noted that Blake was referring to the ways users select and fashion search terms and not to the new search platforms that enable semantic or meaning-based searching such as Mimas used in JISC’s Historic Books collection.)

Cream’s and Blake’s remarks point to what could be termed a remediation of research practices as print and digital interact, and both their talks highlighted searching as perhaps one of the most significant reconfigured practices. And indeed the concept of searching has undergone major reformulations in the digital environment. While accessibility and quickness of obtaining results are often seen as digital archives’ main advantage over print, a key benefit of digital collections resides in their enabling users to traverse immense areas of texts multi-directionally. Put another way, what seems radically different about searching in the digital world is not merely unprecedented access and speed, but rather the ways one can alter search strategies instantaneously, shifting not only the search terms employed at a moment’s notice but also the temporal and spatial coordinates in which those terms are placed. This capability expands the ways we are approaching the search as a strategy, opening up new conceptualizations even as we retain the habits and training we acquired working with print. As Wired magazine’s Kevin Kelly has observed: “What search uncovers is not just keywords but also the inherent value of connection…Search opens up creations. …As a song, movie, novel or poem is searched, the potential connections it radiates seep into society in a much deeper way than the simple publication of a duplicated copy ever could” (Kevin Kelly, “Scan this Book!” New York Times, 14 May 2006).

The searching enabled within digital archives reorients our thinking about what constitutes relevant information and exposes the kinds of connectivity that we would likely miss or overlook working with print and manuscript in traditional environments. This reorientation, moreover, possesses its own opportunities for serendipity. While serendipitous discoveries made when working in a traditional archive or even browsing in the stacks typically occur within a bounded space and a pre-selected range of call numbers, digital archives and databases enable virtual movement throughout their holdings to uncover relevant but unforeseen connections not bounded by categories of expectations. In short, capable of serving as far more than text delivery systems and repositories, these digital archives and databases function as “discovery aids.” Fostering a culture of connectivity, these intellectual laboratories of sorts can provide access not only to individual titles but also to a larger, dynamic field of textual and sociocultural activity.

Sayre Greenfield’s paper demonstrated the kind of discoveries that this rethinking of relevant information can yield. Noting that assessing negative findings requires caution, Greenfield explored the ways in which a lack of search results—negative evidence—can translate into meaningful information and concluded that “absences are most useful when measured against positive results found elsewhere, in different genres or different periods.” In offering examples of the different hits obtained from performing the same search in ECCO and Burney, he drew attention to the importance of knowing the scope of a given database and the value of working across databases.

Mike Gavin’s paper also underscored the importance of understanding the operation of digital archives and the rethinking that such understanding can prompt. As Gavin recounted, creating a digital archive of dramatic works that incorporates their performance history has necessitated adapting TEI coding to facilitate searching. While his comments reflect the perspective of those constructing the archive, they also hold significance for users of digital archives. The tagging examples he provided illustrate the significant intellectual labor that goes into the creation of digital databases and archives; encoding a document, after all, is an interpretive practice requiring careful thought and subject expertise. His illustrations are a cogent reminder that the archives–whether traditional or digital–are never neutral but always are rooted in the views and principles of their creators. In the case of digital archives or databases, users benefit from being cognizant of their “constructedness.” Having an awareness of a digital archive’s creators, the circumstances surrounding its creation, the quality of its metadata, and the idiosyncrasies of its search engine will almost certainly enhance a user’s search process and, in some cases, even his or her analysis of results. Unfortunately, it is not always possible to uncover such details about digital archives and databases. Moreover, even when there is transparency and one can familiarize oneself with a digital archive’s encoding principles and information architecture, the tagging can still limit what results searches return. On a different note, it seems worth mentioning that the tasks of coding and organizing the contents of a traditional archive will, in turn, often enrich knowledge of its physical material. And this physical material remains important, for the digital and the material are not one and the same.

Unlike the first four papers that focused on either existing archives or ones nearing completion, Jessica Richard’s paper dealt with the early planning stages of a digital project. The inspiration for the project was a desire to foster exchange between eighteenth-century science studies scholars and a non-academic readership; a web-based site seems an ideal medium for the public-humanities thrust of this project. Notwithstanding its differences from the other talks, Richard’s topic very much reflects how the digital is transforming our traditional conceptions of archives. The project’s rethinking of audience, attention to wide access, and desire to translate scholarship for an interested general public all exemplify aspects of this transformation.

As these five talks illustrated, digital media are transforming our theoretical conceptions of “archives”; creating new paradigms and inspiring shifts in existing models as the digital and traditional archival cultures interact; and shaping the kinds of archival projects being undertaken, the methodologies used, and the types of research questions posed. Early in her essay Manoff suggests that “our current moment reflects the convergence of two phenomena–new technical capacities and an age-old impulse to gather and preserve. The ease of capturing digital data is an incitement to archive” (386). In light of the linguistic history of “archive,” connections between new technical capacities and the desire to collect and preserve have perhaps an even longer history. The word “archive” does not appear until after the invention of hand-press printing. While its use as a noun to denote either a historical document that is preserved or the place in which such documents are kept dates from the late 1630s/early 1640s, its verbal form–to archive–does not enter the lexicon until the twentieth century. Whether coincidence or not, this verb does not gain wide currency until the 1980s, a timing that corresponds with the growth in the use of computers and related technologies. In the past two decades the extensive adoption of digital technologies has dramatically spurred efforts to assemble large-scale collections of visual, verbal, and even oral materials and make them virtually available, either freely or commercially.

For Manoff, metaphorical appropriations of “archive” are not only useful for theorizing the ever-increasing growth of these collections but also for theorizing the digital in terms of its archival effects on our conceptions of history and the cultural record (385-6). As Manoff observes at the close of her essay, “archive” especially lends itself to such theorizing because the concept “carries within it both the ideal of preserving collective memory and the reality of its impossibility” (396). The musings about traditional and digital archives presented here touch upon only a few of the archival effects that digital transformations are exercising on our research practices and broader relationships with history and knowledge. I hope others will add their thoughts about these changes and the explanatory power of “archive” to address our cultural moment.

JISC’s Historic Books: Searching EEBO, ECCO for meaning

March 6, 2012

This past fall JISC announced a new venture, the JISC eCollections, “a new community-owned content service for UK HE and FE institutions.” What might interest EMOB readers most is its Historic Books. This digital collection contains over 300,000 books from before 1800 and also makes over 65,000 19th-century first editions from the British Library available for the first time online. The entire corpus is accessible through institutional subscription and, most welcome, searchable over a single platform.

The pre-1800 material in the JISC Historic Books eCollection consists solely of ProQuest’s Early English Books Online (EEBO) and Gale’s Eighteenth Century Collections Online (ECCO) textbases, so some might wonder what this collection offers that is new for those working in the early modern period. One does not need to be in eCollections, for instance, to conduct searches simultaneously across both databases. Yet the Help page for the eCollections indicates that more than just the convenience of a single interface and platform is being offered:

JISC Historic Books uses meaning-based searching rather than traditional keyword searching, which is why you will notice you get different results to searching EEBO and ECCO on the publishers sites. Meaning-based searching enables you to find conceptual and contexual [sic] links betweeen [sic] related documents which aren’t possible using traditional keyword searching.

Besides returning traditional results, JISC Historic Books also delivers “meaning-based” concepts deemed relevant to the search in the form of a Concept Cloud:

[Screenshot: Concept Cloud]

The more prominent the word, the more relevant it is deemed to the search, and as the screenshot indicates, items in the cloud can be manipulated to narrow one’s search further.

Over the past three or four years (and maybe longer) I have been consistently struck by the transformations that traditional searches of ECCO, Burney, EEBO, as well as Google Books have had on the ways I think about searching, construct searches, and view my results. More specifically, these keyword searches, described here as traditional, were already encouraging me to view results in a more networked, contextual way and, as a consequence, to devise additional searches aimed at teasing out new potential relationships. The meaning-based search enabled by JISC’s Mimas platform, of course, is offering something quite different, but I wonder how its use might prompt a rethinking of what it means to search and research.

It would be interesting to hear from EEBO and ECCO users in the UK who have used JISC Historic Books, especially about the differences between results obtained from searching using the JISC platform and those obtained by searching using the original publishers’ platforms.

 

Does Surfing the Net Change How We Think?

June 17, 2010

It does, claims Nicholas Carr, in The Shallows: What the Internet is Doing to our Brains (Norton, 2010).  Expanding his famous 2008 Atlantic piece, “Is Google Making Us Stupid?,” Carr delivers a measured but disturbing conclusion regarding the effect of long-term internet use: the brain’s cognitive activity is re-routed to skim, rather than to read deeply.

He opens with a confession that may sound familiar, especially to those of us who find increased resistance to deep-reading, either in the classroom or in our own work:

Over the last few years I’ve had an uncomfortable sense that someone, or something, has been tinkering with my brain, remapping the neural circuitry, reprogramming the memory.  My mind isn’t going—so far as I can tell—but it’s changing.  I’m not thinking the way I used to think.  I feel it most strongly when I’m reading.  I used to find it easy to immerse myself in a book or a lengthy article.  My mind would get caught up in the twists of the narrative or the turns of the argument, and I’d spend hours strolling through long stretches of prose.  That’s rarely the case anymore.  Now my concentration starts to drift after a page or two.  I get fidgety, lose the thread, begin looking for something else to do. . . the deep reading that used to come naturally has become a struggle (5-6)

Tellingly, Carr became unwired to write the book: he put his blog, Rough Type, on hold, moved from Boston to Colorado, and limited his social networking, including e-mail.  The Shallows appeared a year and a half later.

The Shallows probes how the Internet transforms cognitive activity by repeatedly “seiz[ing] our attention only to scatter it” (118). The plasticity of the adult brain means that its circuitry adapts to the repeated pattern of having its attention splintered by competing demands from ads, e-mail, listservs, hyperlinks, Twitter, Facebook, and the infinite possibilities of a Google search.

The brain’s plasticity can be positive. For example, the area of the sensory cortex that processes signals from the left hand is larger in right-handed violinists than it is in right-handed non-violinists. As violinists practice, the stimulation from their left hand physically changes the shape of their brain. Similarly, victims of brain injury or illness can often use the brain’s adaptability to compensate for injury. Even the neurons of sea slugs change, both biochemically and anatomically, in response to cognitive stimuli.

But neuroplasticity has a downside, too. The “repetitive, intensive, interactive, addictive” cognitive stimuli delivered by the internet “have been shown to result in strong and rapid alterations in brain circuits and functions” (116). As Carr puts it in the May 24 issue of Wired,

When we go online, we enter an environment that promotes cursory reading, hurried and distracted thinking, and superficial learning.  Even as the Internet grants us easy access to vast amounts of information, it is turning us into shallower thinkers, literally changing the structure of our brain.

Catering to the brain’s hunger for information and novelty, the internet provides an environment of “constant distractedness” (119). Repeated heavy use of the Net has, as another neurologist, Michael Merzenich, notes, “neurological consequences” (120). The time we spend on the internet is time away from reading linearly-driven narratives requiring concentration. The disused neurons and synapses once dedicated to deep reading get recycled into the work of distracted skimming. Carr concurs with Maryanne Wolf’s conclusion that as we read online

we sacrifice the facility that makes deep reading possible.  We revert to being ‘mere decoders of information.’  Our ability to make the rich mental connections that form when we read deeply and without distraction remains largely disengaged (122).

Michael Merzenich puts this more strongly, arguing that internet multitasking may be “deadly” for our intellectual lives (142).

Repeated internet use results in what Carr calls  “The Juggler’s Brain,” the title of a chapter that should be mandatory reading for all teachers.   Study after study is cited demonstrating that multitasking interferes with memory.  A Cornell study reveals that students typing on a laptop during a lecture perform more poorly on tests of the lecture’s content than students without laptops—even when the web pages visited pertain to the material discussed.

Similarly, a Kansas State study notes that students watching a CNN broadcast loaded with color graphics and a “textual news crawl” remembered less about the broadcast’s content than students who were given the same program stripped of the graphics and news crawl. This information might be useful the next time someone discusses multimedia teaching techniques. And when a printed text becomes a hypertext, as it does on Kindle, comprehension is, studies suggest, compromised.

Carr’s most interesting chapter may be on memory.  He distinguishes between “primary memories,” which vanish soon after they come into being, and “secondary memories,” which can be recalled indefinitely.  When a boxer gets knocked out, his recent memories disappear, suggesting that it takes time for a “primary memory” to become a “secondary memory.”

Short-term memories don’t become long-term memories immediately, and the process of their consolidation is delicate. Any disruption, whether a jab to the head or a simple distraction, can sweep the nascent memories from the mind.  (184)

So while surfing the Net can yield a giddy confluence of ideas and connections, few of those connections are actually retained because the Net’s distractions interrupt their migration into secondary memory.  While computer memory is a database, pure and simple, human memory relies on processing information: “Biological memory is alive.  Computer memory is not” (192).

Behind all of this is a concern for deep reading, which is currently dividing the humanities and widely discussed by the media.  Recently, both  Stanley Fish and David Brooks argued for the need for deep reading in the humanities to cultivate wisdom.  A week earlier, the Chronicle of Higher Education ran a story about Stanford University’s Literature Lab, which features data-mining that allows students to “read” 1200 novels for one class.  Clearly, the latter kind of reading differs from the kind called for by Fish and Brooks.

As we consider the new kinds of reading made possible by the internet, including important practical questions, such as Laura Rosenthal’s recent query on The Long Eighteenth regarding how to read a long book in a screen-based medium like ECCO, we would do well to imitate Carr by remaining open to both old and new technologies.  An engaged Net surfer, with a clear understanding of the merits of being plugged in, Carr easily cites poetry and literary anecdotes that reflect his immersion in the world of printed texts.   That he is fully aware of the many advantages the Net offers makes his warnings about the erosion of concentration among internet surfers all the more alarming.  Intrepid internet boosters, like Clay Shirky, downplay the value of printed books as “just a side-effect of living in an environment of impoverished access.”  Shirky famously dismissed Tolstoy’s War and Peace as “too long, and not so interesting” (Carr, 111).  Carr eschews such extreme positions.  Though he put his blog on hold to complete his book, he has now returned to it.  He is not urging that we ban use of the internet.  But he provides extensive evidence for the need to be mindful of the internet’s transformative effects on our cognitive life.

Collaborative Reading: “The Joys, Possibilities, and Perils of the British Library’s Digital Burney Newspapers Collection”

May 13, 2010

Ashley Marshall and Rob Hume, “The Joys, Possibilities, and Perils of the British Library’s Digital Burney Newspapers Collection.” PBSA, 104:1 (2010): 5-52.

At forty-seven pages Ashley Marshall and Rob Hume’s article offers a substantive assessment of this relatively recent electronic resource for early modern studies. Early on the authors argue that “[d]igital Burney is amazing, but exploiting it fully is going to demand some serious rethinking and reorientation in both our research and our teaching” (6-7). Their claim that this tool “will change the way we conduct our business” (7) possesses much merit; fulfilling digital Burney’s promise, however, will depend on far broader scholarly access than currently exists. Equally important, scholars need to acquire a firm understanding of its possible uses, search capabilities, and limitations. While Marshall and Hume’s piece cannot assist in matters of accessibility (though it could serve as support for the tool’s purchase), their essay does advance our knowledge of how this tool might be employed and how its features and limitations can best be navigated.

The article is usefully divided into five sections. The first considers the difficulties surrounding the use of newspapers for literary research. The next two parts detail various scholarly and pedagogical uses of newspapers afforded by digital Burney. The fourth section, making up nineteen of the article’s total pages and accompanied by five reproduced screen shots, identifies the external and internal shortcomings of the resource. The final part offers conclusions.

I. Conceptual Barriers to the Utilization of Newspapers

Noting that newspapers make a rare appearance in scholarship and teaching, this section examines the basis for such neglect.

  • A key reason stems from the simple fact that newspapers were virtually unavailable in the US until 1978 when the Early English Newspapers microfilm series made its debut. Even then, however, the series did little to bolster the already scant interest in historical newspapers among scholars. (7)
  • The reign of New Criticism and the subsequent heyday of Theory strongly discouraged the use of material drawn from newspaper content. If newspapers were consulted, the information sought was typically confined to obituaries, book and play reviews, and advertisements for books and cultural performances. (8)
  • That early newspapers either lack organized sections, including headlines, or feature very basic divisions often proves initially daunting to users. Especially in papers published before the 1760s, the lack of source information, the unacknowledged lifting and repetition of content across titles, sparseness of details, and partisan leanings also have made these newspapers seem strange and have done little to encourage their use (8-9).
  • Often scholars do not possess the knowledge needed to extract and draw conclusions about the values contained in many of these papers. Scant information about the circulation and readership of newspapers hinders a scholar’s ability to “analyze their implied readership, ideology, or socio-political agendas” (10). A broad gap exists between the literature we study and teach and the information found in these newspapers (11).

II. Research Uses

The authors supply three extended examples of possible ways that digital Burney can assist researchers.

  • Book Prices: Newspaper advertisements afford us a rich opportunity to compile prices for books not otherwise available (11-12). To illustrate, the authors supply prices derived from digital Burney for satire and then offer various insights this list affords. For one, the list reveals that prices for this genre ranged widely from low to high; the affordability and greater number of lower priced titles intimate that “[t]hese works were intended to reach and influence readers” (16). Additional examples of the price information newspapers can offer include:

    Collected works were considerably more expensive to buy than if one purchased the individual titles when initially published.

    Newspapers “can turn up major fluctuations in price over time” for a given title (16).

    Information in newspapers can enable us to reconstruct marketing strategies; for example, some advertisements reveal attempts to reach multiple markets by offering several formats at different prices (16-17).

    As the authors assert, knowledge about book prices matters because “[i]f we are going to understand the works we study and the world in which they were produced and read, then the clearer we can be on price and what it implies about audience, the better” (17).

  • Reception and Reputation: Noting that dissemination contributes to our understanding of the reception and reputation of writers and their works, Marshall and Hume also caution that information drawn from digital Burney searches for prices, reprintings, marketing strategies, commentary or allusions to authors, and the like has its limitations. First, newspapers until the late eighteenth century offer little in the way of cultural commentary; second, searching for authors’ names can be problematic for numerous reasons, ranging from false hits (e.g., “Pope” yields a huge number of results, but many do not refer to the author) to OCR failures that return nowhere near the actual number of occurrences (18). Still, such searches can provide interesting information and, in turn, raise questions about the rise and decline of an author’s visibility in the papers, the geographic parameters of that visibility, and the contemporary existence of associations or groupings of authors (19-20).
  • Study of Individuals: The Case of John Rich: In this example the authors illustrate ways in which Burney can augment and shift our understanding of understudied individuals through an examination of theatre owner and manager, John Rich. In addition to discussing how Burney yielded fresh information about Rich, Marshall and Hume also discuss briefly the specific, various searches performed to yield hits for John Rich; they close this case study with a cautionary example of how newspapers, while often providing new facts and leads, can also on occasion provide false or erroneous information.

    III. Teaching Uses

    The authors divide their discussion of how digital Burney might be used in the classroom into two sections, one dealing with eighteenth-century economics and the other with the century’s Weltanschauung. Marshall and Hume preface their two pedagogical uses with a warning that students will need much prior preparation before attempting to use the resource. This preparation includes assistance not only with the intricacies and peculiarities of searching digital Burney but also with working with historical primary sources, especially sources such as newspapers (24).

  • Economic Issues and the Value of Money: While the research section focused on book prices and dissemination, here the focus is broadened to using Burney to show “students … how things looked to eighteenth-century people” in terms of money–“a much neglected subject” (24). While we can simply tell students today’s monetary equivalents for sums of money mentioned in eighteenth-century literary works, the authors make the salient point that “hearing is not the same as comprehending” (26). What the authors recommend is having students search for the prices of everyday items found in newspaper advertisements and calculate their modern monetary equivalents. As they note, the students’ findings can radically shift our understanding of the economic references found in the literature being studied and, in turn, carry implications that extend beyond the works.
  • Seeing the World through Eighteenth-Century Eyes: Near the end of this section, Marshall and Hume underscore that what they have been proposing means fundamentally “altering the way we teach” rather than merely supplementing our current methods (30). The crux of this shift entails replacing secondary with primary sources as the means by which students learn to “see[ ] the world through eighteenth-century eyes.” Among the suggested assignments is a rhetorical or ideological critique of a newspaper title during a set time or a comparative variation in which several titles are examined (27). Using ECCO as well as Burney, another possible assignment would have students explore an event or topical reference; commentary on Dr. Sacheverell’s trial, the 1745 Jacobite invasion, the 1730 trial of Colonel Francis Charteris for rape, the American war (as opposed to “Revolution”), or reviews of theatre performances represent just a few of the examples they offer (27-29). Yet another use involves investigating the reception of works based on newspaper commentary (29). Noting that the nature of the course (a survey will differ considerably from an honors seminar) will affect the assignment(s) used, the authors stress that the benefit of such exercises lies not in enhancing the interpretation of specific works but rather in “helping bring the works we study to life, in making real to twenty-first-century undergraduates the commitments and passions of eighteenth-century writers and readers” (29).
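    The price exercise described under “Economic Issues and the Value of Money” comes down to simple arithmetic, which can be sketched as follows. The pre-decimal ratios (12 pence to the shilling, 20 shillings to the pound) are standard; the function names are hypothetical, and the multiplier is a placeholder that an instructor would have to choose and justify, not a figure taken from the article.

```python
# Minimal sketch: convert an eighteenth-century price in pounds, shillings,
# and pence to a rough modern figure. The multiplier is a hypothetical
# placeholder supplied by the caller; it is NOT a value from Marshall and Hume.

PENCE_PER_SHILLING = 12
SHILLINGS_PER_POUND = 20
PENCE_PER_POUND = PENCE_PER_SHILLING * SHILLINGS_PER_POUND  # 240


def to_pence(pounds=0, shillings=0, pence=0):
    """Express a pounds/shillings/pence price as a single number of pence."""
    return pounds * PENCE_PER_POUND + shillings * PENCE_PER_SHILLING + pence


def modern_equivalent(pounds=0, shillings=0, pence=0, multiplier=1.0):
    """Return a rough modern value in pounds for a historical price.

    `multiplier` is whatever conversion factor the class agrees to use.
    """
    return to_pence(pounds, shillings, pence) * multiplier / PENCE_PER_POUND


if __name__ == "__main__":
    # A book advertised at 2s. 6d., with an illustrative multiplier of 200.
    print(to_pence(shillings=2, pence=6))                             # 30
    print(modern_equivalent(shillings=2, pence=6, multiplier=200))    # 25.0
```

    Keeping the multiplier as an explicit argument lets students see how much the “modern equivalent” depends on the conversion factor chosen.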

    IV. External and Internal Problems

    Before addressing particular kinds of problems, Marshall and Hume review the basic and advanced search capabilities of digital Burney. As the authors rightly note, these two search types will already be familiar to ECCO users. Proximity searches–searches in which one uses a “W” to find occurrences of a term that follows another within a certain number of words (e.g., “Hogg w5 Giltspur” will uncover Hogg within five words of “Giltspur”) or an “N” to find occurrences of a term preceded or followed by another (e.g., “Hogg N20 Giltspur” will return cases of Hogg appearing either before or after “Giltspur” within twenty words of each other)–can be done using either the basic or advanced search. Both kinds of searches can be limited by date and publication title; both handle wildcard searches (! represents either a blank or any single character, * represents multiple characters, and ? represents exactly one character); and both accommodate “fuzzy” searches (31-34). This discussion offers even more detailed advice, including remarks about potential outcomes from various search methods.
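    For readers who find the operators easier to grasp in code, here is a minimal sketch of the matching rules as summarized above. It illustrates the semantics only and is not Gale’s implementation. The function names are hypothetical. It also rests on three assumptions: that “W” requires the first term to come before the second, that “*” may match zero characters, and that matching ignores case.

```python
import re


def wildcard_to_regex(term):
    """Translate the wildcards described above into a regular expression.

    '*' -> any run of word characters (assumed to include an empty run)
    '?' -> exactly one word character
    '!' -> zero or one word character
    """
    parts = []
    for ch in term:
        if ch == "*":
            parts.append(r"\w*")
        elif ch == "?":
            parts.append(r"\w")
        elif ch == "!":
            parts.append(r"\w?")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def proximity(text, first, second, distance, ordered):
    """Return True if `first` and `second` occur within `distance` words.

    ordered=True  models "first Wn second" (first must precede second).
    ordered=False models "first Nn second" (either order).
    """
    words = re.findall(r"\w+", text)
    pat_a = wildcard_to_regex(first)
    pat_b = wildcard_to_regex(second)
    pos_a = [i for i, w in enumerate(words) if pat_a.fullmatch(w)]
    pos_b = [i for i, w in enumerate(words) if pat_b.fullmatch(w)]
    for a in pos_a:
        for b in pos_b:
            gap = b - a
            if ordered and 0 < gap <= distance:
                return True
            if not ordered and 0 < abs(gap) <= distance:
                return True
    return False


if __name__ == "__main__":
    ad = "Sold by Mr. Hogg, at the Corner of Giltspur Street"
    print(proximity(ad, "Hogg", "Giltspur", 5, ordered=True))    # True
    print(proximity(ad, "Giltspur", "Hogg", 5, ordered=True))    # False
    print(proximity(ad, "Giltspur", "Hogg", 20, ordered=False))  # True
```

    The sample advertisement is invented for illustration. In the real database the words being matched are OCR output, so results will depend on how well the page was read.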

  • The first set of problems falls under the rubric “External Issues.” While issues such as incomplete runs have emerged in previous emob discussions and the EC/ASECS and ASECS round-tables on these research tools, the approach taken here differs in some respects from points raised in these forums. In addition to incomplete runs (the authors are rightfully thankful for their inclusion and also offer suggestions for locating copies not in the collection), Marshall and Hume discuss the difficulties encountered when searching for material referenced in published works due to the high error rates of citations for eighteenth-century newspapers (35-36). In doing so they also suggest ways to navigate these false citations.
  • Spread-Date Papers and Other Problems with the Documentation and Search Results:
    A serious problem, one with disastrous potential for being reproduced exponentially, involves the dates digital Burney currently provides for individual issues of titles not published daily. For newspapers published weekly or twice or three times a week,

    [i]f the search engine is used to go directly to a news item or advertisement, the only date the user will see is the wrong one. The correct one has to be found by taking a multi-click detour to bring up the first page of the issue and then resize it to read the printed date on the original paper–if the user realizes that this may be a spread-date [a title whose issues each cover a spread of days between publications] newspaper and knows to check. [Footnote 50 indicates that Gale is in the process of rectifying this problem; “Scott Dawson of Gale informs us that they have identified some 70,000 instances of the problem” as of July 2009 (my emphasis)]. (37)

    Duplication is yet another problem and comes in several forms. The Burney collection contains duplicate copies of a given issue as well as duplicate runs of a given title, which at times will result in the appearance of more hits than actually occur (37-38). Another kind of “duplication” results from the habit of newspapers publishing copy identical to that found in other papers (38).

    Acknowledging the problems stemming from OCR technology and the erratic search results these problems generate, Marshall and Hume briefly mention some of the issues already raised in previous emob postings. In terms of false negatives, they usefully remind us of the role played by the Burney search engine’s design. For example, if one’s search term appears across two pages, then that occurrence will be omitted from the results (41). Citing Jim May’s recent article, “Accessing the Inclusiveness of Searches in the Online Burney Newspapers Collection” (The Eighteenth-Century Intelligencer N.S. 23:2 [May 2009]: 28-34), the authors ruefully report that their experiences with search results correspond to May’s claim “that anything from 20 to 50 percent (or more) of what can be found by manually eyeballing the full texts of newspapers will not show up in the list of results” (41).

    Marshall and Hume offer three serious cases of false negatives, most stemming from the poor condition of the original. Yet they close this discussion with an example of “a dire problem in Burney’s presentation of Steele’s Tatler (1709-1711)” that arises from problems with the source material made available to Gale (42). In this case, “the first nine months’ worth of one of the foremost early eighteenth-century English periodicals has functionally been erased” because the source used mixed original Tatler issues with the front matter and other material from later book reprints (43-44). Rather than appear in digital Burney under the title “Tatler,” these pre-1710 issues instead appear under the title Lucubrations of Isaac Bickerstaff. While the authors note that this problem could be lessened via “simple relabeling and cross-referencing” (44), the problem also underscores the importance of hands-on scholarly involvement in the preparation and execution of such digitization projects.

  • Some Interface Issues: Under this heading the authors detail “nine of our pet peeves” with the current interface (44).

    1. While one can search or view results according to particular categories of publication such as “Classified Ads” or “Commercial News,” these sections are fairly meaningless, and an advertisement can easily appear under news or vice versa (44).

    2. The inability to perform case sensitive searches (45).

    3. The inability to control the elimination of “stop” words such as “the,” “a,” or “be” when one is seeking hits for a specific phrase or string of words (45).

    4. The numerous clicks one must endure to confirm the paper, date, day; the best solution to this problem would be for Gale to offer the title and spread date on each and every display page (45).

    5. Related to (4), “that title and date would appear with whatever one printed from page to page.” As the authors note, the need to record this information manually on a printed copy of a given page invites errors, many of which will be multiplied as erroneous citations in future publications (45).

    6. The Browse Publication Title inefficiently results in “a set of links to what are reported as “[X number of] issues” chopped into [X–often in the thousands] chunks of News Advertisements, Business News, etc.” and consequently requires the user to guess where “the desired date might fall.” While using the “Publication Search” is a better approach, this search is not without its problems (46).

    7. The inability to search efficiently for “Other papers for the same date.” Currently, without such a dedicated search feature for this option, one must conduct an “Advanced Search” using “Publication Date”; if multiple dates are sought, one must repeat the process for each date desired (47).

    8. The confusion between the “Previous/Next Article” (“article” here is a misnomer) and “Previous/Next Page”; the first navigates results found, while the second, which appears directly above the newspaper’s text, will take the user to the next page in the issue being viewed (47).

    9. Although one has three options for searching for particular issues of a given title, the three processes differ in their operations, primarily in whether or not they accept the inclusion of an opening article (“the”) in a newspaper’s title (47, 49).

  • Following the “pet peeves” list, the authors offer useful information and advice about the intricacies of printing one’s results. Such information is particularly valuable, for as the authors also note, digital Burney’s “printing facility is neither self-evident nor at present particularly well explained” (50). Especially vexing is the failure of several print options to include title and date details.

    V. Observations and Conclusions

    Admitting that hindsight makes for easy criticism, Marshall and Hume nonetheless correctly claim that many of the problems identified in Burney might have been avoided if scholars with appropriate expertise had been closely consulted in the preparatory stages of this significant tool (50). Similarly, if the interface and search features had been tested by actual, potential users, many of the snags in searching might have been eliminated in advance of the tool’s official release. They also draw attention to the commercial nature of the enterprise. Although they do not mention affordable access here or elsewhere, they do stress the high expense and the subsequent expectation among purchasers that “when significant problems emerge … they need to be seriously addressed” (51). The efforts underway to correct the dating errors in spread-date newspapers are no doubt an example of a serious problem that is receiving attention.

    Despite existing problems Marshall and Hume celebrate the wondrous possibilities that digital Burney does afford. While they clearly view research and scholarship as the realms in which digital Burney’s transformative effects will first be felt, they also reiterate the radical alterations it will eventually bring to teaching and classroom practices (52).

    ASECS Summary of “Some Noisy Feedback” Roundtable, Albuquerque 3/18/10

    March 27, 2010

    ECCO, EEBO, and the Burney Collection: Some “Noisy Feedback” Roundtable

    Chair: Anna Battigelli (SUNY Plattsburgh)   Panelists: Sayre Greenfield (University of Pittsburgh, Greensburg), Stephen Karian (Marquette University), James E. May (Penn State University—DuBois), Eleanor Shevlin (West Chester University), Michael Suarez (Rare Book School, University of Virginia).  Respondents: Jo-Anne Hogan, (ProQuest), Brian Geiger (ESTC, University of California, Riverside), and Scott Dawson (Gale/Cengage).

    The following offers a summary of the roundtable that took place Thursday, March 18, 2010, at the ASECS 2010 conference in Albuquerque, N.M.  This session was the second part of a two-part series, the first part having been a roundtable discussion chaired by Eleanor Shevlin at the EC/ASECS meeting in Bethlehem, Pa., in October 2009.  Copies of Eleanor’s summary of the EC/ASECS session (published in the Eighteenth-Century Intelligencer and also on this blog) were distributed at the outset of this session.  Many thanks to the members of the audience who so cheerfully presented themselves at an early hour on the conference’s first day.

    Sayre Greenfield opened discussion with detailed working solutions to problems caused by ECCO’s OCR (optical character recognition) software.  He recommended that Gale provide an ECCO OCR troubleshooting page on their web site and noted that blogs like this one would be sure to start that process (see below).  Aided by Deidre Stuffer, he found ways to correct for errors stemming from the following letter combinations that OCR typically mistranslates: s, ss, and ct.  Using the word, fishmonger, he substituted for the s every other letter, then substituted numbers, and finally the wildcard question mark.  Advice from his search results, including how best to use the question mark as a wildcard, can be found on the ECCO OCR Troubleshooting Page on the “Pages” section of this blog.  He warned that using the question mark for any medial or initial s is problematic if one is using variables elsewhere, adding that ECCO does not allow wildcards for the first letter of a word.  Additionally, letters surrounding the s seem to affect how the OCR reads the s.  The double ss, for example, frequently morphs into fl, transforming passion into paflion. Word searching within a text also proved problematic.  Though he found 32 instances of passion or passions when he read John Tottie’s A View of Reason and Passion, his electronic search using passion* yielded only half of these.  Turning to ct, he found that OCR often reads ct as t, so that objection becomes objetion.  These results suggest that ECCO would help users by strengthening its web site, which currently recommends fuzzy searches to address OCR problems.  Fuzzy searches create too many false positive results.  Including a more robust help page on this issue is necessary.  (For now, see Sayre’s ECCO OCR Troubleshooting Page on this blog.)
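    One way to act on Sayre’s findings is to generate the likely OCR misreadings of a search term and try each one. The sketch below is a minimal illustration. It uses only the two substitutions reported above (“ss” read as “fl,” “ct” read as “t”); the function name and the rule table are hypothetical, and how the variants are then combined into a query depends on the database interface.

```python
import re
from itertools import product

# The two misreadings reported in the roundtable summary. Other rules
# could be added to this table, but any addition is the user's assumption.
OCR_CONFUSIONS = {"ss": "fl", "ct": "t"}


def ocr_variants(word, confusions=OCR_CONFUSIONS):
    """Return the word plus every variant in which one or more of the
    listed letter combinations is replaced by its typical OCR misreading.

    Matching is case-sensitive, so pass the word in lowercase.
    """
    pattern = re.compile("|".join(re.escape(key) for key in confusions))
    pieces = []  # alternating fixed text and [original, misreading] choices
    last = 0
    for match in pattern.finditer(word):
        pieces.append([word[last:match.start()]])
        pieces.append([match.group(), confusions[match.group()]])
        last = match.end()
    pieces.append([word[last:]])
    return sorted({"".join(combo) for combo in product(*pieces)})


if __name__ == "__main__":
    print(ocr_variants("passion"))     # ['paflion', 'passion']
    print(ocr_variants("objection"))   # ['objection', 'objetion']
```

    A word containing several affected letter combinations yields every mix of corrected and uncorrected readings, so the list can grow quickly for longer terms.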

    Steve Karian began by acknowledging the indispensability of ESTC for bibliometrics, but he also identified four problems that need to be addressed if the ESTC is to become the powerful tool it can be for the twenty-first century.  The first is the ESTC’s unit of measurement: the ESTC record.  Users often equate an ESTC record with an imprint, title, edition, or an issue.  Because of variations in the correlation of record to item, one cannot simply assume that two parallel sets of search “hits” can be compared reliably.  As he puts it, “one is constantly comparing apples to oranges.”  Additionally, field records vary, limiting or complicating the kinds of searches that can be done.  These need to be standardized if searching is to become reliable.  The two ESTCs—one at UC-Riverside, the other at the British Library—use the same data but different interfaces.  Dates are complicated because they appear in two MARC (Machine-Readable Cataloguing) fields.  Steve recommended deleting the MARC record entirely and replacing it with a new database structure, one designed to expand and grow.  He called for a new stage of innovation, allowing the ESTC to transform itself from a bibliographical catalogue into a bibliographical database.  Only through such a transformation will the ESTC become the powerful tool it promises to be.

    Jim May discussed the Burney Collection, which he argued should be called the Burney Collection of Newspapers, Periodicals, and Other Printed Matter.  Its material was first collected by Charles Burney, subsequently increased by the British Library, and eventually microfilmed before being turned over to Gale/Cengage.  It includes material dating from the 1620s to beyond 1800 and material printed in Barbados, India, Ireland, and North America.  Citing James Tierney’s comments at the Bethlehem meeting, Jim noted that the collection includes 237 newspapers and 161 periodicals, 60 of which are partially available in Adam Matthew’s Eighteenth-Century Journals series or ProQuest’s British Periodicals.  Burney allows one to read an entire issue or study issues by year or month, and it offers searching, though this is problematic.  According to Jim’s results, searching sometimes yields only 10% of the relevant items.  Searching for “Tatler” between 1708 and 1712 yields 80 hits.  Though he has found hundreds of advertisements of Smollett’s Continuation of the Complete History of England, only a few of these can be found through an electronic search.  Similarly, only a third or fewer of The London Evening Posts published 1760-61 turn up when you search for “London Evening”.  Robert Hume and Ashley Marshall have an essay forthcoming in Papers of the Bibliographical Society of America discussing Burney and noting, among other problems, how definite and indefinite articles interfere with searches.  Jim also cited Simon Tanner’s article in D-Lib Magazine (July/August 2009), which found the following accuracy rates for Burney: character 75%, word 65%, significant word 48.4%, capitalized word 47.4%, and number 59.3%.  The magnification feature enlarges pages by 100% and would be more useful if it magnified by 33%.  Spread dates are misrepresented, due to the lack of editorial apparatus explaining when newspapers were actually issued.
    Burney’s lack of editorial apparatus, cross references, comments, and so forth is a deficit.  Having a scholarly editor–perhaps a graduate student or a postdoctoral intern–would improve its utility.  Also needed is a review of the entire database.  A page dedicated to errors encountered by users would help, something EEBO is now working on in its “EEBO Interactions, A Social Network.”
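    To make the difference between character and word accuracy concrete, the sketch below shows one simple way such rates could be measured when a hand-corrected transcription is available. It is an illustrative assumption and not the method used in the D-Lib study; the `accuracy` function and the sample sentence are hypothetical.

```python
from difflib import SequenceMatcher


def accuracy(truth, ocr):
    """Share of items in `truth` that also appear, in order, in `ocr`.

    Pass two strings for character accuracy, or two lists of words
    (for example, the result of str.split()) for word accuracy.
    """
    if not truth:
        return 1.0
    matcher = SequenceMatcher(None, truth, ocr, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(truth)


if __name__ == "__main__":
    truth = "the passion of the mind"
    ocr = "the paflion of the mind"
    print(round(accuracy(truth, ocr), 3))        # character accuracy
    print(accuracy(truth.split(), ocr.split()))  # word accuracy: 0.8
```

    Two misread characters in a single word leave most characters intact, yet that word can no longer be found by a search; this is why word accuracy tends to be lower than character accuracy.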

    Eleanor Shevlin identified three pressing needs: 1) fostering greater awareness of the context of texts; 2) encouraging collaboration among users; and 3) cultivating greater access to these electronic resources.  She pointed to the need for bibliographical training in order to use these resources accurately and called for an examination of the cognitive effects these tools have on research processes.  Specifically, she wondered how EEBO’s TCP transcriptions or ECCO’s searching mechanism affects research methodology.  Noting that these tools provide opportunities to correct bibliographical inaccuracies, she urged the need for a more standardized process through which corrections could be forwarded to the ESTC or to commercial databases.  She also cited examples of productive collaboration among members of the bibliographic community, including her own experience correcting an error in Kansas’s Spencer Research Library, a correction made possible by sending ECCO’s image of the British Library’s copy of a text to Kansas.  Finally, she noted that access continues to be a problem.  Scholars in the U.S. work at a notable disadvantage compared to scholars in the U.K., who typically have access to ECCO and ECCO II through the Joint Information Systems Committee (JISC).  ASECS President Peter Reill’s recent calls for feedback regarding access suggest that the issue is at least on the radar of those who can help, either through negotiations for large-scale access or individual subscriptions.

    Michael Suarez warned against the illusion of comprehensiveness in database searches.  Users are frequently unaware of what is missing in these databases, and the databases’ selectivity impoverishes word searches as tools for analysis.  Turning to the task of text-mining, he expressed skepticism regarding the mentalities of mining.  Where sustained engagement with individual texts allows for work linking texts to their culture and to other texts, textual extraction can produce radically decontextualized results.  Because these database tools are easy to use, we are, he warned, insufficiently uneasy with what they actually accomplish.  Suarez insisted that textual analysis demands an effort to fuse horizons between text and reader, a fusion that involves a reader’s deep engagement with a text’s historical context and with a text’s relationship to other texts.  Such contextualization, as James Boyd White would agree, is essential to a functional and robust literary hermeneutics.  Additionally, text-mining tools encourage scholars to work in even greater isolation, away from libraries and other scholars.  Precisely because the digital future will change the way we think, Suarez called for a greater bibliographical literacy in order to make these promising tools work properly.

    Panelists’ Responses:

    Jo-Anne Hogan (ProQuest) agreed with Michael’s concern regarding the impact of these digitization projects.  She added that EEBO routinely receives emails pointing out errors, asking for missing items, and making recommendations, and that it works to incorporate these suggestions.  But she also noted a growing digital divide: concerns voiced at conferences like ASECS differed from those at conferences on the digital humanities.  At the latter, attendees ask EEBO to produce more tools for text-mining.  It is sometimes difficult to reconcile the competing requests received.  Money matters in these issues, and will always be a factor.  She agreed that more could be done to align the bibliographic data in EEBO with that in the ESTC and pointed out that efforts are under way to make that happen.  She also introduced the prospect of a social networking site for EEBO intended to facilitate communication between scholars and users so corrections can be reported and more contextual information can be made available.  We hope to hear more from her about this on this blog in the near future.  Access, she concluded, continues to be a concern; she agreed with Eleanor that it is unfortunate not to have a model for broad access in the U.S.  Personal subscriptions seem unlikely because such subscriptions cannot cover costs, at least not at subscription rates individuals are willing to pay.  She hoped there might be a point in the future when ProQuest can provide broader access, but she could not guarantee such a thing.  More promising is the prospect that about half of the books in EEBO will soon be available for purchase at reasonable rates via Print on Demand.

    Scott Dawson (Gale) agreed with Sayre’s suggestion that a Help screen dedicated to OCR problems is an idea to consider seriously.  He added that Gale would look into post-OCR checks that might correct results.  18thConnect will help by testing new OCR software on ECCO page images, and that might solve problems.  Turning to Steve’s comments about ESTC, Scott noted that ECCO depends on ESTC for metadata, and that Gale is working with ESTC to add a link within the ECCO Full Citation to report problems with a given record.  He agreed with Jim May that Burney presents additional obstacles to getting accurate OCR results.  Gale has been working with the British Library to resolve the issue of spread dates and hopes to have an update in the next few months.  On the issue of access raised by Eleanor, Scott mentioned that ECCO is concerned about the issue, but that by providing access to more than 500 institutions globally, it has helped make early modern printed material more accessible than is possible through hard copy or microfilm.  Tiered pricing and consortia-designed contracts help non-ARL institutions find ways to subscribe to ECCO.  He agreed with Michael Suarez that ECCO is incomplete, even with the 50,000 titles added through ECCO II.  Gale is not planning an ECCO III.  But the possibility of linking missing titles to ECCO is being considered.

    Brian Geiger (ESTC) outlined two main areas of work at the Center for Bibliographical Studies and Research (CBSR), which manages the North American branch of the ESTC.  First, they continue to upgrade and add records to the ESTC.  They are processing OPAC extracts from libraries, and recently began on an extract from Oxford University that resulted in some 200,000 records that will be matched against the file.  These OPAC extracts provide shelf marks (or call numbers) for existing items, and have turned up tens of thousands of new copies and hundreds of entirely new items.  They are adding urls from online collections.  EEBO, ECCO and TCP are matched, though not yet displayed by the public version at the British Library.  Brian has requested urls from Google and will do the same from Internet Archive.  They are digitizing title pages from paper reports submitted over the last two decades and will attach those images to the appropriate records, allowing users to compare a title page to its MARC record.  They hope to have many of the title pages in the ESTC by 2011.  And they have enhanced some 180,000 MARC records from title pages in ECCO.  Second, the ESTC has started to assess how to transform the project from an online catalog to a flexible and interactive database-driven research tool.  Brian corroborated Steve Karian’s assessment that this new resource should be built on relational databases, and noted with appreciation the value of the kind of collaborative thinking Steve offered about the project’s future.  Brian emphasized that a number of partner projects and institutions should be involved in the redesign, to ensure that the new project meets a variety of user needs and to try to plan for the sharing of information across platforms.  He mentioned some of the features that he thought should be included, among them user editing of bibliographic data and metadata and tools to send information to users about updates or changes to records.  
    He ended by pointing out that development of the database will require resources and the next stage of the ESTC’s evolution will be contingent on funding.  The ESTC is currently engaged in grant development.  It will be in a better position to discuss specific solutions once funding is secured.

    Digital Humanities at AHA

    January 12, 2010

    In an earlier post we covered MLA panels devoted to digital humanities, electronic archives, and electronic tools. Although the American Historical Association annual meeting has recently concluded, we thought it would still be useful to review the sessions held at this convention. When available, I have included links to papers or abstracts.

    Humanities in the Digital Age, Part 1: Digital Poster Session
    This session will provide participants with an overview of different digital tools and services and how historians are using them for research, teaching, and collaboration. After brief introductions to the various posters, participants will walk around the room spending time at the various stations, talking with the presenters and other participants. This will be followed in the afternoon by a hands-on workshop (session 73) where participants can learn more about how to use these specific tools. Co-sponsored by the National History Education Clearinghouse (NHEC):

  • Blogging, Jeremy Boggs, Center for History and New Media, George Mason University
  • Text Mining, Daniel J. Cohen, Center for History and New Media, George Mason University
  • Student Projects/Websites and Omeka, Jeffrey McClurken, University of Mary Washington
  • Zotero, Trevor Owens, Center for History and New Media, George Mason University
  • Teaching Tools, Kelly Schrum, Center for History and New Media, George Mason University
  • Web 2.0 – Flickr, YouTube/Video, Google Maps, Wikis, Jim Groom, University of Mary Washington

    Searching ECCO

    August 27, 2009

    Eleanor called my attention to the fact that ECCO provides a list of the most common search terms by quarter.  When I looked this up, I found that the most frequent search term last quarter was “Gold,” with 5981 searches.  The next most popular searches were

    Sleep                (5829 searches)

    America           (3110 searches)

    Woman            (2520 searches)

    Our ongoing discussion of  searching methods in Burney (made possible by free access to the Burney Collection of Newspapers through October 30 via http://access.gale.com/emob) has been productive and will, I hope, continue.  As we discuss Burney, I am also curious how best to approach searching ECCO.  Do these search terms—“Gold,” “Sleep,” “America,” “Woman”—tell us anything about how scholars search ECCO?  Are there particular methods that work?

    AB

    Trial Access for Burney Collection and Search Methods

    August 12, 2009

    Gale/Cengage has generously agreed to offer a free trial of the Burney Collection for readers of this blog at http://access.gale.com/emob.  This provides us with an opportunity for an open discussion of the Burney Collection’s merits, both as a scholarly resource and as a pedagogical tool. 

    In preparation for the two sessions on digital text-bases, it would be interesting to hear more about how users search Burney.  Search results can be overwhelming and show the need for the Library of Congress cataloguing and classification system to help categorize and make sense of the wealth of data that emerges from any given search.  Thomas Mann, a Reference Librarian at the Library of Congress, has a still-useful 2005 discussion on the limits of computerized searching for research at http://www.guild2910.org/searching.htm.  Mann’s site might be particularly helpful in discussing computerized searching with students.  His example is that the 11,000,000 results for the word “Afghanistan” are unclassified, whereas under the LC system, they are neatly parsed into “Antiquities,” “Bibliography,” “Biography,” “Boundaries,” “Civilization,” and so forth.  So the argument in favor of LC classification and cataloguing is clear.

    On the other hand, it would be foolish to overlook the value of non-classified search results.  Matthew’s post on machine reading makes clear the value of understanding more about what computers can do.  But how to search Burney isn’t necessarily clear from the outset.  It would be very interesting to hear more about how individuals use search methods within ECCO, EEBO, and particularly Burney.  We are grateful to Gale/Cengage for making this collective review possible.