Collaborative Readings #3: Sayre N. Greenfield’s “ECCO-locating the Eighteenth Century”

by

Greenfield’s article provides a useful case study of a series of searches using ECCO (Eighteenth Century Collections Online), revealing both its possibilities and its current limitations. Originally a Presidential Address for the 2006 EC/ASECS meeting, the essay was subsequently published in The Eighteenth-Century Intelligencer. It grew out of Greenfield’s interest in tracing Shakespearean quotations throughout the eighteenth century, a task that involves text mining, or, as Greenfield puts it, “reading across centuries of texts to trace very specific threads of culture” (1).* Greenfield describes this task as “cultural paleontology, trying to trace the evolutionary line of a cultural phenomenon by the records of cultural transmission of phrases that have fossilized into print” (1).

To begin with, Greenfield makes a good case for the rigor required to use ECCO as a text-mining tool. A familiarity with ECCO’s search function is crucial. Because ECCO disregards common words such as “to,” “be,” “or,” “that,” “is,” and “the,” one would have a difficult time locating references to Hamlet’s “to be, or not to be” soliloquy using only that line; searching instead for “whether ’tis nobler” works better. Even when the search function works smoothly, the results need to be interpreted. To see how Shakespeare’s Romeo and Juliet fared throughout the eighteenth century, Greenfield conducted a series of searches for words and phrases from the play, such as “Romeo,” which produced 2,441 hits. He broke the hits down decade by decade, taking into account the ratio of hits to the total number of works printed in each decade. In the case of “Romeo,” he eliminated hits unrelated to Shakespeare’s play as well as the many false hits that electronic reading produces; some of these false positives were produced by “Rome” or “Roman,” for example. Of the thirty-three hits produced for the century’s first decade, only twelve actually contained the word “Romeo.” When Greenfield eliminated non-Shakespearean Romeos and multiple editions of the same work, he was left with a single edition of the play and three other texts that referred to Shakespeare’s character for that decade. As might be expected, using ECCO for this kind of search requires considerable labor.
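
To make the stopword problem concrete, here is a minimal Python sketch. It assumes, purely for illustration, that ECCO ignores exactly the six words named above; ECCO’s real stopword list is not given in the article and may well be longer, and the function name `searchable_terms` is hypothetical.

```python
# Hypothetical stopword list: only the six words the review says ECCO ignores.
# ECCO's actual list is not documented here and may be longer.
STOPWORDS = {"to", "be", "or", "that", "is", "the"}

def searchable_terms(phrase: str) -> list[str]:
    """Return the lower-cased words of a phrase that survive stopword removal."""
    # Strip surrounding punctuation, including straight and curly apostrophes.
    words = [w.strip(".,;:!?'\u2019").lower() for w in phrase.split()]
    return [w for w in words if w and w not in STOPWORDS]

print(searchable_terms("To be, or not to be"))  # ['not']
print(searchable_terms("Whether 'tis nobler"))  # ['whether', 'tis', 'nobler']
```

Under this assumption the famous line collapses to the single word “not,” which a Boolean search engine may also read as an operator, while “whether ’tis nobler” keeps three distinctive terms.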

Greenfield’s article also suggests ECCO’s promise, particularly for tracking cultural trends. Early in the century, references to the play are less likely to refer to Shakespeare’s original than to Thomas Otway’s adaptation, The History and Fall of Caius Marius. Here we see how ECCO can trace the afterlife of an adaptation before the original reclaims public interest. In the case of Romeo and Juliet, ECCO confirms what might be expected: after the play was revived in 1744, references to it increased. Greenfield was also able to isolate the passages most frequently cited and to date the emergence of “Romeo” as a common noun to the 1750s.
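
Greenfield’s adjustment of decade-by-decade hits for the total number of works printed can be sketched as a simple rate. The Python below is a hypothetical illustration, not Greenfield’s procedure; the per-decade totals of works printed would have to come from a separate source, and the figures in the usage lines are placeholders, not ECCO data.

```python
def hits_per_thousand(hits: dict[int, int], works: dict[int, int]) -> dict[int, float]:
    """Express each decade's relevant hits per 1,000 works printed that decade.

    Both dictionaries are keyed by the decade's first year (e.g. 1701).
    Decades with no recorded total (or a total of zero) are skipped.
    """
    return {
        decade: 1000 * hits[decade] / works[decade]
        for decade in sorted(hits)
        if works.get(decade)
    }

# Placeholder figures for illustration only, not ECCO data.
print(hits_per_thousand({1701: 3, 1741: 30}, {1701: 1500, 1741: 6000}))
# {1701: 2.0, 1741: 5.0}
```

A rate like this is what lets a rise in references after 1744 be read as growing interest in the play rather than merely as growth in the volume of print.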

Perhaps the most impressive part of Greenfield’s argument is the clarity of his solutions to the problems he encountered. When he found that “O Romeo, Romeo, wherefore art thou Romeo?” had become a familiar line to be parodied by the 1760s, he needed to devise a new search strategy that would catch parodic versions of the line using names other than Romeo’s. But searching for the phrase “wherefore art thou” called up biblical passages that needed to be eliminated:

[S]earching for the entire phrase “Wherefore art thou,” from 1701-1800, gets 870 hits, mostly because of biblical uses of the phrase, “Wherefore art thou red in thine apparel” from Isaiah 63.2, and so forth. So I tried running an Advanced Search on “wherefore art thou” NOT “wherefore art thou red” NOT “wherefore art thou absent” (a phrase from the Psalms) NOT “wherefore art thou come” (Matthew 26.50) in full texts, from 1701-1800, finding 389 hits.

Some of the hits that emerged still included the very phrases he had tried to eliminate. Nevertheless, his article provides an excellent case study of ECCO’s text-mining potential, along with some clear guidelines for using ECCO’s powerful search function intelligently.
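
The logic of the Advanced Search above can be sketched as document-level Boolean filtering. In this minimal Python sketch (the function names and sample texts are hypothetical, and ECCO’s own matching rules, such as fuzzy matching and OCR noise, are not modeled), a document is kept only if it contains the wanted phrase and none of the excluded phrases.

```python
import re

def contains_phrase(text: str, phrase: str) -> bool:
    """Case-insensitive whole-word phrase match, tolerant of line breaks."""
    pattern = r"\b" + r"\s+".join(map(re.escape, phrase.split())) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

def phrase_not(docs: dict[str, str], wanted: str, excluded: list[str]) -> list[str]:
    """Return IDs of documents containing `wanted` and none of `excluded`."""
    return [
        doc_id
        for doc_id, text in docs.items()
        if contains_phrase(text, wanted)
        and not any(contains_phrase(text, ex) for ex in excluded)
    ]

docs = {  # invented sample texts
    "parody": "O Damon, Damon, wherefore art thou Damon?",
    "sermon": "Wherefore art thou red in thine apparel?",
    "mixed": "Wherefore art thou come? ... O Romeo, wherefore art thou Romeo?",
}
exclusions = ["wherefore art thou red", "wherefore art thou absent",
              "wherefore art thou come"]
print(phrase_not(docs, "wherefore art thou", exclusions))  # ['parody']
```

Because NOT operates on whole documents, the “mixed” text is dropped even though it also echoes Romeo, so exclusions of this kind can cost genuine hits; under correct Boolean logic, however, the excluded phrases should never appear among the results.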

*Page numbers refer to the printed copy of Greenfield’s Presidential Address.


34 Responses to “Collaborative Readings #3: Sayre N. Greenfield’s “ECCO-locating the Eighteenth Century””

  1. Eleanor Shevlin Says:

    Thanks for the summary of Sayre’s very fine article. Not only does his piece provide excellent insights for how to negotiate the intricacies of ECCO searching, but I also think it would be helpful in having students learn to search ECCO more effectively.

    Beyond its importance to the 18th-century classroom, exposing students to manipulating search words to read “across centuries of texts to trace very specific threads of culture” (1) should help hone their information literacy skills and also heighten attention to language usage.

  2. Anna Battigelli Says:

    I especially liked Sayre’s use of quoted phrases in the advanced search. It did not work perfectly, but it offered a good way to chase quotations while eliminating similar ones by using the NOT operator.

  3. Dave Mazella Says:

    I’ve probably asked this question before, but are there any plans for distributing these important ECI pieces more widely? It would be great if they were accessible online. I can get them through my ILL, but I think they would be tremendously helpful if they circulated more widely.

    Alternatively, are EC-ASECS members able to get access to them as an archive?

    But these kinds of case histories of problems encountered and solved are very helpful for those trying to figure out these tools.

    DM

    • Eleanor Shevlin Says:

      Dave,

      We can ask Jim May (and the authors) how he (and they) feel about having the articles posted here.

      If it is a problem to post, I would be glad to photocopy the essays and mail them to you.

      See the EC/ASECS website.

      The following back issues are available on the EC/ASECS website via the Newsletters Archives link (no password is needed to access them):
      Back Issues of ECI
      ECI Volume 22, Number 1: January 2008

      ECI Volume 22, Number 2: May 2008

      ECI Volume 22, Number 3: September 2008

      Also, the dues for EC/ASECS are only $15.00 a year–and the wealth of information in the Intelligencer is worth far more than that (just a plug I couldn’t help!).

      ES

  4. Sayre Greenfield Says:

    Thanks for the nice comments about my ECI piece, but my puzzling about the ECCO search engine continues. Is anyone keeping an account of how the search engine actually works? I do not find the instructions within ECCO or on the general Gale-Cengage web site entirely accurate, especially when it comes to the use of quotation marks or wildcards. That is, the results don’t match my expectations. For example, in hunting for references to Hamlet that have been misread by the OCR software, I typed in Hamlet for 1701-1720 and got 492 results. Then I typed in H?mlet for the same years and got 512 results. Great, I thought: those extra 20 texts undoubtedly represent Hamlet with a typesetting error in the “a” or Hamlet with an OCR recognition error, but the texts would most likely contain references to Hamlet. To isolate them and look at the individual cases, I typed in Ha?mlet NOT Hamlet and expected 20 hits. However, I got the original 492 hits. Does anyone understand what is happening here?
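
    What Sayre expected can be expressed as a set difference: the texts matching H?mlet minus the texts matching Hamlet. The Python sketch below assumes that “?” stands for exactly one character and “*” for any run of characters; that reading of the syntax, the function names, and the sample texts are all illustrative assumptions rather than a description of how Gale’s engine works.

```python
import re

def wildcard_to_regex(term: str) -> re.Pattern:
    """Compile a search term, assuming '?' = exactly one word character
    and '*' = any run of word characters (an assumption about ECCO syntax)."""
    parts = []
    for ch in term:
        if ch == "?":
            parts.append(r"\w")
        elif ch == "*":
            parts.append(r"\w*")
        else:
            parts.append(re.escape(ch))
    return re.compile(r"\b" + "".join(parts) + r"\b", re.IGNORECASE)

def search(docs: dict[str, str], term: str) -> set[str]:
    """Return the IDs of documents containing a match for `term`."""
    pattern = wildcard_to_regex(term)
    return {doc_id for doc_id, text in docs.items() if pattern.search(text)}

docs = {  # invented sample texts
    "a": "Hamlet, Prince of Denmark",
    "b": "the tragedy of Hnmlet",  # a simulated OCR misreading
    "c": "a small hamlet near Bath",
    "d": "Macbeth",
}
print(sorted(search(docs, "H?mlet")))                           # ['a', 'b', 'c']
print(sorted(search(docs, "Hamlet")))                           # ['a', 'c']
print(sorted(search(docs, "H?mlet") - search(docs, "Hamlet")))  # ['b']
```

Because every text matching Hamlet also matches H?mlet under these assumptions, the expected count for “H?mlet NOT Hamlet” is simply the difference between the two totals, here 512 - 492 = 20.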

    • Anna Battigelli Says:

      Sayre: You must mean that in your second search, you typed in “H?mlet NOT Hamlet,” right? [You wrote "Ha?mlet NOT Hamlet."] My connection to ECCO is down tonight, so I can’t replicate your search, but this is exactly the kind of thing we should explore to see, as you put it, how the search engine actually works. If we can understand this specific example better, I think we’ll understand ECCO better. Any suggestions out there?
      AB

    • Anna Battigelli Says:

      Hi Sayre,

      Just to confuse things a bit more, I just did a search for “H?mlet” NOT Hamlet, and I came up with 12 hits, which should be the entries you’re seeking. I clicked on “NOT” both after the initial entry for “H?mlet” and after “Hamlet”.

      I’d be interested to see whether, when you or Eleanor replicate this, you find that the search works as you intended. I used ECCO I.

      • Eleanor Shevlin Says:

        I have written Gale with Sayre’s original query and my speculations, and we should be hearing back shortly.

        I tried Anna’s search string–that is entering H?mlet in the first row, pulling down NOT from the second menu and entering Hamlet, and then pulling down Not again, with the limits of 1701-1720–and I still receive 446 hits (not 492 because, as Sayre pointed out, my hits do not include ECCO II).

      • Anna Battigelli Says:

        Eleanor:

        I e-mailed you the list of twelve items that I got when I tried the following search:

        H?mlet NOT
        Hamlet NOT

        I’m hoping we can figure out why we’re getting different results. We’re both using ECCO I. I may be missing something.
        AB

      • Eleanor Shevlin Says:

        Hi, Anna

        I figured out the problem. I had thought WCU had adopted the new interface (I can search EEBO and ECCO I simultaneously, for example), but once you sent the link to ECCO and I did the search in the trial ECCO, not WCU’s version, I did come up with 12. In the old interface, the “And, Not, Or” boxes appear to the left of the blank for the search word, followed by the pull-down menu whose default is “entire document.” The first row has no menu to select “Not.” In the new ECCO the search-word blank comes first, followed by a pull-down menu whose default is “full-text” (as opposed to “entire document”) and then by the pull-down menu for And, Not, and Or.

        To search according to your instructions in the old form, I had to have the second “Not” appear next to a blank search word box (though I did also try using “Hamlet” twice).

        This difference may be one that’s only related to differences in the appearance of the interface. Yet I suspect the difference in appearance is also accompanied by differences in search capabilities. When I wrote Scott, he did ask me if I were using the new ECCO II interface–and I thought I was because of several changes to the interface. Obviously, I was not.

        BTW, if one searches H?mlet NOT Hamlet in the trial access interface, the 12 does come up. In other words, one does not need to use the “Not” twice.

        ES

      • Anna Battigelli Says:

        The different interfaces do, then, seem linked to different search capabilities. Sayre should probably try his search on ECCO by using the link on this blog to see whether he gets better results, which according to what you have just said, he will. This suggests a flaw with the old interface.

        What we’re also learning is that ECCO is strengthening its search power over time.

        AB

      • Eleanor Shevlin Says:

        Actually, working with a tech person, we discovered that the trial link used the old interface. This link has now been updated to the new interface. Gale is investigating why the search does not yield the expected results… and will get back to us.

        ES

  5. Dave Mazella Says:

    OK, I’ve read the version of SG’s very useful article that Anna kindly passed along, and I agree that it points up some of the trickier aspects of searching ECCO. Here are some of my takeaways:

    1. “Cultural paleontology,” or reading across texts over time, yields plenty of interesting trends, but the trends and especially their causality require further explanations.
    2. One of the difficulties with the approach, however, is the tacit limitations of the search engine itself, such as the elisions of common words.
    3. Indirection, or searches for less common words, seems to be the best way to control for the inconsistencies noted in #2.
    4. ECCO, as wonderful as it is, is no substitute for the carefully constructed critical accounts (and sources) found in Greenfield himself, or in Dobson, Marsden, etc. These are absolutely necessary to give you a more usable cluster of terms to pursue, or to figure out where to search next once the obvious terms are exhausted.

    I saw that Sayre had some comments pending, so I hope to hear from him. The one thing I’d ask again is whether we could have permission to post the full-text version of this and some of the other bibliographic articles from ECI here on the blog.

    Best, DM

  6. Sayre Greenfield Says:

    Indeed, Anna corrects me correctly: I did mean to write that I tried H?mlet NOT Hamlet and received, puzzlingly, the same number of hits as for Hamlet. And, of course, Dave’s fourth point about consulting ECCO as one of a number of tools is absolutely correct. But ECCO differs from “carefully constructed critical accounts” in ways that can be advantageous, depending on the nature of one’s research. One might use ECCO for sampling the meanings or contexts of a term or phrase in a certain span of years, where one can make the relative unselectiveness of ECCO work for one.
    –SNG

    • Anna Battigelli Says:

      I’m curious to hear more from Sayre about using ECCO “for sampling the meanings or contexts of a term or phrase in a certain span of years.” And how is unselectiveness an advantage?

      Dave will surely be interested in how ECCO can be used to detect changes of a term’s context, since this is something he is asking his Jane Austen students to examine.
      AB

  7. Eleanor Shevlin Says:

    Hi, Sayre and all,

    Let me say that I also am still working on understanding fully the search features.

    As for why you did not get the 20 texts when you searched H?mlet NOT Hamlet, I assume that the ? character overrode the “NOT Hamlet”; that is, the search engine is not equipped to process this type of query because of the wildcard.

    If you enter H?mlet as the search word on the first row of the advanced search screen, and then on the next row change the “AND” to “NOT” and use Hamlet as the search word, the same number of hits is returned as if you had simply searched for “Hamlet” (the reverse of what one would expect). I often use the “search within these results” option, but again, this approach works only if one is not asking the engine to find a word with one wildcard letter and then omit all of the results that have a particular exact spelling. In this case, searching for “Hamlet” within the results for “H?mlet” brings up the same number of hits as the plain Hamlet search, yet searching for “Not Hamlet” within the H?mlet results yields 0 hits. Again, I believe that’s because the search engine cannot interpret the query. Although it is not the same situation, Gale’s rule that a “word can NOT have both a wildcard and fuzzy search level applied to it. If both are specified, the wildcard will take precedence” helps explain why the type of search Sayre was attempting did not work.

    By the way, when I searched for H?mlet in entire document, 1701-1720, I received 458 hits. When I searched for Hamlet in entire document, 1701-1720 I obtained 446 hits.

    While still too broad, the breakdowns by subject area show where some of the texts appear.

    For the H?mlet search, one sees the following breakdown under “Narrow results by subject area”:

    History and Geography (71)
    Fine Arts (7)
    Social Sciences (62)
    Literature and Language (163)
    Religion and Philosophy (54)
    Law (83)
    General Reference (11)
    Medicine, Science and Technology (7)

    And for the Hamlet search, the following breakdown is offered:

    History and Geography (66)
    Fine Arts (6)
    Social Sciences (60)
    Literature and Language (162)
    Religion and Philosophy (51)
    Law (83)
    General Reference (11)
    Medicine, Science and Technology (7)

    While it would be easy to identify the one title missing from the Hamlet results under Fine Arts, the other categories would require far more work to find the texts returned only by the H?mlet search.
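
    The comparison Eleanor describes can be done category by category. The Python sketch below simply subtracts the two breakdowns reported above (the first list sums to 458 and the second to 446, matching the H?mlet and Hamlet totals). It shows where the extra texts sit but, as Eleanor notes, cannot identify which titles they are.

```python
# Subject-area counts as reported above (ECCO I, 1701-1720, entire document).
wildcard_counts = {  # H?mlet search, 458 hits
    "History and Geography": 71, "Fine Arts": 7, "Social Sciences": 62,
    "Literature and Language": 163, "Religion and Philosophy": 54,
    "Law": 83, "General Reference": 11, "Medicine, Science and Technology": 7,
}
exact_counts = {  # Hamlet search, 446 hits
    "History and Geography": 66, "Fine Arts": 6, "Social Sciences": 60,
    "Literature and Language": 162, "Religion and Philosophy": 51,
    "Law": 83, "General Reference": 11, "Medicine, Science and Technology": 7,
}

# Extra hits returned only by the wildcard search, per category.
difference = {area: wildcard_counts[area] - exact_counts[area]
              for area in wildcard_counts}
for area, extra in difference.items():
    if extra:
        print(f"{area}: {extra}")
print("Total:", sum(difference.values()))
```

    The script prints History and Geography: 5, Fine Arts: 1, Social Sciences: 2, Literature and Language: 1, Religion and Philosophy: 3, and a total of 12, which equals 458 - 446 and matches the 12 hits the old-interface H?mlet NOT Hamlet search returned, though the counts alone cannot show that these are the same texts. The sketch assumes each text is counted in only one subject area, as the matching totals suggest.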

    • Anna Battigelli Says:

      Eleanor, I like your suggestion for why Sayre’s complex search did not work. Maybe the wildcard and the NOT operator cannot both be used. It would be helpful to have this confirmed. I might e-mail Scott Dawson about this.

      • Eleanor Shevlin Says:

        Scott could confirm, but I do think the problem here stems from the use of the wildcard in the same position in which the letter “a” occurs. Using the ? and a definite letter is akin to asking the computer to find all cases in which the second letter can be any character but not any cases in which the second letter is an “a”–which could seem a “contradictory” proposition to the search logic built to handle queries. Any character is no longer truly “any” once one says “not X”.

      • Anna Battigelli Says:

        Good point.
        AB

    • Anna Battigelli Says:

      I heard back from Scott Dawson, and he says you can combine a wildcard and a Boolean operator together. His example is “slave*” NOT “Trade.” “Slave” produced many results; the combined search reduced that to 80,000.

      What you can’t do, he tells me, is combine a wildcard with a fuzzy search. This overwhelms the system.

      But we are still left with Sayre’s example of “H?mlet” NOT “Hamlet,” which failed to produce the extra hits that “H?mlet” got but that “Hamlet” alone did not.

      Hmmm…
      AB

      • Eleanor Shevlin Says:

        I had written to Gale about other matters this morning, and I mentioned this posting/query/exchange on the blog.

        The clarifications about these two types of searches (wild card with Boolean and wild card & fuzzy) are actually on the Gale website, but, as Anna notes, these two situations are not the same as the H?mlet Not Hamlet query. However, I really believe that the explanation here is a subset of the prohibition against fuzzy/wild card searches. I will write Gale and ask for confirmation.

  8. Dave Mazella Says:

    Detecting changes in a term’s context was something I learned about before ECCO (others had it, but I didn’t and still don’t), when I was writing my cynicism book. I was using resources like the Chadwyck-Healey poetry, drama, and fiction collections to find hits regarding the Cynic philosophers and other assorted contexts of the term. At the same time, I was using JSTOR, MUSE, and MLA similarly, to mine full-text critical resources for the same purpose.

    What I basically learned was that searching sometimes entailed learning about the contexts in which certain terms appeared, because knowing the contours of the various contexts could often predict where the next instance would be. Searching could then proceed to levels higher than the full-text, so that you could see the term’s function within a more general setting or what I called the “discursive landscape.” This is how I learned, for example, that many of the instances of “cynic” pre-19th century were associated one way or the other with Lucian, Lucianic imitations, or Lucianic genres.

    If I understand Sayre correctly, I think electronic “browsing” entails a certain amount of indirectness and serendipity, but it also allows us to be surprised by (and learn from) appearances of related terms in relatively unfamiliar settings, or in unfamiliar or unexpected authors. Does that sound right? DM

  9. Eleanor Shevlin Says:

    I was busy searching ECCO via Sayre’s comments about Hamlet while these other posts were coming in (thus some of my previous remarks about the searches may now seem unnecessary).

    Both Sayre’s remarks about shifting contexts for a term and Dave’s follow-up explanation about constructing more sophisticated searches after one’s initial forays match my experiences using Google Book Search and other databases (such as Chadwyck-Healey, which West Chester does not have but which I used fairly frequently many years ago). And I agree with Dave’s summary that this type of “browsing entails a certain amount of indirectness and serendipity” as well as the potential to learn much from the range of appearances for a word or phrase. We have often heard of the benefits of browsing in the stacks and encountering by chance a text that is extremely helpful to a project. The loss of these opportunities for those who work primarily with electronic, off-site resources has been bemoaned at times.

    However, electronic tools such as GBS as well as ECCO, EEBO, and Burney offer a different browsing experience whose serendipity is arguably richer (depending on what one encounters and on one’s knowledge) than library shelf browsing. (I still browse the stacks and benefit from it; my emphasis here is on the difference and the new opportunities these electronic tools make possible.) Indeed, I think that Sayre is referring to this characteristic when he speaks of the potential “unselectiveness” of an ECCO search. Those working on 18th-century British literature projects probably spend much of their library “shelf” time in the PR and DA sections of a library arranged by Library of Congress classification (admittedly one might be in the “Z” or any number of other sections, but I am speaking of the prime locations for literary and historical materials). (WCU uses Dewey, which I still have not warmed up to despite my many years using public libraries.) Yet searches in GBS and ECCO can direct one’s attention to works and subject areas that one may never have thought to examine. Of course, this unselectiveness also produces numerous irrelevant results. In the Hamlet search, for example, one receives results for “hamlet” as a small village, though these hits are hardly surprising.

  10. Sayre Greenfield Says:

    Both Dave and Eleanor understand my remark about unselectiveness of the databases. Too often as a scholar I and perhaps others “round up all the usual suspects.” EEBO and ECCO let me get beyond that, if I look at the results carefully. To illustrate: this year I have an undergraduate research assistant (wonderfully meticulous), who has been scanning ECCO for Hamlet phrases. She quickly learned to expect results for theatrically-related texts. Yet when searching for “bird of dawning,” from the end of Act 1, scene 1, she was surprised to find religious texts quoting the entire passage in the middle of the 18th century. One might, if one were incautious, start dismissing one’s hits on a phrase search from supposedly irrelevant subject areas, and yet it is those unexpected hits that can point one in a new direction.
    –Sayre

  11. Eleanor Shevlin Says:

    The “bird of dawning” case offers a very clear, specific example of the unexpected results ECCO and its counterparts can return. (I was short on examples and time last night, though I did try to think of an illustration; I’ve been using ECCO lately for different types of searches.)

    On a different note, I did wonder why three of the hits for the Hamlet search fell under the category “Fine Arts”. Although I glanced only quickly at these texts, their titles suggested they were all plays, and I was wondering about why they cropped up here as opposed to under the Literature rubric. Of course, plays, performances, etc. are part of the fine arts, but it seemed strange that these three (and not more perhaps) were categorized as such. I’ve been having lots of problems with my Internet connection, so I have not returned to ECCO today to investigate this issue (in fact, I lost this post when trying to send, and thus I’m re-typing now that I am again connected). I suspect input from Gale might be needed to answer my larger questions about the rationale for classifying works under the various headings (the FAQ page, etc did not shed light on the issue). Did Gale inherit the classifications? Or did it sort the texts into the various groups?

    • Eleanor Shevlin Says:

      Scott Dawson at Gale explained the origins of the categories:

      For the classification of the ECCO texts into the broad subject categories, those were assigned by Gale over the many years of the filming project. It was done partly to help break down the film project into smaller components as the entire collection was so large.

      Knowing the source and rationale for these categories seems helpful. That they were devised to assist with pragmatic, operational matters related to filming places them in a different category from ones devised for scholarly classification. That said, the categories are still potentially useful for scholars. It is interesting to consider that if an institution does not want or is not able to purchase the whole of ECCO, it can purchase individual subject sections, and that these sections are based on divisions Gale created as it digitized the microfilm.

      • Anna Battigelli Says:

        This is very helpful. Knowing the genesis of the categories helps us know better where they will be most useful. Thanks, Eleanor.

  12. Sayre Greenfield Says:

    Eleanor–By the way, it has just occurred to me that the reason I received 492 hits on Hamlet while you had 446 is that U Pitt automatically factors in the hits from ECCO II among the rest.
    –Sayre

    • Eleanor Shevlin Says:

      Thanks, Sayre–I had thought that the difference might be because you have ECCO II. Although it is on the list for WCU to purchase, we have not yet done so. We have, however, adopted the new interface that allows searching between EEBO and ECCO.

      The difference between the number of hits found in ECCO I alone and in ECCO I and II combined seems significant to me and offers a justification for purchasing ECCO II.

  13. Sayre Greenfield Says:

    Anna had suggested that I try my h?mlet not hamlet search through the interface here at EMOB instead of through my school, and I did, with even stranger results. In ECCO I got 80 hits (some actually useful), but these were spread throughout the century, and I then realized I hadn’t put in any date restrictions. I added restrictions of 1701-1720 and got 446 hits. Curiouser and curiouser. Nonetheless, I have hopes for that new interface.
    –Sayre

    • Eleanor Shevlin Says:

      Sayre,

      I retrieved the same 80 results, also forgetting to set the date limits. The 446 hits resulted because the trial link does not include ECCO II. This number is the same one I had obtained, and WCU does have the new interface (we just don’t have ECCO II). The trial link originally pointed to the old interface, but Gale tech has now updated it to the new interface (without ECCO II), which is why you received the 446 hits.

      The tech person is investigating this matter and will get back to us.

      As you note, Sayre, curiouser and curiouser!

  14. Anna Battigelli Says:

    This is odd. I see that the interface is different, but I also see that I can no longer properly get the 12 entries for H?mlet NOT Hamlet that I got yesterday using the older interface. Sayre, maybe Eleanor will e-mail the 12 results I sent her yesterday; since I used ECCO to forward the results, I have no copy. Perplexing…

  15. Eleanor Shevlin Says:

    Anna and Sayre,

    I never received the results you sent, Anna. Yet when I used what turned out to be the “old” interface on the trial link, I also received 12 hits. I suspect that they were the same as the twelve you received. Unfortunately now that the trial has the new interface, the search no longer works. The 446 hits you received is the same that I had received using WCU’s ECCO. This figure differs from Sayre’s because our searches did not include ECCO II.

    As I noted in my earlier post, Ron Boos, a tech person at Gale, is looking into this issue with the new interface. He actually called me this morning, and I was able to explain the search queries we’ve been using. (He had originally been trying to help WCU adopt the new interface, and that’s when we all discovered that WCU in fact had the new interface and it was the trial link that was old.) I also sent Ron the link to our blog and this exchange. He did say that it might be a problem with the indexing in the new ECCO (versus the old ECCO) rather than a technical problem with the search query, and if that’s the case, he’ll let Scott know as well as us.

  16. Anna Battigelli Says:

    I’ll check the e-mail function again today. The e-mail function works fine, so I must have typed in an incorrect address for you.

    I will be interested in understanding this problem further. Can you say more about why this might be an indexing problem?
