English Short Title Catalogue, 21st century (ESTC21): Call for Feedback

Brian Geiger, Director, CBSR and ESTC/NA, has recently sent us the following announcement and call for feedback:

Big changes are underway with the English Short Title Catalogue (ESTC), and we need your input. A union catalog and bibliography of English printing from 1473 to 1800, the ESTC has developed over the last three decades into one of the most comprehensive and authoritative bibliographies available. Yet access to ESTC data has evolved very little. Last year the ESTC was awarded a planning grant from the Andrew W. Mellon Foundation to “redesign the project as a 21st century research tool.” For the last nine months a planning committee has discussed how to make the resource more usable to a broad spectrum of researchers and librarians and to harness the knowledge and input of those users to refine and expand ESTC data. The recommendations of that committee are now available online at the ESTC21 blog. The planning committee welcomes and encourages feedback on our ideas from ESTC users. The ESTC21 website with our recommendations will remain active through April 20. Please support this effort to rethink the future of the ESTC by commenting on the ESTC21 pages and taking the brief survey at the end of the website. Your feedback is critical. From the entire planning committee, thank you for your contributions to this project.

Brian Geiger
Director, CBSR and ESTC/NA

This entry was posted on March 20, 2012 at 11:13 pm and is filed under ESTC, Research, STC, WING.

21 Responses to “English Short Title Catalogue, 21st century (ESTC21): Call for Feedback”

Eleanor Shevlin Says:
March 20, 2012 at 11:32 pm | Reply
The Mellon award to improve this already indispensable resource and bring it into the 21st century is welcome news. Just a few posts ago we were discussing how difficult it can be to acquire additional funding to keep resources perceived to be “finished” as up-to-date as possible.

A special section, “Forum on Electronic Resources,” in volume 21 of the Age of Johnson (Jan. 2012) features two highly relevant essays detailing possibilities for the ESTC:
- David L. Vander Meulen, “ESTC as Foundational and Always Developing”
- Stephen Karian, “The Limitations and Possibilities of the ESTC”
Anna Battigelli Says:
March 27, 2012 at 9:41 pm | Reply
The ESTC blog seems well designed and suggests productive directions for ESTC’s future. In addition to the “Overview” page, there are five other tabs: “Data,” “Searching,” “Curating,” “Future Projects,” and “Survey.” The Overview lists three goals:

1. to harness the expertise of its users in the ongoing curation and enrichment of its data;

2. to become the “electronic hub” for relevant digitization projects and make the universal corpus of digitized early modern English works accessible to users;

3. to become a resource of new kinds of inquiry by making ESTC data more open to the wider web and more easily accessible for use in other digital projects.

These seem like good steps, and they are consistent with suggestions made by Steve Karian in his superb essay on the ESTC in Age of Johnson.

The ESTC blog is designed to collect suggestions about the ESTC’s future; it will remain up through April 20, and readers are encouraged to post suggestions there. General discussion of the new ESTC’s promising future can, of course, also take place here.

Eleanor Shevlin Says:
March 28, 2012 at 9:20 am | Reply
We should note, too, that the ESTC blog will remain active only until April 20, 2012, so everyone has about three weeks remaining to offer comments and suggestions.

Anna Battigelli Says:
March 29, 2012 at 11:29 am | Reply
I’d like to hear examples of how the ESTC could use an API to open its data for use on other sites. Which other sites? EEBO and ECCO? This seems like a helpful development, but tentative yet specific examples would help highlight its significance.

Benjamin Pauley Says:
March 30, 2012 at 10:52 am | Reply
As somebody who’s on the planning committee, I can’t emphasize enough how much we’d like to get feedback from users: we’ve done our best to think about ways that the ESTC could be reimagined, but the more perspectives that are brought to bear on the question, the better. [That much I think I can say in my quasi-official capacity. From here on, it’s just me talking.]

I’m especially keen to hear about the kinds of research questions that we could imagine addressing using the information that’s in the ESTC, but which we can’t really pursue very well right now, given the ESTC’s structure and interface. I spoke to someone, for instance, who noted that she couldn’t easily use the ESTC to identify works published anonymously, because the conventions of cataloguing mean that, when we’ve figured out the authorship of a work, the author’s name gets inserted into the catalogue record (she wanted to get at works that a given author had published without any attribution on the title page). The more examples of this sort that we can get, the better we can think about how the ESTC could be changed. (Steve Karian’s essay, as Eleanor says, offers a really thoughtful critique, and I’ll need to track down the latest Age of Johnson. I read a draft, but would like to see the final version, as well as the other essays in that section.)

As for Anna’s question about the ways an API might open the ESTC to other sites, I’ll try to offer a couple of examples (to the best of my pretty basic understanding).

One simple case might involve a website accessing the ESTC’s data in what we might call “read-only” mode. If you developed a site dedicated to the life and works of, say, Thomas Middleton, you could perhaps have an embedded view of bibliographical records coming straight from the ESTC, rather than re-creating all that information on your own. Any changes to the ESTC’s coverage of Middleton would then be reflected in your site as they occurred (if a bibliographer were to determine that what had been thought to be a single impression was actually two distinct ones, your site would reflect the change).
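A minimal sketch of the “read-only” case, assuming a hypothetical ESTC API that returns JSON. No such API has been published; the payload shape, the field names (`estc_id`, `title`, `year`), and the placeholder records are all assumptions made for illustration.

```python
import json

# Stand-in for the body of a response from a hypothetical ESTC API.
# The structure and field names are assumptions; the records are placeholders.
SAMPLE_RESPONSE = json.dumps({
    "author": "Middleton, Thomas",
    "records": [
        {"estc_id": "EXAMPLE-2", "title": "Placeholder title B", "year": 1620},
        {"estc_id": "EXAMPLE-1", "title": "Placeholder title A", "year": 1608},
    ],
})


def render_records(payload: str) -> list[str]:
    """Turn an API response into display lines, oldest first."""
    data = json.loads(payload)
    records = sorted(data["records"], key=lambda r: (r["year"], r["estc_id"]))
    return [f'{r["year"]}: {r["title"]} (ESTC {r["estc_id"]})' for r in records]


if __name__ == "__main__":
    for line in render_records(SAMPLE_RESPONSE):
        print(line)
```

Because the site only formats whatever the response contains, a revised record on the ESTC side would appear the next time the page is built, with no local edit.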

A more advanced case might involve a website that sends queries to the ESTC and presents the results to its users, what we might call “computational” access to the ESTC. There’s a lot of information to be had from the ESTC, and I don’t know that it’s possible to construct a single search interface that would let you query every bit of it in every conceivable way. My hunch is that any interface that tried to be totally comprehensive would collapse under its own complexity. But if there’s a published API, then anybody could develop a more specialized window onto that data. You could develop your own site that draws on the ESTC’s data, or perhaps users could develop specialized tools that could be shared on the ESTC’s site.

Right now, for example, you can’t really search the ESTC by an author’s date of birth or death. You can search for them by name, and you can search for the year of publication of a work, but you can’t really get a list of, say, authors who were born in 1660, or who were at least 40 years old at the time of the Glorious Revolution. An ESTC API might make it possible to access that information in a way that allows you to figure out (totally off the top of my head, so I make no guarantee how sensible all of these questions might be):
- How many authors published their first work before the age of 20?
- Who are some authors who remained productive (or even grew more productive) in the last decade of their lives?
- How many poets actually display a progression up the Virgilian hierarchy of genres as they age?
- Are there any booksellers who seem to have published a conspicuous number of young authors?
- Are there any moments when the average age of authors who were publishing seems to go up or down in an unusual way (was there ever a “greying” of poetry? was there ever a conspicuously youthful enthusiasm for the elegy?)?
As I say, I don’t know whether all (or even any) of those questions would be worth asking, but the point is that an API would make it possible for people with novel questions to get at the bits of the ESTC’s data that they need without having to recreate all that data from scratch.
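Two of the questions above can be sketched in a few lines, assuming author dates and publication years were available as structured data. The `Author` and `Publication` types and the three sample authors are hypothetical, not ESTC data.

```python
from dataclasses import dataclass


@dataclass
class Author:
    name: str
    born: int  # year of birth
    died: int  # year of death


@dataclass
class Publication:
    author: str  # matches Author.name
    year: int


def age_at_first_publication(authors, publications):
    """Map each author's name to an approximate age at first publication."""
    first_year = {}
    for pub in publications:
        if pub.author not in first_year or pub.year < first_year[pub.author]:
            first_year[pub.author] = pub.year
    # Year-only arithmetic, so ages can be off by one.
    return {a.name: first_year[a.name] - a.born
            for a in authors if a.name in first_year}


def first_published_before_age(authors, publications, age):
    """Names of authors whose first work appeared before the given age."""
    ages = age_at_first_publication(authors, publications)
    return sorted(name for name, value in ages.items() if value < age)


def alive_and_at_least(authors, age, year):
    """Names of authors alive in `year` and at least `age` years old then."""
    return sorted(a.name for a in authors
                  if a.born <= year <= a.died and year - a.born >= age)


# Invented sample data for illustration only.
AUTHORS = [
    Author("Author A", 1640, 1700),
    Author("Author B", 1660, 1720),
    Author("Author C", 1670, 1690),
]
PUBLICATIONS = [
    Publication("Author A", 1665),
    Publication("Author A", 1690),
    Publication("Author B", 1678),
    Publication("Author C", 1689),
]

if __name__ == "__main__":
    print(first_published_before_age(AUTHORS, PUBLICATIONS, 20))
    print(alive_and_at_least(AUTHORS, 40, 1688))  # 1688: the Glorious Revolution
```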

One final case (which might apply to a resource like ECCO or EEBO Interactions) would involve making it possible for a site’s users to “write” data to the ESTC without ever visiting the ESTC’s own site. If you’re doing work at ECCO or EEBO and run across something that needs correcting, you’re probably more likely to make the correction if you can do it without disrupting your work too much. If you have to stop what you’re doing, open another browser tab, go to another site, log in, find the right record, and then contribute your correction, that would be pretty disruptive. What I think we’d like to work towards, instead, would be a system where users’ corrections or comments could be sent to the ESTC without their ever having to leave the site where they’re working—without their even having to know, necessarily, that there is more than one site involved in the experience. I said EEBO and ECCO because they’re big resources with lots of users, but this model could certainly work with smaller, more focused sites, too: a small group of knowledgeable and committed users could well make very valuable contributions in a particular domain.
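As a rough illustration of the “write” case, a partner site might package a user’s correction like this before sending it on. Everything here is an assumption: the field names, the `pending_review` status, and the idea that corrections would travel as JSON are invented for the sketch, since the planning documents do not specify a format.

```python
import json


def build_correction(estc_id, field_name, proposed_value, note,
                     contributor, source_site):
    """Package a user's proposed correction as a JSON string.

    All keys are hypothetical; a real API would define its own schema.
    """
    if not estc_id or not field_name:
        raise ValueError("estc_id and field_name are required")
    payload = {
        "estc_id": estc_id,
        "field": field_name,
        "proposed_value": proposed_value,
        "note": note,
        "contributor": contributor,
        "source_site": source_site,  # where the user was working
        "status": "pending_review",  # assumes corrections are reviewed first
    }
    return json.dumps(payload, sort_keys=True)


if __name__ == "__main__":
    print(build_correction(
        estc_id="EXAMPLE-1",
        field_name="imprint",
        proposed_value="Placeholder corrected imprint",
        note="Placeholder explanation from the user",
        contributor="example_user",
        source_site="example-partner-site",
    ))
```

The user never leaves the partner site: the site builds the message and forwards it, and the ESTC side decides what to do with it.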

I do have to stress that this is all still in the planning phase, and it’s too soon to predict exactly what forms all of this will take. But I’m hopeful that this effort can open up new avenues for scholarship and new ways of doing scholarship.

- Anna Battigelli Says:
  March 31, 2012 at 4:24 pm | Reply
  Thanks, Ben. This is really helpful. Could there be something equivalent to EEBO Interactions, a space or site where users can append useful links or bibliographical or biographical notes? Or could EEBO Interactions, which already exists and is thoughtfully designed, be linked to the new ESTC?
  
  - Eleanor Shevlin Says:
    March 31, 2012 at 8:43 pm
    The capacity to append additional material–the links, biographical information, and the like–would be quite useful. One way would be to supply links between various ESTC records and the relevant text in EEBO Interactions, but that would not address the problem for those 18th-century texts not in EEBO. Having an “interactions space” within ESTC would eliminate this problem, but such a box should be available for all the records. In turn, however, that creates the potential for duplication of effort in EEBO Interactions and ESTC.
    
Eleanor Shevlin Says:
March 30, 2012 at 1:34 pm | Reply
Thanks so much for all these fine details, Ben.

The sample questions you offer usefully illustrate the new types of queries we might pose and also help prompt other fresh directions. For example, the difficulty of searching for anonymous authors has parallels to that of finding female printers.

Finally, your closing point about the ability to make corrections without leaving ECCO or EEBO seems to be an especially important point. I suspect that without such capability more than a few scholars who fully intended to go to ESTC after finishing the task at hand on the other database would end up not doing so.

Anna Battigelli Says:
March 31, 2012 at 6:36 pm | Reply
I’ve been interested in the anonymity problem. Would it work to insert the author in brackets and then have a search function that identified entries with either bracketed names or no name in the author field as anonymous? This leaves out the problem of initials, pseudonyms, and other authorial complexities.

I’d like to hear the solution to this problem.
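The bracket idea floated above could be sketched as follows. This is only an illustration of the proposal: the record layout and the bracket convention are hypothetical, and, as the comment itself notes, initials and pseudonyms are not handled.

```python
def is_bracketed(name: str) -> bool:
    """True if the whole author string is wrapped in square brackets."""
    name = name.strip()
    return name.startswith("[") and name.endswith("]")


def published_anonymously(record: dict) -> bool:
    """Treat an empty or bracketed author field as 'no name on the title page'."""
    author = (record.get("author") or "").strip()
    return author == "" or is_bracketed(author)


# Hypothetical records; only the bracket convention matters here.
RECORDS = [
    {"title": "Absalom and Achitophel", "author": "[Dryden, John]"},
    {"title": "Placeholder signed work", "author": "Example, Author"},
    {"title": "Placeholder unattributed work", "author": ""},
]

if __name__ == "__main__":
    for record in RECORDS:
        if published_anonymously(record):
            print(record["title"])
```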

Eleanor Shevlin Says:
March 31, 2012 at 9:03 pm | Reply
I would have to give this more thought, but at the Digital Humanities and Archives roundtable, Mike Gavin’s discussion of designing tagging that addresses the names of characters, actors, different forms/abbreviations of a single name within a given printed play, and the like seems to have some applicability here.

TEI does contain a tag for the anonymous author as well as other types of authorship:

Example
<author>British Broadcasting Corporation</author>
<author>La Fayette, Marie Madeleine Pioche de la Vergne, comtesse de (1634–1693)</author>
<author>Anonymous</author>
<author>Bill and Melinda Gates Foundation</author>
<author>
<persName>Beaumont, Francis</persName> and
<persName>John Fletcher</persName>
</author>
<author>
<orgName key="BBC">British Broadcasting
Corporation</orgName>: Radio 3 Network
</author>

(TEI, “Element,” author)

One could use the anonymous tag along with other tagging to indicate attribution, established author, and so forth.

Deborah J. Leslie Says:
April 2, 2012 at 12:59 pm | Reply
Putting an author’s name in square brackets isn’t an option, but ESTC records for books published anonymously include a note. An advanced search for “anonymous?” in Notes retrieves 25,834 records; most if not all of those will be for works published anonymously. So maybe the anonymous thing isn’t a good example of something a new ESTC could do that can’t be done now.

Anna Battigelli Says:
April 2, 2012 at 1:12 pm | Reply
Deborah: I searched for “anonymous” 1680-1682, and Dryden’s Absalom and Achitophel popped up, as you suggested it would.

You are clearly suggesting that searching for anonymous records is not a problem with the current ESTC. Are there other problems that the new ESTC should consider?

Deborah J. Leslie Says:
April 2, 2012 at 2:02 pm | Reply
Good, Anna, I’m glad. Let me hasten to add for others’ benefit that the search term should have a ‘?’ at the end, to allow for retrieval of “anonymously.”
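To see why the trailing “?” matters, here is a small sketch of truncation matching. It assumes that “?” at the end of a term means “any further letters may follow”; how the ESTC’s own search engine implements truncation is not described here, and the sample notes are invented.

```python
import re


def truncation_to_regex(term: str) -> re.Pattern:
    """Compile a search term, treating a trailing '?' as a truncation mark."""
    if term.endswith("?"):
        stem = re.escape(term[:-1])
        # Stem at a word boundary, followed by any run of word characters.
        return re.compile(rf"\b{stem}\w*", re.IGNORECASE)
    # Without truncation, require the whole word.
    return re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)


def search_notes(notes, term):
    """Return the notes that match the term."""
    pattern = truncation_to_regex(term)
    return [note for note in notes if pattern.search(note)]


# Invented note texts for illustration.
NOTES = [
    "Anonymous. Placeholder attribution.",
    "Published anonymously.",
    "With a half-title.",
]

if __name__ == "__main__":
    print(search_notes(NOTES, "anonymous"))
    print(search_notes(NOTES, "anonymous?"))
```

In this sketch the untruncated term misses “anonymously,” while the truncated one retrieves both notes.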

We at the Folger are preparing an institutional response to the survey, but I will mention one thing here: a social space for discussion of authors, works, editions/issues, and individual copies. That would require a mechanism to allow such discussions to be associated not only with individual records, but also with multiple records.

As Eleanor mentions, a lot could be done with tagged elements; I would like to see gender identification for authors and other associated names. Of course, that information would have to be in the records first (or in linked authority files– in case there are other librarians reading this) in order to be manipulated and retrieved.

- Benjamin Pauley Says:
  April 2, 2012 at 3:32 pm | Reply
  One thing that came up in our discussions was how there were opportunities to take better advantage of some things that are actually in the MARC specification, but aren’t as fully exploited in the current ESTC as perhaps they could be. One example is uniform titles (something that could help to do at least part of the linking of multiple records that Deborah notes would be necessary—though there would need to be a way of sorting out whether a comment applied to every manifestation of a work, or only to certain ones). Another is the use of the kinds of linked authority files Deborah also mentions, which would (among other things) allow for noting a person’s gender.
  
  Now, from what I can see (there may be something I’m missing), things like the gender field aren’t always fully exploited yet in the Library of Congress’s own authority records. Henry Fielding has plenty of pseudonyms, for example, but no gender, while his sister, Sarah, has a gender, but it’s tagged as the 370|a field, which, if I’m reading correctly, the MARC specification for authority records indicates is for place of birth, rather than as 375|a, which is for gender. (Again, I could be missing something here.) But this is clearly a really promising direction to go, because it means that, given an LC number (or a VIAF number), you could reliably link to or draw out anything that the ESTC had about a person.
  
  And working within a framework like that one opens the way for lots of interesting stuff. You can, for example (I love this) include more than one gender field, with dates attached, so you could track people whose genders changed over the course of their lifetimes. Perhaps that’s not so pressing an issue for people before 1801, but it’s a fascinating prospect, nonetheless. More practically, for this period, if I understand the MARC specification correctly, you can similarly add multiple occupations and multiple addresses, each with date ranges. So you could imagine following a printer from apprenticeship to mastery, or noting the various addresses he or she used over the course of his or her career. It could be quite a lot of work to flesh all this information out, but it’s work that, once done, could be shared.
  
  For me, one of the most interesting exercises has been to try to think about how a new ESTC could best straddle the worlds of library cataloging and of bibliography. How might a new ESTC both leverage and contribute to all the work that’s being done in the realm of library cataloging, while also making its data available in ways that might not always correspond to the ways library catalogs are typically used? How could a new ESTC solicit information from its users that could help to improve and enrich cataloging information without demanding that users learn the ins and outs of MARC before contributing something useful? I’m looking forward to hearing about the feedback that the survey gathers.
  
  - Deborah J. Leslie Says:
    April 2, 2012 at 4:44 pm
    A new cataloging code for the English-speaking world, Resource Description and Access (RDA), has recently been adopted after a test phase and is beginning to be implemented. Corresponding changes to MARC (for the non-librarians: MAchine-Readable Cataloging, the encoding protocol for bibliographic and authority records) to accommodate RDA have been adopted. The authority record for Sarah Fielding was updated to RDA in September of last year. That is why her record has information like gender, associated language, and such, while her brother and more than 99% of authority records do not.
    
Benjamin Pauley Says:
April 2, 2012 at 5:04 pm | Reply
[WordPress apparently only supports replies two layers deep. This is in response to Deborah Leslie’s comment about RDA.]
Ah. Thank you very much for that explanation. That helps me to click one of those puzzle pieces into place that I’d heard something about, but didn’t grasp at the time. As somebody who’s not a librarian, I have a lot of those, and find myself in a perpetual game of catch-up.

Deborah J. Leslie Says:
April 2, 2012 at 5:13 pm | Reply
I adhere to the “successive iteration” theory of learning new things. Glad to know that my iteration helped things click!

Eleanor Shevlin Says:
April 2, 2012 at 7:07 pm | Reply
Thanks for the news about RDA, Deborah; this development is yet another piece of promising news. Being able to search for occupation (including multiple occupations), gender identity, and the like would be real pluses.
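Searching by occupation or gender in this way presupposes values that carry date ranges, as discussed earlier in the thread. A minimal sketch, assuming a simplified in-memory model rather than actual MARC authority records: the class names, the sample printer, and the dates are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DatedValue:
    """A value (occupation, address, gender) with an optional date range."""
    value: str
    start: Optional[int] = None  # first year the value applies, if known
    end: Optional[int] = None    # last year the value applies, if known

    def active_in(self, year: int) -> bool:
        after_start = self.start is None or self.start <= year
        before_end = self.end is None or year <= self.end
        return after_start and before_end


@dataclass
class AuthorityRecord:
    """Simplified stand-in for a linked authority record."""
    name: str
    genders: list = field(default_factory=list)
    occupations: list = field(default_factory=list)
    addresses: list = field(default_factory=list)

    def occupations_in(self, year: int) -> list:
        return [o.value for o in self.occupations if o.active_in(year)]


def find_by_occupation(records, occupation, year):
    """Names of people holding the given occupation in the given year."""
    return sorted(r.name for r in records if occupation in r.occupations_in(year))


# Invented person and dates, for illustration only.
PRINTER = AuthorityRecord(
    name="Example, Printer",
    genders=[DatedValue("female")],
    occupations=[DatedValue("Apprentice", 1700, 1707),
                 DatedValue("Master printer", 1708, 1730)],
    addresses=[DatedValue("Placeholder address 1", 1708, 1715),
               DatedValue("Placeholder address 2", 1716, 1730)],
)

if __name__ == "__main__":
    print(PRINTER.occupations_in(1705))
    print(find_by_occupation([PRINTER], "Master printer", 1710))
```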

I agree with Ben that thinking about “how a new ESTC could best straddle the worlds of library cataloging and of bibliography” is an extremely interesting line of thought, and would add only another world–that of book historians. The notion of straddling multiple worlds also recalls a few of Robin Alston’s hopes for the ESTC in his account of its history:

It would adopt computer technology to transform traditional methods of compiling catalogues and so encourage other national libraries to recognise the needs of researchers for flexible access to historical sources.

It would serve as a nursery for training young staff in a sound and varied knowledge of the collections, … as well as in the possibilities provided by machine-readable records for advancing scholarship.

It would bring to light many thousands of items never previously catalogued, …
(“The History of ESTC,” The Age of Johnson: A Scholarly Annual 15 (2004), 269-329).

One can explore the first set of RDA vocabularies, released this past August, at the Open Metadata Registry.

Anna Battigelli Says:
April 3, 2012 at 8:21 am | Reply
Drawing on David Vander Meulen’s claim that the ESTC is always developing, the new ESTC will need to allow for that development. Along those lines, will the ESTC allow for links to DPLA items?

Eleanor Shevlin Says:
April 3, 2012 at 9:26 am | Reply
For an update on the Digital Public Library of America (DPLA), see Anna’s latest post.

I wonder if plans are perhaps underway to link items to digital copies made by libraries that hold their originals.

On a different note, having a space dedicated to exchange of information seems especially important.

Brian Geiger Says:
April 6, 2012 at 7:56 pm | Reply
Thanks for the info on RDA, Deborah. I have a lot to learn about the standard. Our approach with the redesign has been to try to imagine all the kinds of information we might want to collect and the various ways in which users, both people and machines, might want to access that information, create a database blueprint that can accommodate the complexities of the data and use cases, and then map that database to a cataloging standard or standards. I wonder whether or how the eventual adoption of RDA might change that approach.

Take for instance the issue of false imprints. Earlier this week I was contacted by an Italian library that had identified 5 ESTC items (also in ECCO) that they were able to attribute to “the typography Agnelli of Lugano (Switzerland).” The ESTC numbers are T195986, T151607, N52461, T201505, N3464, T26164 (note the first two are different editions of the same work; the second edition being one the Folger identified). The ESTC notes that these are Italian false imprints.

There are at least two limitations I see with the way false imprints are recorded in MARC, and presumably RDA, though perhaps not. T151607 notes “The imprint is false; probably printed in Italy” and N52461 that “The imprint may be false; printed in Italy?.” These notations might make sense to a human reader, but it would be quite tricky, and probably not very reliable, to create a computer script that could consistently and accurately make sense of “probably” and “may be.” If I were to start from scratch I would design specific database entries for the particular bits of information about false imprints that I wanted to collect (for ex, yes/no/maybe, country, city, etc) and then map these entries to a MARC field that could be made human readable.
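A sketch of what such structured entries might look like, with a method that renders them back into a human-readable note. The class and field names are hypothetical, and the output wording is one possible convention, not the ESTC’s.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Certainty(Enum):
    YES = "yes"
    NO = "no"
    MAYBE = "maybe"


@dataclass
class FalseImprint:
    """Structured stand-in for a free-text false-imprint note."""
    is_false: Certainty
    country: Optional[str] = None
    city: Optional[str] = None
    place_certain: bool = False  # False means the place is a conjecture

    def to_note(self) -> str:
        """Render the structured fields as a note for human readers."""
        if self.is_false is Certainty.NO:
            return ""
        if self.is_false is Certainty.YES:
            lead = "The imprint is false"
        else:
            lead = "The imprint may be false"
        place = ", ".join(p for p in (self.city, self.country) if p)
        if not place:
            return lead + "."
        hedge = "" if self.place_certain else "probably "
        return f"{lead}; {hedge}printed in {place}."


def conjectured_in(imprints, country):
    """Entries whose place of printing is a conjecture for the given country."""
    return [i for i in imprints if i.country == country and not i.place_certain]


if __name__ == "__main__":
    print(FalseImprint(Certainty.YES, country="Italy").to_note())
    print(FalseImprint(Certainty.MAYBE, country="Italy").to_note())
```

A script can then filter on `is_false` or `place_certain` directly, instead of trying to parse “probably” or “may be” out of prose.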

The second limitation I see is that though I would like to change the attribution to “Agnelli” and “Switzerland,” I would prefer not to lose the original guess of “probably Italy.” At some point a researcher might want to query records that had at one time been assigned Italian false imprints but are now Swiss. MARC, I would suggest, is a flat standard designed to record the outcome of research but not the research itself. It might be possible to “version” MARC, but given the choice I suspect most people designing a cataloging system with versioning would not start with MARC. The bit of “versioning” that we have done in the ESTC to date has been awkwardly placed in the annotation or 500 fields, where it is often difficult for humans or machines to make sense of it.
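One way to keep the earlier guess while recording the new attribution is an append-only history per record. The sketch below is an assumption about how that could be modeled, not a description of the ESTC database; the class names are invented, the second record is a placeholder, and the revision shown for T151607 is the proposed reattribution described above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Attribution:
    """One dated state of opinion about where an item was printed."""
    country: str
    printer: str = ""
    certain: bool = False
    source: str = ""


@dataclass
class ImprintHistory:
    """Append-only list of attributions; the last entry is the current one."""
    estc_id: str
    versions: list = field(default_factory=list)

    def revise(self, attribution: Attribution) -> None:
        self.versions.append(attribution)  # earlier entries are never overwritten

    @property
    def current(self):
        return self.versions[-1] if self.versions else None


def reassigned(histories, old_country, new_country):
    """IDs of records once attributed to old_country and now to new_country."""
    return [h.estc_id for h in histories
            if h.current is not None
            and h.current.country == new_country
            and any(v.country == old_country for v in h.versions[:-1])]


revised = ImprintHistory("T151607")
revised.revise(Attribution("Italy", source="original catalogue note"))
revised.revise(Attribution("Switzerland", printer="Agnelli", certain=True,
                           source="proposed reattribution"))

unrevised = ImprintHistory("EXAMPLE-2")  # placeholder record
unrevised.revise(Attribution("Italy", source="original catalogue note"))

if __name__ == "__main__":
    print(reassigned([revised, unrevised], "Italy", "Switzerland"))
```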
