British Newspaper Archive: Not Burney (yet), But Still Useful

Launched this past November, the British Newspaper Archive is a joint project of the British Library and brightsolid online publishing. Over the next decade, this partnership is slated to digitize over 40 million pages of the BL’s newspaper collection. The site anticipates a wide audience that includes not only scholars but also amateur historians, genealogists, and more.

While the project often digitizes the original paper copies, it has also digitized from the BL’s microfilm copies because that process is faster and makes more pages available in a shorter amount of time. The quality of the pages, however, does suffer, as the website admits; unfortunately, this emphasis on speed means that the accuracy of the search results is forever sacrificed. That said, one can view the OCR text and correct it:

When viewing an image, the OCR text can be viewed via the left nav All Articles option. You can select an individual article and then select Show Article text and the text. This addictive option can be accessed by simply clicking the list of sections displayed and applying your own corrections. By correcting the text, you will be adding to the quality of the data that can be searched by others. Please note that during the launch period updates to corrections will take longer to appear. (“Getting Started”)

The site’s descriptive information suggests that the collection dates primarily from the nineteenth century on, but there are 24 eighteenth-century provincial newspaper titles available in the current collection (the full list appears below). As of yet, there are no eighteenth-century London papers. Like Burney, the British Newspaper Archive is a subscription database. Unlike Burney, though, it makes provisions for individual subscriptions. The rates also seem quite reasonable, and there is an array of plans (credits determine the number of views; each view “costs” 5 credits; the view option enables you to download or print):

  • 12 Month Package (unlimited pages)
    Price: £79.95 GBP       Valid For: 365 days       Credits: Unlimited*
  • 30 Day Package (up to 600 pages)
    Price: £29.95 GBP       Valid For: 30 days       Credits: 3000
  • 7 Day Package (up to 120 pages)
    Price: £9.95 GBP       Valid For: 7 days       Credits: 600
  • 2 Day Package (up to 100 pages)
    Price: £6.95 GBP       Valid For: 2 days       Credits: 500

Potential users are also able to register using an email address and receive 15 free credits, a very limited trial of sorts.

    Searches can be either simple or advanced. The advanced search offers “All of these words,” “Any of these words,” “Without these words,” and “Phrase.” You also have the option of applying filters such as dates, place of publication, publication title, or article type (advertisement, article, family notice, illustrated, miscellaneous). There is also the option of browsing by title. Unfortunately, unlike the Burney search interface, the BNA does not let you use wildcard characters to help counteract the poor OCR, the long “s” (which OCR often reads as “f”), or other typographical peculiarities. Nor does the BNA offer anything like Burney’s proximity operators, “w” or “n” joined by a number, which find two terms within a certain distance of one another.
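
Without wildcards, one workaround is to search for the likely misreadings directly. The Python sketch below generates spellings of a single word in which a non-final “s” has been read as “f.” It rests on simplifying assumptions of my own: that the long “s” appears everywhere except at the end of a word, and that OCR always turns it into “f.” Period typesetting was less tidy than that, and the function name `long_s_variants` is purely illustrative.

```python
from itertools import product


def long_s_variants(term):
    """Return the word plus spellings with each non-final 's' read as 'f'.

    Simplified rule (an assumption, not a typographic authority): the long s
    is used everywhere except as the last letter of the word.
    """
    term = term.lower()
    options = []
    for i, ch in enumerate(term):
        is_final = i == len(term) - 1
        if ch == "s" and not is_final:
            # This position may have been printed with a long s.
            options.append(("s", "f"))
        else:
            options.append((ch,))
    # Every combination of choices, with the original spelling first.
    return ["".join(combo) for combo in product(*options)]


for word in ("insure", "houses", "nuns"):
    print(word, "->", long_s_variants(word))
```

Each variant can then be tried in the “Any of these words” box. A word with several medial “s” characters produces many combinations (three such letters give eight spellings), so this works best with short or distinctive terms.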

    Search results can be ordered by relevance or by date (in either ascending or descending order), and a glimpse of the context in which each search term occurs is given. For example,

    Ipswich Journal
    Sat 10 Jan 1784 Suffolk, England
    5 U F F O L K. 1 0 be Ll’. TT, ant! enlered upon immediatrly, THAT olil*aeculbmed Public Honic,
    8343 Words
    “SfMON PATERNOSTER of Wickhairi-market, to be agent for the faitl company for the town of Wiek- ham.market, and parts adjacent. The company infure lioufeS, bufrdings, … ?

    As this example demonstrates, the context provides the OCR text with all its warts. Still, it helps the user decide if the article is worth viewing and assists in conserving the credits in one’s account.

    CAVEAT: During my two-day exploration of the BNA, I encountered several cases in which I clicked to view an article only to discover the article was not on that page. Five credits were still deducted from my account and continued to be deducted as I browsed other pages in the issue. Once or twice I was not able to find the result at all; other times it appeared on a different page within that issue.

    Here is a list of eighteenth-century titles currently available in BNA:

    • Aberdeen Journal (105)
    • Bath Chronicle and Weekly Gazette (1989)
    • Birmingham Gazette (8)
    • Bristol Mercury (1)
    • Caledonian Mercury (7309)
    • Chelmsford Chronicle (329)
    • Derby Mercury (2595)
    • Hampshire Chronicle (1115)
    • Hampshire Telegraph (11)
    • Hereford Journal (982)
    • Ipswich Journal (1203)
    • Ipswich Journal, The (1296)
    • Kentish Gazette (374)
    • Leeds Intelligencer (2352)
    • Manchester Mercury (223)
    • Newcastle Courant (2561)
    • Norfolk Chronicle (1109)
    • Northampton Mercury (1625)
    • Oxford Journal (2434)
    • Reading Mercury (570)
    • Salisbury and Winchester Journal (17)
    • Scots Magazine, The (611)
    • Sherborne Mercury (256)
    • Sussex Advertiser (60)

13 Responses to “British Newspaper Archive: Not Burney (yet), But Still Useful”

  1. Dave Mazella Says:

    Hi Eleanor,

    It’s very encouraging that a database like this would finally offer the possibility of brief, individual subscriptions. That makes a lot of sense for a database that looks most useful for fairly specific projects. But then the ease of use becomes critical; no one wants to “spend” credits making the thing work.

    The DIY ethos of the OCR corrections is intriguing, but it also means that it will require a lot of use for this to have an impact.

    It also makes me think that the future of such databases may lie in smaller, discrete collections rather than the big ECCO style collections that we currently have.

  2. Eleanor Shevlin Says:

    I, too, was encouraged by the individual subscriptions. The DIY approach to correcting OCR seems increasingly popular; 18thConnect promotes a version of this.

    I should stress that the glitches I described were not deal-breakers at all, and the contextual display of results helped conserve credits. Burney has many of the same problems with OCR–with the same effect on returned results. The key difference there seems to be more flexibility with search options.

    Also, this collection will be quite a large one–it is just small in respect to the eighteenth-century titles. (Its potential to be quite large is one reason I used the parenthetical “yet” in the post’s title.)

  3. Dwight Codr Says:

    Thanks for your report; it made me want to give it a test drive. Enjoyable and of obvious utility. While we’re hoping for more, better, etc., might I add that I do wish that databases for newspapers — including the Burney collection, which I’ve used some — would add more meta-information about the periodicals covered. I’m (sort of) sure I could track down who was running/working for the Newcastle Courant in the 1720s, for example, but it would be great if there were links that could direct me to brief reports on the paper’s history, its political affiliations (when appropriate), and distribution statistics.

  4. Eleanor Shevlin Says:

    Thanks, Dwight. And, yes, most users would like to see both better metadata and more historical information about the various titles included in the collection. This desire is one of several reasons that the construction of such digital archives would benefit from having more scholars involved at the outset. Yet it should also be noted that we still lack much information about many of these titles. The presence of these databases, it is hoped, will encourage more study.

    Some might be interested in reading James E. Tierney’s “The State of Electronic Resources for the Study of Eighteenth-Century British Periodicals: The Role of Scholars, Librarians, and Commercial Vendors” that appeared in The Age of Johnson, Vol. 21 (2012).

  5. Anna Battigelli Says:

    Thanks for this, Eleanor. I like that the searching is free, though the OCR, as you point out, seems deeply flawed. Individual subscriptions are great, too, though if the database included London newspapers, the deal would be even more attractive.

    Dwight is absolutely correct. We need metadata. Tierney’s article is illuminating, as Eleanor points out.

  6. Eleanor Shevlin Says:

    I suspect that what we are seeing here with the OCR is akin to that for Burney. From 1800 on, London newspapers are available in the collection. Because this digitization project is ongoing, with countless pages being added regularly, I wonder if eventually there will be 18th-century London newspapers available. The existence of Burney, also a BL co-partnered project, might mean no.

  7. Anna Battigelli Says:

    One of the BNA returns for my search for “nuns” in 1790 included a section on treating bites from mad dogs. OCR translated advice recommending consultation with a surgeon for all wounds “where the skin is injured” into consultation for all cases “where the nun is injured” (BNA Norfolk Chronicle, 9 Jan 1790).

    Additionally, though searching is free, the OCR snippets for each entry often contain too much gibberish to allow a cogent evaluation of the entry’s content. This matters when one must pay to view the actual content. Finally, navigation within entries is clumsy.

    Still, it was fairly readable, and personal subscriptions make it accessible, which is huge.

  8. Eleanor Shevlin Says:

    I have noticed that some titles seem to have far more OCR problems than others, due no doubt to the quality of the original and/or the microfilm.

    Anna is right about some results appearing with such gibberish that it is hard to discern whether the hit is relevant. This difficulty also appears tied to one’s search and the terms one is using. “Nuns” is a far more generic search than one for a particular proper name, specific location, publication title, or the like. Also helpful when one encounters gibberish is the similarity of advertisements and news items across papers. If a result is hard to decipher in one title, it may be legible in the next result. Of course, some items are specific to a single title, so this tip would not work in those cases.

    Again, Burney almost certainly has many of the same OCR issues; its results, however, do not advertise the problem.

  9. Anna Battigelli Says:

    In addition to Eleanor’s suggestion of more specific search terms, we might keep in mind Laura Mandell’s warning about OCR and the long “s” and, where possible, avoid search terms that start or end in “s.”

    That said, “nuns” did return plenty of useful results.

  10. Eleanor Shevlin Says:

    Sayre Greenfield’s search tips for ECCO’s OCR problems often have relevancy for Burney and BNA searches; his troubleshooting advice appears under EMOB’s pages.

    • Anna Battigelli Says:

      I’d like to hear from Laura Mandell about the crowd-sourced correcting mechanism in 18thConnect’s TypeWright program. It had some bugs when I tried it, but it seems quite promising.

      This is important because we need an OCR solution if digital searching is to fulfill its promise.

  11. Eleanor Shevlin Says:

    Since reading Cameron Blevins’s post “Coding a Middle Ground: ImageGrid” about a week ago, I’ve almost mentioned it several times in a comment here. It has been receiving notice in a host of blogs and on Twitter, and yesterday it was Digital Humanities Now’s Editor’s Choice pick. As the post’s title may suggest, Blevins discusses ImageGrid, a program he and his colleague Bridget Baird devised that offers a middle ground between close and distant reading. Nineteenth-century Texas newspapers serve as his example.

  12. Newton Key: Crowdsourcing the Early Modern Blogosphere - historyblogosphere - Bloggen in den Geschichtswissenschaften. Ein Open Peer Review-Buchprojekt Says:

    […] Not Burney (yet), But Still Useful blog post 6.7.2012, in: Early Modern Online Bibliography [https://earlymodernonlinebib.wordpress.com/2012/07/06/british-newspaper-archives-not-burney-yet-but-s…], accessed 11/10/2012. 118 0 Alexandra Shepard and Phil Withington (eds.): Communities in Early […]
