The Big Bundle Steal: Open Access and Subscription Databases


In previous posts we have addressed access to subscription databases. Several recent news items offer a timely reason to revisit the subject.

On July 19, 2011 the New York Times reported the indictment of twenty-four-year-old Aaron Swartz, one of the forces behind the Open Library Project and a longstanding internet activist, for breaking into the JStor database and downloading over 4.8 million articles. The material downloaded represented close to JStor’s entire holdings. On several occasions Swartz’s surreptitious activity caused some of JStor’s servers to crash. Once Swartz returned all the hard drives containing the documents and promised not to redistribute them, the nonprofit online journal provider expressed no interest in pursuing legal action. JStor issued its statement about the case the same day that the New York Times’s article appeared. The return of the material to JStor, however, did not alter the government’s stance toward the situation. As United States Attorney Carmen M. Ortiz noted, “Stealing is stealing, whether you use a computer command or a crowbar, and whether you take documents, data or dollars. It is equally harmful to the victim whether you sell what you have stolen or give it away.” If convicted, Swartz could face a 35-year prison sentence and a fine of one million dollars.

Swartz undertook the downloading while he was on a ten-month fellowship at Harvard’s Edmond J. Safra Center for Ethics, a situation some may view as ironic and others as indicative of his strong commitment to open access for all information. Many have viewed the charges as over-reaching by the government, with some regarding the indictment as retaliation for his internet activism as well as indicative of a lack of understanding about the digital environment. How could Swartz’s actions be called “theft” when the material remained on the JStor server? Isn’t this a matter of simply copying files? At most, wouldn’t his actions more appropriately fall under copyright infringement (although this scenario is complicated by JStor’s unwillingness to press charges)? That Swartz had legitimate access to JStor through his position at Harvard, yet conducted the downloading through unauthorized access to M.I.T.’s servers, raises additional questions about the legality of his actions as well as about his motivation. Was his purpose to analyze large data sets, as he had done in the past? (Many have suggested this explanation, yet, as Michael Widner ponders in his recent “JStor, the Semantic Web, and Bibliopedia” post, what aim could Swartz possibly have had in downloading so much content “that could not have been served via JSTOR’s Data for Research (DfR) API”?) Or did Swartz intend to make the JStor content freely available, as the indictment asserts without citing any evidence?

A number of blog posts and other online articles have offered opinions about and analysis of the case, but Maria Bustillos’s post on The Awl (3 August 2011) offers a well-balanced look at the case. Besides providing valuable legal context and rejecting the possibility that Swartz would want or need to use JStor’s Data for Research for data analysis, she also helpfully distinguishes between nonprofit journal providers such as JStor and for-profit commercial enterprises such as Reed Elsevier. As Bustillos reminds us, costs, often significant, are involved in digitizing academic journals and maintaining reliable access to them. She also usefully clarifies that

JSTOR is paid (not by the public, but by institutions) for a service, not for content. The money that individuals pay for these articles goes not to JSTOR, but to the publisher that is making the material available.

I would add that a portion of the modest fees obtained by some scholarly societies from licensing their copyrighted journal and annual publications to JStor helps defray their publishing costs, keep members’ dues down, and even on occasion provide funds for graduate student travel and scholarships. Bustillos rightly notes that JStor offers free access to nonprofit entities across Africa and other parts of the developing world. On its website JStor states that it furnishes over 600 institutions in the developing world with free access; its factsheet supplies some specifics and also announces the launch of a new alumni-access program. This announcement seems a welcome development and suggests that JStor is responding to a consistent complaint about access being withdrawn upon graduation from an institution.

Yet I should stress that this EMOB post is not meant as a paean to JStor. Like Bustillos, I believe that JStor should arrange for free access to the public domain material it provides. Rather, this post is an effort to draw attention to the distinctions between nonprofit providers and for-profit providers of online materials. Indeed, I second Bustillos’s hope that “[i]f the Aaron Swartz case clarifies the position of open access advocates with respect to nonprofit services like JSTOR, that at least will be a good thing.”

JStor provides access through bundling, the practice of selling online access by assembling a number of journal titles as a package that libraries purchase for their patrons. Yet not all bundlers are alike, and it does seem odd to me that Swartz decided to copy JStor’s collections as opposed to those of other bundlers. In a 2008 piece, “Collection Sales: Good or Bad for Journals?”, Mark Armstrong evaluates the practice of bundling in terms of its effects on the journals (as opposed to readers). While he sees bundling as a positive practice for journals, he does not see bundling by commercial publishers as salutary for nonprofit journals and, interestingly, sees JStor as a possible model that nonprofit journals may wish to emulate:

In the historic regime of stand-alone journal sales there was little tension when a nonprofit journal was published by a highly commercial publisher. Now, though, there is a tension, and non-profit journals might benefit from gradually disentangling themselves from the more commercial publishers. Both for-profit and non-profit journals, however, should surely make full use of the powerful instrument of bundling. In particular, relative to any stand-alone sales strategy, a non-profit journal will be better off if it joins the collection sales programme of a noncommercial publisher. The current example of JSTOR, which distributes a collection of largely non-profit journals (with a lag), might be a possible guide for how to disseminate the current output of non-profit journals. (21)

In the footnote that closes this paragraph Armstrong further explains,

JSTOR distributes a large number of journals from several disciplines, with a preponderance of non-profit and society journals. Articles are distributed with a lag of several years, so as not to unduly cannibalize a journal’s library subscriptions. JSTOR is available to libraries on a bundled basis (with scope for libraries to choose only particular subject areas). Since JSTOR has always distributed collections rather than individual journals, there is no issue about basing its library prices on historical individual subscriptions. JSTOR is a non-profit enterprise and sets relatively low prices to libraries, and pays relatively low prices to participating journals for distribution rights.

Ted (Theodore C.) Bergstrom, an economics professor at the University of California Santa Barbara, has been studying academic journal pricing for more than two decades, and his work supplies an additional context for assessing the differences between for-profit and nonprofit journals and the practice of bundling. In a co-authored 2006 article that appeared in “Frontiers in Ecology and the Environment” [4.9 (Nov. 2006): 488-495], he and Carl Bergstrom compare “The Economics of Ecology Journals,” and their findings seem to be in keeping with larger trends. One of the several graphs they furnish offers a view of the stark pricing differences and citation costs of journals published by for-profit entities, by commercial and scholarly partnerships, and by nonprofit publishers (p. 489).

Figure 2. For-profit publishers (yellow) charge institutions more per page for journal subscriptions than do non-profit publishers (blue), such as scholarly societies and university presses. Journals published jointly by scholarly societies and for-profit publishers (red) are typically priced intermediately. Solid lines indicate linear regression through the origin (Non-profit: slope = 0.25, r² = 0.67. Joint: slope = 0.86, r² = 0.83. For-profit: slope = 1.38, r² = 0.90. Note that the regression slopes are not equal to average prices per page, because the latter effectively weights each journal by its size.) From Carl T. Bergstrom and Theodore C. Bergstrom, Frontiers in Ecology and the Environment, 4.9 (Nov. 2006): 488-495, p. 489.

As the authors explain, this chart shows that the

average cost per recent citation for journals published by non-profit societies is $0.78. Journals published jointly between a scholarly society and a for-profit publisher cost on average $2.42 per recent citation and those produced by for-profit publishers without an affiliated society cost on average $4.33 per recent citation (Figure 2). Thus, whether we measure cost as price per page or price per citation, for-profit journals are approximately five times as expensive as their non-profit counterparts.

Given that one of their findings reveals that the “price differences between commercial and non-profit publications do not reflect an underlying difference in quality as measured by citation rate” (488), the cost-per-citation analysis should give us serious pause.
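The note in the Figure 2 caption, that the regression slopes are not the same as average prices per page, can be illustrated with a small calculation. The sketch below uses three invented journals (hypothetical page counts and prices, not the Bergstroms’ data) and computes three different summaries: the least-squares slope of a line through the origin, Σxy/Σx², which weights each journal’s price-per-page ratio by the square of its page count; the pooled ratio (total price over total pages), which weights each ratio by page count; and the plain mean of the per-journal ratios, which weights every journal equally. The function names are illustrative only.

```python
# Illustrative only: invented (pages per year, price in dollars) pairs,
# NOT figures from Bergstrom and Bergstrom (2006).
journals = [
    (200, 300.0),    # small journal, $1.50 per page
    (500, 500.0),    # mid-sized journal, $1.00 per page
    (2000, 1200.0),  # large journal, $0.60 per page
]


def slope_through_origin(points):
    """Least-squares slope b for price = b * pages with no intercept: sum(xy) / sum(x^2)."""
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    return sxy / sxx


def pooled_price_per_page(points):
    """Total price over total pages: each journal's ratio weighted by its page count."""
    return sum(y for _, y in points) / sum(x for x, _ in points)


def mean_price_per_page(points):
    """Unweighted mean of each journal's own price-per-page ratio."""
    return sum(y / x for x, y in points) / len(points)


if __name__ == "__main__":
    print(f"regression slope:  {slope_through_origin(journals):.3f}")
    print(f"pooled price/page: {pooled_price_per_page(journals):.3f}")
    print(f"mean of ratios:    {mean_price_per_page(journals):.3f}")
```

With these invented figures the three summaries come out at roughly 0.63, 0.74, and 1.03 dollars per page: the large journal with a low price per page pulls the regression slope down the most, because the slope gives the heaviest weight to the biggest journals.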

Bergstrom’s ongoing work with other collaborators in the Big Deal Contract Project charts the stranglehold that for-profit outfits such as Elsevier and Springer have held over academic libraries. Also notable are the widely different fees that libraries might pay to one of these providers for the same material. A posting on a 2009 talk Bergstrom gave at the University of Michigan reports that “as an aside, almost, [Bergstrom] tells us that while UMich and Illinois pay Elsevier about $2.25M for the ‘Freedom Collection’, Wisconsin pays about $1.2M for the exact same collection.”

Bergstrom has consistently concluded his articles with suggestions of how to address the various problems stemming from commercial publishers’ bundling practices. In the previously cited 2006 ecology article, he and his co-author stress that faculty should not abdicate purchasing decisions to librarians, noting that librarians depend on faculty and graduate student input in making such decisions. They forcefully conclude:

The fraction of library budgets that is currently going to the shareholders of large commercial publishers could instead be used to provide services of genuine value to the academic community. Professional societies and university presses could help by expanding their existing journals or starting new ones. Individual scholars could advance this process in many ways: by contributing their time and efforts to the expansion of these non-profit journals, by refusing to do unpaid referee work for overpriced commercial publications, by self-archiving their papers in preprint archives or institutional repositories, and by favoring reasonably priced journals with their submissions. (495)

Over the years Bergstrom has adjusted his recommendations to address the shifting landscape effected by the ever-increasing move to digital access. In his 2010 essay “Librarians and the Terrible Fix: Economics of the Big Deal,” published in Serials, 23.2 (July 2010): 77-82, Bergstrom presents an array of possibilities. In this piece he now sees merit in the “sale of institutional site licenses for non-profit journals” (80). As he explains, unlike commercial entities that have made the most of price inelasticity [that is, when an increased price “results in a less than proportionate decrease in demand,” a situation Bergstrom describes as “a paradise for monopolists” (77)], “[n]on-profit institutions have no incentive to charge prices significantly higher than average costs, even if demand is price inelastic” (80). Moreover, he emphasizes that libraries should engage in hard bargaining with commercial publishers who charge prices that far exceed average costs. Nor should libraries be afraid of abandoning the “big deal” entirely if not satisfied with the price. Although he acknowledges that libraries should make their own decisions about refusing the big deal, Bergstrom encourages such action by offering successful examples: Stanford’s and Caltech’s decisions to forgo overpriced big deals show prestigious research institutions that have benefited from this stance.

Evidently more libraries are doing just that. In “Libraries Abandon Expensive ‘Big Deal’ Subscription Packages to Multiple Journals” (The Chronicle of Higher Education, 17 July 2011), Jennifer Howard reports on efforts by the University of Oregon library and other Oregon academic libraries as well as those by Southern Illinois University at Carbondale to abandon or fiercely renegotiate the “big bundle deals” offered by commercial publishers such as Elsevier, Wiley, and Blackwell and to return to purchasing individual subscriptions.
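Bergstrom’s gloss on price inelasticity, quoted above, can be made concrete with a few lines of arithmetic. In this hypothetical sketch (the prices and subscriber counts are invented, not drawn from any publisher), elasticity is computed as the percentage change in quantity divided by the percentage change in price, both measured from the starting values; demand is inelastic when the absolute value is below 1, and in that case a price increase raises total revenue even though some customers leave. The function names are illustrative.

```python
# Hypothetical illustration of price inelasticity; all numbers are invented.

def elasticity(p0, q0, p1, q1):
    """Percentage change in quantity divided by percentage change in price,
    both measured relative to the starting values (p0, q0)."""
    return ((q1 - q0) / q0) / ((p1 - p0) / p0)


def is_inelastic(e):
    """Demand is inelastic when quantity responds less than proportionately to price."""
    return abs(e) < 1


# A package price rises 20% and 5% of subscribing libraries cancel.
p0, q0 = 100.0, 1000
p1, q1 = 120.0, 950

e = elasticity(p0, q0, p1, q1)
print(f"elasticity = {e:.2f}, inelastic = {is_inelastic(e)}")
print(f"revenue: {p0 * q0:,.0f} -> {p1 * q1:,.0f}")
```

In this invented case the seller’s revenue rises from 100,000 to 114,000 despite the cancellations, which is the sense in which inelastic demand can be “a paradise for monopolists.”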

As this discussion has indicated, the big bundling deals that have received the primary criticism are those provided by commercial publishers such as Elsevier, Springer, and the like. As Bergstrom observes, “[A] library that signed its first big deal contract [with Elsevier] in 1999 would be paying 80% more in 2009 than it did in 1999” (79). Moreover, while Elsevier and Springer increased their 2010 subscription rates in 2009, numerous non-profit societies either froze or reduced their prices in response to widespread library budget reductions (79). Especially in light of ongoing financial cuts, that almost half of university library serial budgets is spent on these big bundle deals should concern us all. Open access is an ideal for which we should strive, but JStor does not seem to be the kind of villain in this larger narrative of access and bundling that some responders to Swartz’s case have painted it. According to JStor’s factsheet, “Since our launch in 1997, JSTOR has never raised archival collection fees for participating institutions. In addition, because we have added new content to each collection every year, users have access to more content for the same fee.” Frankly, I am puzzled by Swartz’s choice of bundle.
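Bergstrom’s figure of an 80 percent increase between 1999 and 2009 can also be expressed as an annual rate. Assuming, purely for illustration, that the increase compounded evenly over the ten years (actual contracts may have set different increases from year to year), the implied rate r satisfies (1 + r)^10 = 1.8, which works out to about 6 percent a year. The helper name below is hypothetical.

```python
def implied_annual_rate(total_increase, years):
    """Constant yearly rate r such that (1 + r) ** years == 1 + total_increase."""
    return (1 + total_increase) ** (1 / years) - 1


# 80% more in 2009 than in 1999: a ten-year span, assuming steady compounding.
rate = implied_annual_rate(0.80, 10)
print(f"implied annual increase: {rate:.2%}")  # prints "implied annual increase: 6.05%"
```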

8 Responses to “The Big Bundle Steal: Open Access and Subscription Databases”

  1. Anna Battigelli Says:

    Thanks, Eleanor, for this illuminating post, which raises a long list of important questions, far beyond Swartz’s action. I was interested in Howard’s piece on librarians pushing back against bundled subscriptions.

    It would be helpful to have a clearer sense of the actual costs of digitizing material. That information might help us distinguish more clearly between fair and unreasonable subscription costs. The role of professional organizations in monitoring and even reducing such costs seems particularly promising. Finally, is open access an actual possibility or is it an unrealistic and unrealizable expectation?


  2. Eleanor Shevlin Says:

    Costs for digitization are evidently difficult to determine, in part because one needs to take into account so many variables. A classic report on the issue, The Price of Digitization: New Cost Models for Cultural and Educational Institutions, dates from April 2003 but is still worth a look. It summarizes the talks given at an April 8, 2003 one-day conference on this issue sponsored by NYU and the New York Public Library. Among the general points that remain relevant is the reminder that cost and price are not one and the same and that costs need to be figured in terms of both direct and indirect expenses and savings. For example, a library that provides only electronic access to journal articles saves the indirect costs of new shelf space, including the lighting, heating, and other expenses of housing it.

    Donald Waters’s talk, “The Economics of Digitizing Library And Other Cultural Materials: A Perspective from the Mellon Foundation,” stressed that “the process of ‘digitization’ must be understood in order to accurately understand cost issues. As the field matures, we realize that digitization is not a uniform process, and that digital interoperability is neither simple nor straightforward.” Waters likens the situation of digitizers to the uncertain, unstable environment of early printers as discussed by Adrian Johns in The Nature of the Book: Print and Knowledge in the Making.

    Among the three cost categories that Waters addresses is “Intellectual property costs”:

    The temptation is either to despair of the cost and abandon digitization, or to try to operate under the radar of the “copyright police”. Such initiatives avoid lawsuits, but result in costly duplication of effort across many campuses. … Projects attempting to address this issue [duplication of digitization efforts across individual campuses] include ARTstor, JSTOR, CIAO, ACLS’s History-E project, the BiblioVault project at the University of Chicago, and the Electronic Enlightenment at Oxford University. Such projects demonstrate that communities of users and publishers can find ways to create the trust and goodwill needed to overcome the costly barriers of copyright and create highly useful digitized collections of research and educational materials.

    I would also suggest looking at Bergstrom’s articles to obtain a better sense of what is involved in costs. In several articles he also notes that scientific articles appearing in many of the most expensive journals are often among those most frequently found for free on the internet, whether on the authors’ websites (often with the publisher’s permission; Elsevier evidently allows this) or elsewhere.

    Although Bustillos does not supply specifics, her general remarks about the costs involved in hosting academic journals are worth reproducing:

    A lot of people seem to believe that it doesn’t cost anything to make documents available online, but that is absolutely not so. Yes, you can digitize an academic journal and put it online, but if you mean to offer reliable, permanent availability, it costs a huge amount of money just to keep up with the entropy. Plus you have to index the material to make it searchable, not a small job. Everything has to be backed up. When a hard drive fries, when servers or database software become obsolete or break down, when new anti-virus software is required, all this stuff requires a stable and permanent infrastructure and that does not come cheap. Finally, the more traffic you have, the more it costs to maintain fast, uninterrupted server access; you can see this whenever some little blog is mentioned in a newspaper and its server crashes five seconds later. In the case of JSTOR you are looking at many millions of hits every month, and they can’t afford any mistakes.

    Finally, I would not say that open access is unrealistic, yet it is unrealistic to ignore that open access entails costs. In a post a year or so ago I mentioned a survey, conducted by an academic nonprofit journal publisher, that I had taken. More than a few of the questions sought to assess who should incur these costs: an author who pays a fee to submit an article? an author whose article is accepted (at a much higher fee than one charged if monies are collected at submission) and agrees to have it published? the institution to which the scholar belongs (and who then pays the fee if one is an independent scholar, a question the survey did not ask)? Such pricing possibilities are also described and discussed in articles on this subject.


  3. Anna Battigelli Says:

    Very interesting material, both in your response and in the links you provide. Perhaps we need a larger discussion on the cost of digitization.


  4. Eleanor Shevlin Says:

    Or perhaps several discussions based on subtopics.

    An interesting development in terms of JStor (but one I did not include in the post because it was off-topic) is its e-book publishing initiative that will be launched in mid-2012. So far twenty-two publishers have signed on to become a part of Books at JStor, including Harvard, Yale, and Edinburgh University Presses.

    Evidently, books published by these presses will be available directly through the JStor interface.

    For more on this initiative, see this January 2011 news announcement. Among the projected benefits of this initiative are the cross-searching features it will afford:

    The books will be deeply integrated with the 1,600 current and archival journals on JSTOR, as well as the diverse primary sources available today. All the content will be cross-searchable, and the books will be linked with the more than 2 million book reviews and hundreds of thousands of books references in the journal literature. Works written by the same authors or focused on the same topics, regardless of format, will be connected, and alerting services for users will cross publishers, other content providers, and content formats.


  5. Dave Mazella Says:

    FWIW, I love the idea of the e-publishing initiative. Edinburgh has some terrific books that are prohibitively expensive and not likely to be picked up by my library.

    In general, I, too, thought that Swartz’s choice of target was odd, because I think JSTOR has been one of the better players in this whole area. But I think that some of these bundlers are pricing their materials out of circulation.


  6. Eleanor Shevlin Says:

    Yes, I thought the JStor book initiative was a potentially promising development for the reason you mention, and also for the ability to search across monographs, reviews, articles, and so forth.

    Bergstrom specifically discusses how commercial outfits have been able to raise their prices again and again without suffering consequences. Instead, their actions have led libraries to drop non-bundled, nonprofit journals and have often prevented them from subscribing to new independent journals. As mentioned, one of his solutions is to drop the big bundle deal.


    • Anna Battigelli Says:

      Can you say more about the JSTOR book initiative?


      • Eleanor Shevlin Says:

        Books at JSTOR, as noted above, won’t be available until mid-2012.

        JSTOR’s page for books indicates that about 15,000 titles will be available then. Evidently, the initiative is being seen as a promising development for scholarly publishing through its fostering of working relationships between publishers and librarians.

        On its Books page, JSTOR describes its mission in the following terms:

        Our driving focus is to deliver great content and experiences to researchers–an objective shared by publishers and libraries. Books will be cross-searchable with the millions of journal articles and primary sources on JSTOR, linked through our vast network of citations, contextualized by more than 1 million book reviews on the platform, and provide an online research experience marked by personalized, user-driven functionality.

