Author Archives: Larry E Hibbler

Orange circular lock shown in "unlocked" position - the Open Access logo.

The State of Scholarly Publishing

For folks interested in the current state of scholarly publishing, especially regarding Open Access, there are two recent reports that do a great job of summarizing publishing’s move toward OA. 

In November, the White House Office of Science and Technology Policy (OSTP) released its “Report to the U.S. Congress on Financing Mechanisms for Open Access Publishing of Federally Funded Research.” This report, required by a 2023 appropriations Act, describes the different business models currently being used to comply with the requirement of public access within a year of publication (remembering that the U.S. government uses the term “public access” to denote free-to-read access, and not any of the other rights OA implies). It also provides top-level statistics about the rapid growth in OA publishing over the last ten years.

The most interesting takeaway is how difficult it is to estimate how much federally funded researchers paid to publish in the last few years. Even the U.S. government has very limited data. The best guess from OSTP was slightly more than $378 million in 2021, a 39% increase from 2016. The other highlight of the report is the Appendix, which describes the economic concepts related to publishing that can be used to analyze the system.

Also in November, a group of faculty and staff from the Massachusetts Institute of Technology released the report “Access to Science and Scholarship: Key Questions about the Future of Research Publishing.” Much like the OSTP report, it spends most of its time discussing the recent history of publishing, highlighting growth in both scholarly outputs and in spending. There is more detail here on specific publishers and their business models, especially the growth of massive fully-OA publishers.

The benefit of this report is that it takes a slightly larger view of the entire scholarly communications ecosystem. The Nelson memo applied to both publications and data, and this report poses some interesting research questions about open data, such as how it should be shared and what it is going to cost. It also presents questions about preprint servers and peer review, two issues not covered by OSTP.

Hexagonal OpenAI logo in black and white

The New York Times v. OpenAI & Microsoft

Over the holiday break, the New York Times sued OpenAI and Microsoft for copyright infringement. The lawsuit covers both the use of New York Times content for training and the reproduction of that content in response to prompts.

The New York Times may not be “scholarly,” but the suit could be a preview of how large scholarly publishers deal with OpenAI. First, it is fair to call both the Times and scholarly journals high-quality content, the kind that OpenAI likely prefers for training its model (Complaint, p. 29). Second, there are unauthorized copies of much of the content online, so it would be possible to initially train a model on the content without permission. Finally, there is the financial angle. This lawsuit comes after negotiations over having OpenAI and Microsoft pay for the New York Times’ content. While some publishers are exploring ways to use AI with their own content, they may find it profitable to license that content to OpenAI and other companies.

One other interesting note here is how Microsoft is brought into the lawsuit from several different angles. First, it is a big investor in OpenAI. Second, it offers products based on OpenAI’s models, in particular Bing Chat and anything branded “Copilot.” It is also accused of helping OpenAI make copies of content in training ChatGPT, or at least of overlooking the copying OpenAI was allegedly doing. But the most interesting claim, which could have far-reaching implications if a court agrees, is that Microsoft is committing copyright infringement by “storing, processing, and reproducing” the models on its platform (Complaint, p. 60). If that is copyright infringement, it could greatly chill AI research, as a researcher would need to know the provenance of a model, and every document used in its training, to be safe from a copyright claim.

Given that this lawsuit follows negotiations over a license agreement, it would not be surprising if it settles before trial. The New York Times may be well-resourced for a big legal fight, but there is no guarantee it would win, and a loss would put a lot of licensing revenue at risk. At some point there will be a copyright suit regarding AI that goes to trial (no guess as to which, as it can take a long time to go from filing a case to a trial), but maybe not this one.

Parthenon clip art set to the left of the title “Internet Archive,” forming their logo.

Internet Archive v. Hachette: CDL case update

There has finally been some movement in the Internet Archive’s appeal of the summary judgment decision it lost in the U.S. District Court for the Southern District of New York back in March.

If you do not recall, the case turned on the question of whether controlled digital lending, or at least the Internet Archive’s implementation of it, was fair use. Of the four fair use factors, the court determined that none favored the use as fair. One of the key points of appeal is the District Court’s finding that the Internet Archive’s CDL program was a commercial use. Another important issue was whether the use was transformative. The final major point the Internet Archive lost was on economic harm to the publishers. On December 15, the Internet Archive filed its brief addressing these issues.

On the one hand, many of the arguments IA makes are not new; it is just asserting the trial court got them wrong. It claims that the trial court judge “failed to grasp the key feature of controlled digital lending: the digital copy is available only to the one person entitled to borrow it at a time.” (Brief, p. 16). Overall, the Internet Archive sticks to its argument that CDL is non-commercial, transformative, and has no effect on the potential market.

The District Court’s finding that the Internet Archive’s CDL program was commercial could have the most important ramifications going forward. The Wikimedia Foundation and others filed an amicus brief focusing on that issue. Part of that finding was that IA had a “Donate” button on the pages of the digital books it lent. However, it had buttons to donate on every page of the site, much like one sees on Wikipedia. If donation buttons render a page commercial, then non-profit organizations will never be able to have a non-commercial page on the internet. 

One other new, or at least more focused, argument is that the market the District Court identified, the market for ebook licenses for libraries, is the wrong market to consider. The Internet Archive adds that, no matter the market, there is no harm. Whichever way this point goes, it would be good to get some more guidance from the court as to who has the burden of proof on this point, and how it should be proven.

The Internet Archive also asks the court to analyze two programs separately: the National Emergency Library, where it lent digital books regardless of the number of print copies it had, and the Open Libraries project, where it digitized and lent books of other libraries. Presumably, this only really matters if regular controlled digital lending is found to be a fair use, but one cannot be sure without seeing the nature of the damages agreed to by the parties, which is confidential.

One thing I noted was that the Internet Archive does not address the recent Supreme Court opinion in Andy Warhol Foundation for the Visual Arts, Inc. v. Goldsmith in terms of evaluating transformative use. I had thought that the District Court would wait on this opinion. Given what the opinion said about transformative use, I expect to see it more heavily relied on in the Publishers’ brief, due March 15.

The Boston Library Consortium, of which Boston College is a member, joined an amicus brief in support of the Internet Archive.

Fall Theses and Dissertations Workshops

Writing a thesis or dissertation takes a lot of work, and the end result is a great academic accomplishment. But even when it is written and defended, there is one last task to be truly finished – the less glamorous but important step of submitting a copy to the library. The library will make sure it is properly preserved as an academic record and will make it available to the world for free, after an embargo period if need be.

To get this process started, we have eTD@BC workshops for graduate students preparing to submit electronic theses and dissertations. This fall, there will be three sessions, one in-person and two virtual, all covering the same material.

Dates:
Tuesday, October 10, noon – 12:45 pm, on Zoom.
Monday, October 16, noon – 12:45 pm, O’Neill Library 307.
Tuesday, October 17, 6:30 – 7:15 pm, on Zoom.

To register, go to https://libcal.bc.edu/calendar/workshops. Upon registration for an online workshop, you will receive a confirmation email with the Zoom link.

Topics to be covered in this workshop include:

  • The submission website, including a walk-through of the submission process
  • Important decisions and issues, such as eScholarship@BC, embargoes, copyright, etc.
  • How to ensure that a published eTD can be discovered and accessed by others
  • Where to get additional help

Graduate students can contact etd-support@bc.edu with any questions about the workshops. There will be additional workshops in the spring.

OA Week this October

Open Access Week is October 23-29. Details are still being worked out, but expect to see library displays, blog content, and an on-campus event that week. Keep an eye out for announcements!

International Open Access Week is organized by SPARC and an Open Access Week Advisory Committee. For this year, they are focusing on the theme “Community over Commercialization.” The theme casts a light on some of the tensions between publishing using commercial publishers, which benefits specific private businesses, and publishing Open Access to benefit the public interest.

An image promoting Open Access week, from October twenty-third to twenty-ninth, with the hashtag #OAWeek

Image credit: https://www.openaccessweek.org/attributions