Author Archives: Larry E Hibbler

OA Policy Changes at the Bill & Melinda Gates Foundation

The Bill & Melinda Gates Foundation has recently announced a “refreshed” Open Access Policy, to start in 2025. There is a lot to unpack.

The headline change for publishers is that the Foundation will no longer pay Article Processing Charges (APCs) for its funded researchers to publish Open Access. However, it has not stepped back from its support of Open Access. Rather than paying for post-publication OA, it will require that all manuscripts be posted on a preprint server. Not just any preprint server, either: it must be one approved by the Foundation, with “a sufficient level of scrutiny to submissions.” The works must be licensed as CC-BY 4.0, or something similar. Interestingly, authors must also apply that license to the Author Accepted Manuscript of the article if it is published later. Any data used in the manuscript must also be made immediately available.

The Foundation is working with F1000, a subsidiary of Taylor & Francis, to create a preprint platform named VeriXiv. The platform will do a series of “ethics and integrity checks,” looking for things like plagiarism and image manipulation, as well as author-related conflicts. One thing that it is not doing is peer review. An author can still publish the article in a journal as well, as long as that journal respects the OA requirements of the Foundation, but the author will have to pay any APC themselves.

The question is how this will affect the publishing ecosystem. The Foundation awards more than five billion dollars in grants per year, which is enough to create real change. On the one hand, authors could decide that traditional publishing is not worth the time and cost, which the Foundation’s policy strongly suggests, and just move to preprints. On the other hand, authors may still have other institutional incentives tied to publishing output and prestige. Will this just shift the cost of traditional publishing to authors, and indirectly to the libraries and universities that support them? It might work out that this is a lever to reduce prestige-based incentives at institutions, or it might work out that authors with fewer resources fall a little further behind.

This may also just be a business fight between funders and publishers, with researchers caught in the middle. Publishing is a bundle of services, including ethics and plagiarism checks, peer review, distribution and preservation. Commercial publishers charge a lot for that bundle. Starting with posting a preprint and then layering on other services could be cheaper, especially if one thinks different research outputs need differing levels of service. This opens the door to new business models, like stand-alone peer review services, as contemplated by the Publish-Review-Curate model of publishing. We will see who steps in to fill those needs.

ORCID at BC

One of the underreported provisions of the 2022 Nelson memo, which requires federally funded research to be published open access, is that federally funded researchers must have a digital persistent identifier. Federal guidance says that such an identifier should be from an open platform, disambiguate authors, and allow a researcher to have a profile with their works included, all provided at no cost to the researcher.

One might think that type of service sounds almost too good to be true. But that is one part of the Nelson memo where the infrastructure exists today. ORCID, through its ORCID iD, already meets the recommended standards. The ORCID service provides users with a unique 16-character identifier, along with a profile with a permanent URL where they can add information about employment, education, works published, and even grants received!

Note: ORCID stands for “Open Researcher and Contributor ID.” They prefer “iD” for the actual identifier authors get. There were no federal guidelines on proper capitalization.
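For the technically curious, the last of those 16 characters is a check character, computed from the first 15 digits using the ISO 7064 MOD 11-2 scheme that ORCID documents for its identifiers. The following is a minimal sketch, not an official ORCID tool; the function names are made up for illustration, and the two sample iDs are ones commonly used in ORCID's own documentation.

```python
def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    # A result of 10 is written as the letter X.
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid_id: str) -> bool:
    """Return True if the string looks like an ORCID iD with a correct check character."""
    compact = orcid_id.replace("-", "").upper()
    if len(compact) != 16:
        return False
    base, check = compact[:15], compact[15]
    if not (base.isascii() and base.isdigit()):
        return False
    return orcid_check_character(base) == check


if __name__ == "__main__":
    for candidate in ("0000-0002-1825-0097", "0000-0002-1694-233X", "0000-0002-1825-0098"):
        print(candidate, is_valid_orcid(candidate))
```

A check like this only catches typos. It does not confirm that the iD is registered or that its record is public.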

How do I get one of these ORCID iDs?

There are two ways to get an ORCID iD. You could just go to ORCID.org and register for a new account. However, you can also do it directly through Boston College’s Agora Portal link, ORCID at BC. This ties your ORCID iD to your Boston College login and Eagle ID, so you can log in to ORCID using your BC credentials.

Then what?

There are a few things to do once you have an ORCID iD.

Make sure it is public!

Sometimes people sign up for an ORCID iD, knowing they need it to fill out a form or application, but do not actually make it public.

Link it to a couple of sources for publications

ORCID lets you populate your profile with information from other databases, including Scopus and MLA International Bibliography! It also lets you link to information from CrossRef, if your publication has a DOI.

Put the ORCID iD in a few different places

Putting your ORCID iD on a personal webpage, in a CV (especially one you do not update frequently), and even in an email signature is a great quick way to let others find your work.

Right now, Boston College’s ORCID adoption rate for faculty is over 35%. That is not bad, but it means there is a long way to go. For more information on ORCID, and for help on specific integrations, check out our ORCID guide.

The State of Scholarly Publishing

For folks interested in the current state of scholarly publishing, especially regarding Open Access, there are two recent reports that do a great job of summarizing publishing’s move toward OA.

In November, the White House Office of Science and Technology Policy (OSTP) released its “Report to the U.S. Congress on Financing Mechanisms for Open Access Publishing of Federally Funded Research.” This report, required by a 2023 appropriations Act, describes the different business models currently being used to comply with the requirement of public access within a year of publication (remembering that the U.S. government uses the term “public access” to denote free-to-read access, and not any of the other rights OA implies). It also provides top-level statistics about the rapid growth in OA publishing over the last ten years.

The most interesting takeaway is how difficult it is to estimate how much federally funded researchers paid to publish in the last few years. Even the U.S. government has very limited data. The best guess from OSTP was slightly more than $378 million in 2021, a 39% increase from 2016. The other highlight of the report is the Appendix, which describes the economic concepts related to publishing that can be used to analyze the system.
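To put those two numbers in context, one can back out what they imply for 2016. This is a rough sketch using only the figures quoted above; since OSTP's estimate is "slightly more than" $378 million, the results are approximate, and the steady year-over-year compounding is my assumption, not something the report states.

```python
def implied_baseline(final_value: float, pct_increase: float) -> float:
    """Value before a given percentage increase."""
    return final_value / (1 + pct_increase / 100)


def annualized_growth_pct(pct_increase: float, years: int) -> float:
    """Equivalent constant yearly growth rate (assumes steady compounding)."""
    return ((1 + pct_increase / 100) ** (1 / years) - 1) * 100


# Figures quoted from the OSTP report, in millions of dollars.
estimate_2021 = 378.0
increase_since_2016 = 39.0

baseline_2016 = implied_baseline(estimate_2021, increase_since_2016)
annual_pct = annualized_growth_pct(increase_since_2016, 2021 - 2016)

print(f"Implied 2016 spending: about ${baseline_2016:.0f} million")
print(f"Equivalent annual growth: about {annual_pct:.1f}% per year")
```

In other words, the estimate implies roughly $272 million in 2016, growing a little under 7% per year if the growth was even.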

Also in November, a group of faculty and staff from the Massachusetts Institute of Technology released the report “Access to Science and Scholarship: Key Questions about the Future of Research Publishing.” Much like the OSTP report, it spends most of its time discussing the recent history of publishing, highlighting growth in both scholarly outputs and in spending. There is more detail here on specific publishers and their business models, especially the growth of massive fully-OA publishers.

The benefit of this report is that it takes a slightly larger view of the entire scholarly communications ecosystem. The Nelson memo applied to both publications and data, and this report poses some interesting research questions about open data, such as how it should be shared and what it is going to cost. It also presents questions about preprint servers and peer review, two issues not covered by OSTP.

The New York Times v. OpenAI & Microsoft

Over the holiday break, the New York Times sued OpenAI and Microsoft for copyright infringement. The lawsuit covers both using New York Times content for training and reproducing that content in response to prompts.

The New York Times may not be “scholarly,” but the suit could be a preview of how large scholarly publishers deal with OpenAI. First, it is fair to call both the Times and scholarly journals high quality content, the kind that OpenAI likely prefers for training its model (Complaint, p. 29). Second, there are unauthorized copies of much of the content online, so it would be possible to initially train a model on the content without permission. Finally, there is the financial angle. This lawsuit comes after negotiations between the companies to have them pay for the New York Times’ content. While some publishers are exploring ways to use AI with their own content, they may find it profitable to license that content to OpenAI and other companies.

One other interesting note here is how Microsoft is brought into the lawsuit from several different angles. First, it is a big investor in OpenAI. Second, it offers products based on OpenAI’s models, in particular anything branded “Copilot,” and Bing Chat. It is also being accused of helping OpenAI make copies of content in training ChatGPT, or at least overlooking the copying OpenAI was allegedly doing. But the most interesting claim, one that could have far-reaching implications if a court agrees, is that Microsoft is committing copyright infringement by “storing, processing, and reproducing” the models on its platform (Complaint, p. 60). If that counts as copyright infringement, it could greatly chill AI research, as a researcher would need to know the provenance of a model, and every document used in its training, to be safe from a copyright claim.

Given that this lawsuit is following negotiations over a license agreement, it would not be surprising if this settles before trial. The New York Times may be well-resourced for a big legal fight, but there are no guarantees they would win, risking a lot of licensing revenue. At some point there will be a copyright suit regarding AI that goes to trial (no guess as to which, as it can take a long time to go from filing a case to a trial), but maybe not this one.

Internet Archive v. Hachette: CDL case update

There has finally been some movement in the Internet Archive’s appeal of the summary judgment decision it lost in the U.S. District Court for the Southern District of New York back in March.

If you do not recall, the case turned on the question of whether or not controlled digital lending, or at least the Internet Archive’s implementation of it, was fair use. Of the four fair use factors, the court determined that none favored the use as fair. One of the key points of appeal is the District Court’s finding that the Internet Archive’s CDL program was a commercial use. Another important issue was whether the use was transformative. The final major point the Internet Archive lost was on economic harm to the publishers. On December 15, the Internet Archive filed its brief addressing these issues.

On the one hand, many of the arguments IA makes are not new; it is just asserting the trial court got them wrong. It claims that the trial court judge “failed to grasp the key feature of controlled digital lending: the digital copy is available only to the one person entitled to borrow it at a time.” (Brief, p. 16). Overall, the Internet Archive sticks to its argument that CDL is non-commercial, transformative, and has no effect on the potential market.

The District Court’s finding that the Internet Archive’s CDL program was commercial could have the most important ramifications going forward. The Wikimedia Foundation and others filed an amicus brief focusing on that issue. Part of that finding was that IA had a “Donate” button on the pages of the digital books it lent. However, it had buttons to donate on every page of the site, much like one sees on Wikipedia. If donation buttons render a page commercial, then non-profit organizations will never be able to have a non-commercial page on the internet.

One other new, or at least more focused, argument is that the market the District Court identified, the market for ebook licenses for libraries, is the wrong market to consider, and that no matter the market, there is no harm. Whichever way this point goes, it would be good to get some more guidance from the court as to who has the burden of proof on this point, and how it should be proven.

The Internet Archive also asks that the court separately analyze the National Emergency Library, where it lent digital books regardless of the number of print copies it had, and the Open Libraries project, where it digitized and lent books of other libraries. Presumably, this only really matters if regular controlled digital lending is found to be a fair use, but one cannot be sure without seeing the nature of the damages agreed to by the parties, which is confidential.

One thing I noted was that the Internet Archive does not address the recent Supreme Court opinion in Andy Warhol Foundation for the Visual Arts, Inc. v. Goldsmith in terms of evaluating transformative use. I had thought that the District Court would wait on this opinion. Given what the opinion said about transformative use, I expect to see it relied on more heavily in the Publishers’ brief, due March 15.

The Boston Library Consortium, of which Boston College is a member, joined an amicus brief in support of the Internet Archive.