The Second Circuit Court of Appeals has ruled against the Internet Archive (IA) and its Controlled Digital Lending program in Hachette Book Group, Inc. v. Internet Archive, holding that the program was not fair use, with every fair use factor supporting Hachette et al. (the Publishers). An appeal is possible, but the future of doing CDL at scale under the fair use doctrine is bleak.
How We Got Here
Controlled Digital Lending (CDL) is the digitization by libraries of lawfully acquired books, and the lending of those copies via technical measures that prevent copying digital files, while ensuring that there are never more total copies (physical and digital combined) in circulation than the number of physical copies owned.
This case started back in 2020. Internet Archive had a CDL program for years, with their own books and with partners. During the COVID emergency, in response to library closures, they started a “National Emergency Library” where they removed the cap on the number of digital copies of books they circulated. At this point, a number of large publishers sued IA for copyright infringement.
The Internet Archive could not deny they made copies of books, but claimed CDL was allowed under fair use. The district court that heard the case ruled against IA on every major point. Internet Archive appealed to the Second Circuit Court of Appeals, and the case was heard in late June. Importantly, between the District Court opinion and the Appeals Court hearing, the Supreme Court decided Andy Warhol Foundation for the Visual Arts, Inc. v. Goldsmith, interpreting fair use in a way that weighs heavily against a use when the copy can substitute for the original work.
Second Circuit Ruling
The three-judge Appeals Court ruled against the Internet Archive on all four fair use factors.
- Purpose and Character of Use
This section examines two important questions. One, was the use of the copy transformative? And two, was the use commercial?
The issue of transformative use really decided the case. Internet Archive pointed to cases that said a use might be transformative if it improved efficiency in delivering the content, as CDL does. CDL also allows people to link directly to the book as a source of information. However, the Court said that the use was “meant to–and does–substitute for the original Works.” (p. 24). This substitution is the antithesis of transformativeness. The Court then tried to articulate the difference between a transformative work and a derivative work. A derivative work is defined by statute as “…work based upon one or more preexisting works…” with many examples and the catch-all “any other form in which a work may be recast, transformed, or adapted.” E-books are not listed in the definition, but the Court stated that changing the medium of a work is by itself a derivative use. (p. 25). The Court did have to distinguish a couple of other cases, most notably the Sony copyright case from the 1980s, which allowed people to use VCRs (technically Betamax) to record television programming to watch it later. The Court distinguished Sony in a way that cabins that opinion to a particular time period in broadcasting and technology, which raises questions as to whether the court would rule the same way today.
On a side note, in considering the owned-to-loaned ratio of CDL as a basis for the use being transformative, the court said “IA does not perform the traditional functions of a library; it prepares derivatives of Publishers’ Works and delivers those derivatives to its users in full.” (p. 31). I do not think this was a significant part of the Court’s analysis, but it does point to an issue that was more present in the initial stages of the case – is the Internet Archive a library? I hope courts take that question beyond “lending of print books” if called upon to make that specific choice in the future.
The use was found by the Second Circuit to be non-commercial, with the program only providing attenuated financial benefit to IA, but that did not swing this factor in favor of the Internet Archive.
- Nature of the Work
While Internet Archive argued that the copying of non-fiction books should at least be neutral in the balance of fair use, as facts are not protected by copyright, the Court still found that overall, non-fiction books have the type of creativity that copyright protects. (p. 41). This is one spot where a specific book-by-book analysis, instead of looking at the aggregate, could have made a difference. But this is typically not a decisive factor, and the aggregate approach probably saved months of time in reaching an opinion that would have come out much the same.
- Amount and Substantiality of Portion Used
This factor is tricky in that courts need to emphasize that there is no strict quantitative rule that a certain amount of copying is always fair use, or that copying an entire work is never fair use. The Court here pointed out that IA copied entire works, and also made those works available to the public in their entirety. (p. 42). The Court looked back at the first factor here to say that since the copying was not for a transformative purpose, but to substitute for the work, the copying was too much. This is not usually a decisive factor, and the court here seemed most interested in highlighting why this copying was different from the copying in its Google Books case, where Google copied entire books, but only made snippets available to the public.
- Effect of Use upon Potential Market for Work
The court held that the Publishers did not have to prove harm with evidence, but that the Internet Archive had to meet a near-impossible standard – proving that CDL does not harm the market for books. (p. 45). The IA was not helped by the fact that most of the Publishers did not provide monthly sales data. Both IA and the Publishers had expert witnesses, but the Publishers’ expert carried the day in critiquing IA, as the Court was not persuaded by the analyses from IA looking at book sales during and after the National Emergency Library. (pp. 49-52). The court put stock in the idea that if a copy is available for free, the market for the paid version will be affected negatively, and also noted that a couple of phrases from IA’s promotional material strongly suggest it was appealing to other libraries to utilize IA’s CDL program so as not to have to pay for ebooks. It is quite possible the Publishers could have found examples of libraries doing exactly that – libraries deciding not to buy ebooks and instead putting IA’s Open Library books into their catalog for patrons – and used that as evidence. However, the Publishers may have been more worried about opening the door to the idea that the copyright holder should have to provide evidence of market harm, which is a standard they would really want to avoid.
The Court did nominally consider the benefit the public derived from IA’s CDL as a counterbalance to market harm. Here it really discounted the value of providing access to knowledge to the public, and focused on the argument that providing copies of books would disincentivize authors, and disincentivized authors would not create new works, harming the public. The Court cited the Supreme Court on this exact point. Perhaps if the Supreme Court were able to monetize its opinions instead of them being in the public domain, it would take more cases!
The Future of CDL
The most immediate question is what this means for CDL of books that are not available electronically from a publisher. On the one hand, the court says that “[W]e conclude that the challenged practices–IA’s lending of its “own” digital books that are commercially available for sale or license in any electronic text format,’ . . . are not fair use.” (p.20). On the other hand, it defines the market as the market for “the Works in general, without regard to format.” (p. 46) and makes it clear it thinks that creating a digital copy of a book was making a derivative of the original, and not a transformative use. Put together, it is hard to see how any book could be digitized and used for CDL without permission.
That permission may be the key to future CDL. There is nothing saying that a publisher could not allow libraries to copy and lend books with a one-to-one owned-to-loaned ratio, or any ratio. It might not appeal to Hachette, but there could be publishers who do not have the bandwidth to have their own digitization program, nor want to work with platforms like OverDrive, and choose to have library-based CDL, perhaps with some money changing hands.
As for IA’s program, IA is “reviewing the court’s opinion.” They could ask to have the case reheard by the entire Second Circuit, though this is rare. Longer term, IA does have an option to appeal to the Supreme Court, but the Supreme Court does not have to hear it. There is a hint of a circuit split on the issue of proving a lack of market harm, as the D.C. Circuit ruled in a different direction in the similar case ASTM v. PRO. But the Second Circuit here relied heavily on the very recent Warhol case, which the Supreme Court decided in 2023. I think that makes a grant of cert unlikely. Even if the Supreme Court would be interested in addressing the market harm issue, Internet Archive will have to consider if that would be enough to change the final result. Proponents of CDL may need to focus on legislative changes to copyright law, especially to libraries’ copyright exemptions in 17 USC 108, to deliver on the benefits of CDL.
Silver lining for non-profits, and maybe Creative Commons
As mentioned above, there was some good news from the decision about non-commercial use. At the district court level, Internet Archive’s CDL program was deemed a commercial use, as IA had a donate button on the same page as the book. There was also language about IA gaining reputational benefits from the program. This was all despite IA being a non-profit organization. The Second Circuit found that this standard would be very damaging to non-profits in general, and would likely prevent them from ever utilizing fair use. (pp. 37-38). Though not mentioned in the case, this language might also provide a little more clarity around the Creative Commons BY-NC license, where commercial is not defined.
What does this case mean for AI?
This case will also interest people looking at the New York Times case against OpenAI and Microsoft, currently in a lower court within the Second Circuit. Many AI companies claim that scraping the internet and using copyrighted works in training a model is fair use, and cite the Google Books case. I would not say this case is likely to swing the result of AI training cases, but two bits stand out.
First, the court positively cites language from several cases that says a copy is not transformative if it just “repackages” or “republishes” a work. Query whether training a Large Language Model is just repackaging content into some set of probabilities and relationships between words. Second, in discussing the third fair use factor, it reiterates its decision in another case to limit the factor to the amount of the material “made available to the public.” (p. 42). Assuming the public cannot extract copyrighted content back out of a model (which is a point of debate, and might depend on the content), it seems like the AI companies will do well on that factor.
Apologies for the citation style to any Bluebook devotees!