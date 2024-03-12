It's getting hard to keep up with copyright lawsuits against generative AI, with a new proposed class action hitting the courts last week. This time, authors are suing NVIDIA over NeMo, its AI platform that lets businesses build and train their own chatbots, Ars Technica reported. They claim the company trained NeMo on a controversial dataset that used their books without consent, in violation of copyright law.

Authors Abdi Nazemian, Brian Keene and Stewart O’Nan demanded a jury trial and asked NVIDIA to pay damages and destroy all copies of the Books3 dataset used to train its NeMo large language models (LLMs). They claim Books3 was copied from a shadow library called Bibliotik, which consists of 196,640 pirated books.

"In sum, NVIDIA has admitted training its NeMo Megatron models on a copy of The Pile dataset," the claim states. "Therefore, NVIDIA necessarily also trained its NeMo Megatron models on a copy of Books3, because Books3 is part of The Pile. Certain books written by Plaintiffs are part of Books3—including the Infringed Works—and thus NVIDIA necessarily trained its NeMo Megatron models on one or more copies of the Infringed Works, thereby directly infringing the copyrights of the Plaintiffs."

In response, NVIDIA told The Wall Street Journal that "we respect the rights of all content creators and believe we created NeMo in full compliance with copyright law."

Last year, OpenAI and Microsoft were hit with a copyright lawsuit from nonfiction authors who claimed the companies made money off their works but refused to pay them. A similar lawsuit was filed earlier this year. That's on top of lawsuits from news organizations like The Intercept and Raw Story and, of course, the legal action from The New York Times that kicked all of this off.