Critics have been raising concerns about companies using information posted online to train the large language models behind their generative AI products. Recently, a proposed class action lawsuit was filed against OpenAI, accusing it of scraping "massive amounts of personal data from the internet," including "stolen private information," to train its GPT models without prior consent. As Search Engine Journal notes, we'll likely see plenty of similar lawsuits in the future as more companies develop their own generative AI products.
Owners of websites that could be considered public squares in the digital age have also taken steps to either block or profit from the generative AI boom. Reddit has started charging for access to its API, leading third-party clients to shut down over the weekend. Meanwhile, Twitter limited how many tweets a user can see per day to "address extreme levels of data scraping [and] system manipulation."