What do AI chatbots know about us, and who are they sharing it with?

Use the programs with caution, like you would any other app.

Former Senior Reporter

Fri, Apr 7, 2023, 10:00 AM·4 min read

AI chatbots are relatively old by tech standards, but the newest crop, led by OpenAI's ChatGPT and Google's Bard, is vastly more capable than its ancestors, and not always for positive reasons. The recent explosion in AI development has already created concerns around misinformation, disinformation, plagiarism and machine-generated malware. What problems might generative AI pose for the privacy of the average internet user? The answer, according to experts, is largely a matter of how these bots are trained and how much we plan to interact with them.

To replicate human-like interactions, AI chatbots are trained on massive amounts of data, a significant portion of which is derived from repositories like Common Crawl. As the name suggests, Common Crawl has amassed years and petabytes worth of data simply from crawling and scraping the open web. “These models are training on large data sets of publicly available data on the internet,” said Megha Srivastava, a PhD student in Stanford's computer science department and former AI resident with Microsoft Research. Even though ChatGPT and Bard use what they call a "filtered" portion of Common Crawl's data, the sheer size of the model makes it “impossible for anyone to kind of look through the data and sanitize it,” according to Srivastava.

Whether through your own carelessness or the poor security practices of a third party, information about you could be in some far-flung corner of the internet right now. Even though it might be difficult for the average user to access, it's possible that information was scraped into a training set and could be regurgitated by a chatbot down the line. And a bot spitting out someone's actual contact information is in no way a theoretical concern. Bloomberg columnist Dave Lee posted on Twitter that, when someone asked ChatGPT to chat on encrypted messaging platform Signal, it provided his exact phone number. This sort of interaction is likely an edge case, but the information these learning models have access to is still worth considering. "It’s unlikely that OpenAI would want to collect specific information like healthcare data and attribute it to individuals in order to train its models," David Hoelzer, a fellow at security organization the SANS Institute, told Engadget. “But could it inadvertently be in there? Absolutely.”

OpenAI, the company behind ChatGPT, did not respond when we asked what measures it takes to protect data privacy, or how it handles personally identifiable information that may be scraped into its training sets. So we did the next best thing and asked ChatGPT itself. It told us that it is "programmed to follow ethical and legal standards that protect users’ privacy and personal information" and that it doesn't "have access to personal information unless it is provided to me." Google, for its part, told Engadget it programmed similar guardrails into Bard to prevent the sharing of personally identifiable information during conversations.

Helpfully, ChatGPT brought up the second major vector by which generative AI might pose a privacy risk: usage of the software itself, either via information shared directly in chat logs or via device and user information captured by the service during use. OpenAI’s privacy policy cites several categories of standard information it collects on users, some of which could be identifiable, and upon starting up, ChatGPT does caution that conversations may be reviewed by its AI trainers to improve systems.

Google's Bard, meanwhile, does not have a standalone privacy policy and instead uses the blanket privacy document shared by other Google products (which happens to be tremendously broad). Conversations with Bard don't have to be saved to the user's Google account, and users can delete the conversations via Google, the company told Engadget. “In order to build and sustain user trust, they're going to have to be very transparent around privacy policies and data protection procedures at the front end,” Rishi Jaitly, professor and distinguished humanities fellow at Virginia Tech, told Engadget.

ChatGPT has a "clear conversations" action, but pressing it does not actually delete your data, according to the service’s FAQ page, nor is OpenAI able to delete specific prompts. While the company discourages users from sharing anything sensitive, seemingly the only way to remove personally identifying information provided to ChatGPT is to delete your account, which the company says will permanently remove all associated data.

Hoelzer told Engadget he’s not worried that ChatGPT is ingesting individual conversations in order to learn. But that conversation data is being stored somewhere, and so its security becomes a reasonable concern. Incidentally, ChatGPT was taken offline briefly in March because a programming error revealed information about users’ chat histories. It's unclear this early in their broad deployment if chat logs from these sorts of AI will become valuable targets for malicious actors.

For the foreseeable future, it's best to treat these sorts of chatbots with the same suspicion users should apply to any other tech product. “A user playing with these models should enter with expectation that any interaction they're having with the model," Srivastava told Engadget, "it's fair game for OpenAI or any of these other companies to use for their benefit.”


