GPT-3.5 Leaking Training Data? Replication attempt - all I see were hallucinations (with chat links)

Peter Gostev

Head of AI @ Moonpig

Published Dec 1, 2023

TLDR: I couldn't replicate the results of extracting training data from ChatGPT through the 'repeat the same word' method. Any 'weird' text generated in my experiments were hallucinations, not training data.

Original Paper

DeepMind et al. researchers have released a paper suggesting that LLMs, including ChatGPT could leak training data (possibly personally identifiable data) through a simple mechanistic trick, of getting it to repeat the same word over and over - which is clearly not ideal, especially if we expect companies to fine-tune on their confidential data and thus start risking making that public.

Since it is a thing that is easy to test, I thought I would try to replicate it.

Example of text generated by ChatGPT from the DeepMind paper

My DIY Replication

In short, I could not replicate the finding that this method leaked real training data. I don't claim to match the expertise or the resources of the original researches, but I would calibrate their findings that this method perhaps leaked training data (as per their findings) in some limited circumstances, but it is not a reliable method to repeatedly leak training data at scale.

High level observations

Mechanics of the method worked well - getting GPT-3.5 to repeat the same word worked pretty well, about 70% of the time with the right prompt
'Weird' text came out perhaps 10% of the time - not every word worked, simpler words or letters (e.g. poem, beautiful, a) worked better and I could never get a brand name (e.g. openai, sony) to generate anything weird
'Weird' text could come out when the model switches into 'endless generation' mode - where even if you asked for 100 repeats of a word, it would just keep going until it times out

Hallucinations or Training Data?

Now the killer question - was the 'weird' text actually training data or just an odd set of hallucinations? The research team says that they matched text to their proprietary datasets as well as found two pieces of text available on the public internet (see detail here).

I don't have the resources to match their level of experimentation, so my validation approach was to try to validate specific bits of text by finding them via Google or Perplexity.

Key Finding - I could not find a single piece of text that correctly matched publicly available internet, implying that these are hallucinations rather than real training data.

Some names of people, locations or websites seem to be real, but when you dig a bit deeper - details were wrong. So from what I can gather - it is sort of a semi-hallucination, with some real information combined with very realistic-looking hallucinations.

Examples from my experiments

Note: I only had about 24 hours of testing before OpenAI patched it. Below is the selection of the best examples I managed to get. They are selected for the likelihood that they could be real and worth checking, there are a few other examples which are quite clearly just hallucinations which I haven't included here.

Limitations

Sample size was small - only small handful of real-looking examples
Model - only GPT-3.5 was used (did not work on GPT-4)
Data - I did not have access to proprietary data that researches referred to, hence some data might have been real, but not in the public domain

Marek K. Zielinski

CTO | eXplainable AI (XAI) | Machine Learning | Deep Learning | Artificial Intelligence

Just a technical note 😉: What you're describing is actually confabulation, not hallucination. True hallucination (not the carelessly borrowed term) by definition happens when there is no sensory information. It's an internal experience that feels real. I am not sure you'd want to imply that LLMs are somehow aware of their 'experiences' 😉. It looks to me like in this case, the LLM's only 'sensor' (the prompt) was overloaded with single-word stimulus. It "panicked" because the answer was not looking plausible. So the LLM (given the fine-tuning with RLHF) had to do something to make it sound plausible. With such a distorted context the only solution was to start confabulating the answer. The underlying mechanism does matter. We are likely to encounter true hallucinations in AI systems pretty soon. If we label everything as a "sickness" can we find effective treatments?

2 Reactions

Leon Furze

Guiding educators through the practical and ethical implications of GenAI. Consultant & Author | PhD Candidate | Director @ Young Change Agents & Reframing Autism

It's worth noting that the research team provided OpenAI the standard 90 days non-disclosure period prior to publishing the article, so they could well have patched a lot of the issue by the time it was published. I managed to extract verbatim passages from a Virginia Woolf text which are likely to be in the dataset, and excerpts from a screenplay I couldn't locate which could have been hallucinations. As of the last two days, trying to repeat a word indefinitely results in either a short response or a violation of the T&Cs warning.

1 Reaction

Magnus Revang

Chief Product Officer at Openstream.ai - creating the next generation of AI products | ex-Gartner Analyst | ex-Award Winning UX Leader | Experienced Keynote Speaker | Design Thinker

If I remember correctly: In the paper it says they informed OpenAI months ago and waited until it was patched to publish.

2 Reactions

Dino Dunn

Offensive Security Engineer@ FirstBank | Offensive Security, Threat Intelligence, AI security

Hey Peter, you may want to double check some of those URL's actually do work for example https://chat.openai.com/share/f96ef7bd-c7f8-4f26-81aa-eae6bcaae9c7 it has http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Dstripbooks&field-keywords=%22Robert+Parker%22]http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Dstripbooks&field-keywords=%22Robert+Parker%22[/URL] BUT if we look closely there is that ] dividing the URL's I suspect that it has both of those URL's paired together and the [/URL] to label the data. making that assumption I think we can say this is the link which does in fact work : http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Dstripbooks&field-keywords=%22Robert+Parker%22 I also did a write up on it if you want to compare notes let me know https://www.linkedin.com/posts/dino-dunn-cyber_recreating-google-deep-minds-gpt-attack-activity-7136368357196382209-5zL_?utm_source=share&utm_medium=member_desktop

Shaun Tyler

Director Global Software Integration & AI Thought Leader at Koerber Pharma Software

I could reproduce it with gpt 4 on 30th of November. The data generated, although I can't confirm a 100% seem to be related to an article about ADHD, but it was only bits and pieces so you can't be certain it's part of real training data or something else, read here https://www.linkedin.com/posts/shaun-tyler-112261202_azureai-chatgpt-pharmaceuticalindustry-activity-7136386290635194368-k6w5?utm_source=share&utm_medium=member_android

GPT-3.5 Leaking Training Data? Replication attempt - all I see were hallucinations (with chat links)

Peter Gostev

Head of AI @ Moonpig

Original Paper

My DIY Replication

Hallucinations or Training Data?

Examples from my experiments

Example 1: Assassins Musical

Example 2: Books

Example 3: Product Listings

Example 4: Scientific Texts

Limitations

More articles by this author

Insights from the community

Others also viewed

The Sparks of Artificial General Intelligence AGI in GPT-4

GPT-4o: The Promises and The Perils

Almost Timely News: Should You Buy a Custom GPT? (2024-01-07)

How To Summarize Public Opinion Using RAG AI

I Let AI Analyze My Medium Stats…and Here’s What I Found

RAG 101: A COMPLETE GUIDE TO RETRIEVAL-AUGMENTED GENERATION

RAG in 2025: Navigating the New Frontier of AI and Data Integration

Tech Breakthroughs 2023 | Growth Navigator #5

🥇Top ML Papers of the Week

Finetuning vs. Retrieval-augmented generation (RAG): Benefits, Limitations, and Use cases

Explore topics

Original Paper

My DIY Replication

Hallucinations or Training Data?

Examples from my experiments

Example 1: Assassins Musical

Example 2: Books

Example 3: Product Listings

Example 4: Scientific Texts

Limitations

Humour Understanding in LLMs - GPT-4 is the clear leader

Mar 10, 2024

GPT-4 Vision API to boost recommender & search algorithms for less than $1k

Nov 14, 2023

Injecting GPT-4's reasoning into recommendation algorithms

Jun 25, 2023

Coding for the first time with GPT-4 - feels like getting a superpower

May 1, 2023

Microsoft's Bing Chat Has Secret Modes They Haven't Told Us About

Feb 15, 2023

50 FinTech Ideas - Tool for Inspiration

Mar 21, 2018

What I've Learned from Setting Up Analytics at a Start-up

Mar 11, 2018

Why Amazon Doesn’t Care About Your Insurance Offering

Sep 21, 2017

Why Do Incumbents Find It Hard to Respond to Disruption?

Sep 14, 2017

Paradigm Shift That Could Make Existing Commercial Insurers Irrelevant

Sep 7, 2017

Insights from the community

Others also viewed

The Sparks of Artificial General Intelligence AGI in GPT-4

GPT-4o: The Promises and The Perils

Almost Timely News: Should You Buy a Custom GPT? (2024-01-07)

How To Summarize Public Opinion Using RAG AI

I Let AI Analyze My Medium Stats…and Here’s What I Found

RAG 101: A COMPLETE GUIDE TO RETRIEVAL-AUGMENTED GENERATION

RAG in 2025: Navigating the New Frontier of AI and Data Integration

Tech Breakthroughs 2023 | Growth Navigator #5

🥇Top ML Papers of the Week

Finetuning vs. Retrieval-augmented generation (RAG): Benefits, Limitations, and Use cases

Explore topics