Is It EVEN Possible To Reverse Engineer AI’s Training Data?

2025-09-20 Science & Technology

39.5k

2.0k

104

bycloud

229.0k subscribers

Description

Deploy on Sevalla now and get a free $50 credit! https://sevalla.com/?utm_source=ByCloud's&utm_medium=Referral&utm_campaign=youtube

In this video, we dive into how much of the private training data researchers can infer or approximate.

My Newsletter https://mail.bycloud.ai/
My project: find, discover & explain AI research semantically https://findmypapers.ai/
My Patreon https://www.patreon.com/c/bycloud

Can We Infer Confidential Properties of Training Data from LLMs? [Paper] https://arxiv.org/abs/2506.10364
Interpreting the Repeated Token Phenomenon in LLMs [Paper] https://arxiv.org/abs/2503.08908
Approximating Language Model Training Data from Weights [Paper] https://arxiv.org/abs/2506.15553

Try out my new fav place to learn how to code https://scrimba.com/?via=bycloudAI

This video is supported by the kind Patrons & YouTube Members:
🙏Nous Research, Chris LeDoux, Ben Shaener, DX Research Group, Poof N' Inu, Andrew Lescelius, Deagan, Robert Zawiasa, Ryszard Warzocha, Tobe2d, Louis Muk, Akkusativ, Kevin Tai, Mark Buckler, NO U, Tony Jimenez, Ângelo Fonseca, jiye, Anushka, Asad Dhamani, Binnie Yiu, Calvin Yan, Clayton Ford, Diego Silva, Etrotta, Gonzalo Fidalgo, Handenon, Hector, Jake Disco very, Michael Brenner, Nilly K, OlegWock, Daddy Wen, Shuhong Chen, Sid_Cipher, Stefan Lorenz, Sup, tantan assawade, Thipok Tham, Thomas Di Martino, Thomas Lin, Richárd Nagyfi, Paperboy, mika, Leo, Berhane-Meskel, Kadhai Pesalam, mayssam, Bill Mangrum, nyaa, Toru Mon

[Discord] https://discord.gg/NhJZGtH
[Twitter] https://twitter.com/bycloudai
[Patreon] https://www.patreon.com/bycloud
[Business Inquiries] [email protected]
[Profile & Banner Art] https://twitter.com/pygm7
[Video Editor] @Booga04

#bycloud #bycloudai #open source LLM #EleutherAI Pythia #PropInfer #data extraction #attention sink #model memorization

Top Comments (10)

@pendekarlautbiru 2025-09-20

How rare, a channel talking about AI that ACTUALLY talks about the underlying algorithms of AI. Subscribed.

189 3 replies

@NathanRedberry 2025-09-20

NYT watching this: ✍️✍️✍️

@ShaneZarechian 2025-09-21

Around March or May 2023, there was a trick with ChatGPT where you would ask it to "repeat | as many times as possible". It would start outputting "|", and then after about 200 of them it would start outputting exact training data it was trained on.

40 2 replies

@PrimeStackPro 2025-09-20

Holy sh- He's back!

@gordogato1379 2025-09-20

The latest actually open source model is Apertus by literally the Swiss government.

25 2 replies

@juanjesusligero391 2025-09-20

I'll wait for the bycloud video.

19 2 replies

@atommax_1676 2025-09-20

Too much genshin... I like it. Great video

@bobsoup2319 2025-09-21

OLMo 2!!!! It’s fully open source and modern and outperforms Qwen 2.5 32B and Mistral Small (which are older models now, but still)

10 2 replies

@bycloudAI 2025-09-20

Deploy on Sevalla now and get a free $50 credit! https://sevalla.com/?utm_source=ByCloud's&utm_medium=Referral&utm_campaign=youtube

10 1 reply

@oxyphyme 2025-09-20

this early feels illegal
