Is It EVEN Possible To Reverse Engineer AI’s Training Data?

2025-09-20 Science & Technology
39.5k
2.0k
104
bycloud
225.0k subscribers

Description

Deploy on Sevalla now and get a free $50 credit! https://sevalla.com/?utm_source=ByCloud's&utm_medium=Referral&utm_campaign=youtube

In this video, we dive into how much of the private training data researchers can infer or approximate.

My Newsletter https://mail.bycloud.ai/
my project: find, discover & explain AI research semantically https://findmypapers.ai/
My Patreon https://www.patreon.com/c/bycloud

Can We Infer Confidential Properties of Training Data from LLMs? [Paper] https://arxiv.org/abs/2506.10364
Interpreting the Repeated Token Phenomenon in LLMs [Paper] https://arxiv.org/abs/2503.08908
Approximating Language Model Training Data from Weights [Paper] https://arxiv.org/abs/2506.15553

Try out my new fav place to learn how to code https://scrimba.com/?via=bycloudAI

This video is supported by the kind Patrons & YouTube Members: 🙏Nous Research, Chris LeDoux, Ben Shaener, DX Research Group, Poof N' Inu, Andrew Lescelius, Deagan, Robert Zawiasa, Ryszard Warzocha, Tobe2d, Louis Muk, Akkusativ, Kevin Tai, Mark Buckler, NO U, Tony Jimenez, Ângelo Fonseca, jiye, Anushka, Asad Dhamani, Binnie Yiu, Calvin Yan, Clayton Ford, Diego Silva, Etrotta, Gonzalo Fidalgo, Handenon, Hector, Jake Disco very, Michael Brenner, Nilly K, OlegWock, Daddy Wen, Shuhong Chen, Sid_Cipher, Stefan Lorenz, Sup, tantan assawade, Thipok Tham, Thomas Di Martino, Thomas Lin, Richárd Nagyfi, Paperboy, mika, Leo, Berhane-Meskel, Kadhai Pesalam, mayssam, Bill Mangrum, nyaa, Toru Mon

[Discord] https://discord.gg/NhJZGtH
[Twitter] https://twitter.com/bycloudai
[Patreon] https://www.patreon.com/bycloud
[Business Inquiries] [email protected]
[Profile & Banner Art] https://twitter.com/pygm7
[Video Editor] @Booga04

Top Comments (10)

@pendekarlautbiru 2025-09-20

How rare, a channel talking about AI that ACTUALLY talks about the underlying algorithms of AI. Subscribed.

187 3 replies
@NathanRedberry 2025-09-20

NYT watching this: ✍️✍️✍️

94
@PrimeStackPro 2025-09-20

Holy sh- He's back!

35
@gordogato1379 2025-09-20

The latest actually open source model is Apertus by literally the Swiss government.

25 2 replies
@burnytech 2025-09-20

OLMo is also fully open source, no?

18 1 reply
@atommax_1676 2025-09-20

Too much genshin... I like it. Great video

12
@bycloudAI 2025-09-20

Deploy on Sevalla now and get a free $50 credit! https://sevalla.com/?utm_source=ByCloud's&utm_medium=Referral&utm_campaign=youtube

10 1 reply
@oxyphyme 2025-09-20

this early feels illegal

8
@alimaydar-x8q 2025-09-21

Facebook was literally pirating, and not in the Anthropic way, but by torrenting and using LibGen.

4
@zenithparsec 2025-09-22

I've seen the data-leaking issue in Google Translate when you use long sequences of repeated tokens. It was very fun to play with.

1
