The LLM's RL Revelation We Didn't See Coming

2025-06-24 Science & Technology
142.3k
5.2k
364
bycloud
225.0k subscribers

Description

Try out Warp 2.0 now, the current rank #1 AI on Terminal Bench, outperforming Claude Code: https://go.warp.dev/bycloud
You can also use code "BYCLOUD" to get Warp Pro for 1 month free. (limited for 1,000 redemptions)

My Newsletter
https://mail.bycloud.ai/

my project: find, discover & explain AI research semantically
https://findmypapers.ai/

My Patreon (get bundle access for my newsletter & findmypapers)
https://www.patreon.com/c/bycloud

Training language models to follow instructions with human feedback
[Paper] https://arxiv.org/abs/2203.02155

DeepSeek-R1 (Aha Moment)
[Paper] https://arxiv.org/abs/2501.12948

Understanding R1-Zero-Like Training: A Critical Perspective
[Paper] https://arxiv.org/pdf/2503.20783

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
[Paper] https://arxiv.org/abs/2504.13837

Reinforcement Learning Finetunes Small Subnetworks in Large Language Models
[Paper] https://arxiv.org/abs/2505.11711

Spurious Rewards: Rethinking Training Signals in RLVR
[Paper] https://arxiv.org/abs/2506.10947

Try out my new fav place to learn how to code
https://scrimba.com/?via=bycloudAI

This video is supported by the kind Patrons & YouTube Members:
🙏Nous Research, Chris LeDoux, Ben Shaener, DX Research Group, Poof N' Inu, Andrew Lescelius, Deagan, Robert Zawiasa, Ryszard Warzocha, Tobe2d, Louis Muk, Akkusativ, Kevin Tai, Mark Buckler, NO U, Tony Jimenez, Ângelo Fonseca, jiye, Anushka, Asad Dhamani, Binnie Yiu, Calvin Yan, Clayton Ford, Diego Silva, Etrotta, Gonzalo Fidalgo, Handenon, Hector, Jake Disco very, Michael Brenner, Nilly K, OlegWock, Daddy Wen, Shuhong Chen, Sid_Cipher, Stefan Lorenz, Sup, tantan assawade, Thipok Tham, Thomas Di Martino, Thomas Lin, Richárd Nagyfi, Paperboy, mika, Leo, Berhane-Meskel, Kadhai Pesalam, mayssam, Bill Mangrum, nyaa, Toru Mon

[Discord] https://discord.gg/NhJZGtH
[Twitter] https://twitter.com/bycloudai
[Patreon] https://www.patreon.com/bycloud
[Business Inquiries] [email protected]
[Profile & Banner Art] https://twitter.com/pygm7
[Video Editor] @Booga04
[Ko-fi] https://ko-fi.com/bycloudai

Top Comments (10)

@lupusalbus3795 2025-06-24

I thought this was about an LLM written in Brainfuck. Still neat, but not what I was hoping for

696 11 replies
@Jo-kes7co 2025-06-24

Deep Mind called this since the start. If you train a model with human data it hits a ceiling. RL just refines to get it closer to the ceiling.

268 31 replies
@DefaultFlame 2025-06-24

Honestly, it sounds like RLVR is still very useful, but not in the role it was intended. A supercharger, not a gearbox or steering wheel.

132 1 reply
@ThioJoe 2025-06-25

My intuition says that we’re missing something huge with LLMs. The sheer volume of information doesn’t seem to fit with how stupid they can be sometimes.

98 13 replies
@scapegoatoftheuniverse7302 2025-06-25

5:31 ad ends

43 2 replies
@bycloudAI 2025-06-24

Try out Warp 2.0 now, the current rank #1 AI on Terminal Bench, outperforming Claude Code: https://go.warp.dev/bycloud You can also use code "BYCLOUD" to get Warp Pro for 1 month free. (limited for 1,000 redemptions)

18 3 replies
@iloveblender8999 2025-06-26

Researchers have correctly identified that supervised learning is limited by the data, so some of them want unsupervised learning to work, but they have not figured out how it could succeed. This is one of the key areas to achieve AGI.

12 2 replies
@io9021 2025-06-25

Thanks for this video, I wasn't aware of this paper. This channel is becoming really useful for staying up to date on ML news.

6
@MaxMorfiX 2025-06-28

fireship aah thumbnail

3
@MrNabows 2025-06-26

I've really been baited, I swore this was a fireship video.

3