The LLM's RL Revelation We Didn't See Coming

2025-06-24 Science & Technology

142.3k

5.2k

364

bycloud

229.0k subscribers

Description

Try out Warp 2.0 now, currently ranked #1 on Terminal Bench and outperforming Claude Code: https://go.warp.dev/bycloud
You can also use code "BYCLOUD" to get Warp Pro free for 1 month (limited to 1,000 redemptions).

My Newsletter https://mail.bycloud.ai/
my project: find, discover & explain AI research semantically https://findmypapers.ai/
My Patreon (get bundle access for my newsletter & findmypapers) https://www.patreon.com/c/bycloud

Training language models to follow instructions with human feedback [Paper] https://arxiv.org/abs/2203.02155
DeepSeek-R1 (Aha Moment) [Paper] https://arxiv.org/abs/2501.12948
Understanding R1-Zero-Like Training: A Critical Perspective [Paper] https://arxiv.org/pdf/2503.20783
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? [Paper] https://arxiv.org/abs/2504.13837
Reinforcement Learning Finetunes Small Subnetworks in Large Language Models [Paper] https://arxiv.org/abs/2505.11711
Spurious Rewards: Rethinking Training Signals in RLVR [Paper] https://arxiv.org/abs/2506.10947

Try out my new fav place to learn how to code https://scrimba.com/?via=bycloudAI

This video is supported by the kind Patrons & YouTube Members: 🙏Nous Research, Chris LeDoux, Ben Shaener, DX Research Group, Poof N' Inu, Andrew Lescelius, Deagan, Robert Zawiasa, Ryszard Warzocha, Tobe2d, Louis Muk, Akkusativ, Kevin Tai, Mark Buckler, NO U, Tony Jimenez, Ângelo Fonseca, jiye, Anushka, Asad Dhamani, Binnie Yiu, Calvin Yan, Clayton Ford, Diego Silva, Etrotta, Gonzalo Fidalgo, Handenon, Hector, Jake Disco very, Michael Brenner, Nilly K, OlegWock, Daddy Wen, Shuhong Chen, Sid_Cipher, Stefan Lorenz, Sup, tantan assawade, Thipok Tham, Thomas Di Martino, Thomas Lin, Richárd Nagyfi, Paperboy, mika, Leo, Berhane-Meskel, Kadhai Pesalam, mayssam, Bill Mangrum, nyaa, Toru Mon

[Discord] https://discord.gg/NhJZGtH
[Twitter] https://twitter.com/bycloudai
[Patreon] https://www.patreon.com/bycloud
[Business Inquiries] [email protected]
[Profile & Banner Art] https://twitter.com/pygm7
[Video Editor] @Booga04
[Ko-fi] https://ko-fi.com/bycloudai

#bycloud #bycloudai #RLVR #DeepSeek-R1 #RL LLM #LLM RL #PRM #LLM reinforcement learning

Top Comments (10)

@lupusalbus3795 2025-06-24

I thought this was about an LLM written in Brainfuck. Still neat, but not what I was hoping for

696 10 replies

@Jo-kes7co 2025-06-24

DeepMind called this from the start. If you train a model on human data, it hits a ceiling. RL just refines it to get closer to that ceiling.

267 31 replies

@DefaultFlame 2025-06-24

Honestly, it sounds like RLVR is still very useful, but not in the role it was intended. A supercharger, not a gearbox or steering wheel.

132 1 reply

@ThioJoe 2025-06-25

My intuition says that we’re missing something huge with LLMs. The sheer volume of information they hold doesn’t seem to fit with how stupid they can be sometimes.

98 12 replies

@scapegoatoftheuniverse7302 2025-06-25

5:31 ad ends

43 2 replies

@bycloudAI 2025-06-24

18 3 replies

@iloveblender8999 2025-06-26

Researchers have correctly identified that supervised learning is limited by the data, so some of them want unsupervised learning to work, but they have not figured out how it could succeed. This is one of the key areas to achieve AGI.

12 2 replies

@io9021 2025-06-25

Thanks for this video, I wasn't aware of this paper. This channel is becoming really useful for staying up to date on ML news.

@MrNabows 2025-06-26

I've really been baited, I swore this was a fireship video.

@MaxMorfiX 2025-06-28

fireship aah thumbnail
