The LLM's RL Revelation We Didn't See Coming
Top Comments (10)
I thought this was about an LLM written in Brainfuck. Still neat, but not what I was hoping for
DeepMind called this from the start. If you train a model with human data it hits a ceiling. RL just refines it to get closer to the ceiling.
Honestly, it sounds like RLVR is still very useful, but not in the role it was intended. A supercharger, not a gearbox or steering wheel.
My intuition says that we’re missing something huge with LLMs. The sheer volume of information doesn’t seem to fit with how stupid they can be sometimes.
5:31 ad ends
Try out Warp 2.0 now, the current rank #1 AI on Terminal Bench, outperforming Claude Code: https://go.warp.dev/bycloud You can also use code "BYCLOUD" to get Warp Pro for 1 month free. (limited for 1,000 redemptions)
Researchers have correctly identified that supervised learning is limited by the data, so some of them want unsupervised learning to work, but they have not figured out how it could succeed. This is one of the key areas to achieve AGI.
Thanks for this video, I wasn't aware of this paper. This channel is becoming really useful for staying up to date on ML news.
fireship aah thumbnail
I've really been baited, I swore this was a fireship video.