Richard Sutton – Father of RL thinks LLMs are a dead end
Why Richard Sutton Sees LLMs as a Dead End: The RL Perspective
Richard Sutton, Turing Award winner, argues that current Large Language Models fail because they cannot learn continually from experience. Discover the fundamental architectural shift required for true artificial intelligence based on reinforcement learning principles.
Short Summary
- Reinforcement Learning (RL), not LLMs, captures the true essence of intelligence: understanding and acting in one's world to achieve goals.
- LLMs are limited by relying on imitation learning derived from static text, which inherently lacks the corrective mechanism of experiencing real-world consequences (ground truth).
- The necessary future paradigm requires architectural changes that enable continual, on-the-fly learning from experience, mirroring how biological organisms learn throughout their lives.
- The "Bitter Lesson" suggests that methods leveraging massive computation through interaction, rather than pre-loaded human knowledge, will ultimately provide superior scalability.
- The discussion contrasts the LLM (imitation) approach with the fundamental RL paradigm, which is defined by loops of action, sensation, and reward.
The interview explores the conceptual gulf between language modeling based on curated data and the reinforcement learning framework focused on goal-driven interaction with reality. Sutton details why current LLMs cannot learn effectively on the job or adapt robustly because they lack an inherent goal structure allowing for surprise and corrective updates based on world events. This sets the stage for discussing the necessity of a new architecture for true general intelligence.
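The action, sensation, and reward loop described above can be made concrete with a small sketch. The example below is not from the interview: it is a minimal tabular Q-learning agent in a hypothetical five-cell corridor, where the environment (`step`), the hyperparameters, and all names are assumptions chosen for illustration. The `surprise` variable is the temporal-difference error, that is, the gap between what the agent expected and what the world actually delivered, and it drives every update.

```python
import random

N_STATES = 5        # hypothetical corridor: cells 0..4, cell 4 is the goal
ACTIONS = (-1, +1)  # index 0 steps left, index 1 steps right


def step(state, action):
    """Toy environment (an assumption for this sketch).

    Returns (next_state, reward, done). Reward is 1.0 only on reaching the goal.
    """
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values purely from interaction, with no pre-loaded knowledge."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Act: explore with probability epsilon (or on ties), else be greedy.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            # Sense: the world responds with a new state and a reward.
            next_state, reward, done = step(state, ACTIONS[a])
            # Learn: compare the outcome with the prediction and correct it.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            surprise = target - q[state][a]  # temporal-difference error
            q[state][a] += alpha * surprise
            state = next_state
    return q


def greedy_policy(q):
    """Preferred action index (0 = left, 1 = right) for each non-goal cell."""
    return [0 if q[s][0] > q[s][1] else 1 for s in range(N_STATES - 1)]


if __name__ == "__main__":
    learned = train()
    print(greedy_policy(learned))
```

The agent is never told which direction is correct. It discovers that moving right pays off only because the reward signal corrects its estimates, which is the kind of ground-truth feedback the summary says imitation from static text lacks.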
Top Comments (10)
This channel is what I wanted Lex Fridman to be. You're not afraid of sharing your opinions and thought processes in front of great minds like Sutton. I appreciate that so much. It makes for conversations where I actually reflect on my own assumptions on the topic and learn.
I think Sutton's core point on imitation learning is being missed by many people here. He's not just observing that imitation happens, but suggesting it's a result of more foundational learning, rather than the main way intelligence is acquired. His view aligns with intelligence being built on active, goal-driven experience, where the agent learns from trying things and seeing consequences. He's not disagreeing with the existence of imitation, but rather correcting and nudging us to think deeper about what 'learning' means in the most fundamental sense.
Richard's ability to be so contrary in such a casual way... Love it.
For LLMs mimicry is THE terminal goal. For humans mimicry is an intermediate goal
Dwarkesh too LLM-pilled to hear Sutton's perspective
This interview is very revealing about what it is happening in the field today! From one side, Mr. Sutton plays the role of a great thinker, someone in search for a definition of intelligence in a higher and truthful level. On the other side, Mr. Patel represents the current AI hype, trying to justify that LLMs are intelligent machines and have goals. As Mr. Sutton puts so well: "having a goal is the essence of intelligence...and...next token prediction is not a goal, it does not change the world!" Many cheers to Mr. Sutton! 🥳
Richard made many important points. I got a chance to reflect more about his vision after the interview, and I've compiled some thoughts here: https://youtu.be/u3HBJVjpXuw?si=03GJi7doWKYHpb3k
Two men sitting in same movie theater but seeing two completely different movies.
"it's surprising that you can have such a different point of view" is going to be my new go-to dis
It feels like it started as an interview, and then it turned into a debate once Mr. Sutton didn't worship LLMs