Richard Sutton – Father of RL thinks LLMs are a dead end

2025-09-26 Science & Technology

435.9k

11.3k

2.6k

Dwarkesh Patel

1.4m subscribers

Description

Richard Sutton is the father of reinforcement learning, winner of the 2024 Turing Award, and author of The Bitter Lesson. And he thinks LLMs are a dead end.

After interviewing him, my steel man of Richard’s position is this: LLMs aren’t capable of learning on-the-job, so no matter how much we scale, we’ll need *some* new architecture to enable continual learning. And once we have it, we won’t need a special training phase; the agent will just learn on-the-fly, like all humans, and indeed, like all animals. This new paradigm will render our current approach with LLMs obsolete.

In our interview, I did my best to represent the view that LLMs might function as the foundation on which experiential learning can happen… Some sparks flew.

A big thanks to the Alberta Machine Intelligence Institute for inviting me up to Edmonton and for letting me use their studio and equipment. Enjoy!

EPISODE LINKS
* Transcript: https://www.dwarkesh.com/p/richard-sutton
* Apple Podcasts: https://podcasts.apple.com/us/podcast/richard-sutton-father-of-rl-thinks-llms-are-a-dead-end/id1516093381?i=1000728584744
* Spotify: https://open.spotify.com/episode/3zAXRCFrHPShU4MuuIx4V5?si=c9f4bf24fb4c43e3

SPONSORS
* Labelbox makes it possible to train AI agents in hyperrealistic RL environments. With an experienced team of applied researchers and a massive network of subject-matter experts, Labelbox ensures your training reflects important, real-world nuance. Turn your demo projects into working systems at https://labelbox.com/dwarkesh
* Gemini Deep Research is designed for thorough exploration of hard topics. For this episode, it helped me trace reinforcement learning from early policy gradients up to current-day methods, combining clear explanations with curated examples. Try it out yourself at https://gemini.google.com/
* Hudson River Trading doesn’t silo their teams. Instead, HRT researchers openly trade ideas and share strategy code in a mono-repo. This means you’re able to learn at incredible speed and your contributions have impact across the entire firm. Find open roles at https://hudsonrivertrading.com/dwarkesh

To sponsor a future episode, visit https://dwarkesh.com/advertise

TIMESTAMPS
00:00:00 – Are LLMs a dead end?
00:13:51 – Do humans do imitation learning?
00:23:57 – The Era of Experience
00:34:25 – Current architectures generalize poorly out of distribution
00:42:17 – Surprises in the AI field
00:47:28 – Will The Bitter Lesson still apply after AGI?
00:54:35 – Succession to AI

Top Comments (10)

@victorjans3771 2025-09-27

This channel is what I wanted Lex Fridman to be. You're not afraid of sharing your opinions and thought processes in front of great minds like Sutton. I appreciate that so much. It makes for conversations where I actually reflect on my own assumptions on the topic and learn.

2.0k 130 replies

@UjjwalQo 2025-09-27

I think Sutton's core point on imitation learning is being missed by many people here. He's not just observing that imitation happens, but suggesting it's a result of more foundational learning, rather than the main way intelligence is acquired. His view aligns with intelligence being built on active, goal-driven experience, where the agent learns from trying things and seeing consequences. He's not disagreeing with the existence of imitation, but rather correcting and nudging us to think deeper about what 'learning' means in the most fundamental sense.

1.5k 157 replies

@oracleofwater 2025-09-26

Richard's ability to be so contrary in such a casual way... Love it.

1.1k 57 replies

@SmileyEmoji42 2025-09-28

For LLMs mimicry is THE terminal goal. For humans mimicry is an intermediate goal

864 93 replies

@DavidJones-kz6ik 2025-09-28

Dwarkesh too LLM-pilled to hear Sutton's perspective

198 9 replies

@patryciabarrospereira7400 2025-12-12

This interview is very revealing about what is happening in the field today! On one side, Mr. Sutton plays the role of a great thinker, someone in search of a definition of intelligence at a higher and more truthful level. On the other side, Mr. Patel represents the current AI hype, trying to justify that LLMs are intelligent machines and have goals. As Mr. Sutton puts it so well: "having a goal is the essence of intelligence...and...next token prediction is not a goal, it does not change the world!" Many cheers to Mr. Sutton! 🥳

150 5 replies

@DwarkeshPatel 2025-10-06

Richard made many important points. I got a chance to reflect more about his vision after the interview, and I've compiled some thoughts here: https://youtu.be/u3HBJVjpXuw?si=03GJi7doWKYHpb3k

126 53 replies

@kirkkillion 2025-12-08

Two men sitting in same movie theater but seeing two completely different movies.

62 1 replies

@benr9014 2025-11-08

"it's surprising that you can have such a different point of view" is going to be my new go-to dis

33 1 replies

@hhill5489 2026-01-15

It feels like it started as an interview, and then it turned into a debate once Mr. Sutton didn't worship LLMs

14 2 replies

Why Richard Sutton Sees LLMs as a Dead End: The RL Perspective

Richard Sutton, Turing Award winner, argues that current Large Language Models fail because they cannot learn continually from experience. Discover the fundamental architectural shift required for true artificial intelligence based on reinforcement learning principles.

Short Summary

Reinforcement Learning (RL), not LLMs, captures the true essence of intelligence: understanding and acting in one's world to achieve goals.
LLMs are limited by relying on imitation learning derived from static text, which inherently lacks the corrective mechanism of experiencing real-world consequences (ground truth).
The necessary future paradigm requires architectural changes enabling continual, on-the-fly experiential learning, mirroring biological life cycles.
The "Bitter Lesson" suggests that methods leveraging massive computation through interaction, rather than pre-loaded human knowledge, will ultimately provide superior scalability.
This discussion contrasts the LLM/Imitation approach with the fundamental RL paradigm defined by action, sensation, and reward loops.

The interview explores the conceptual gulf between language modeling based on curated data and the reinforcement learning framework focused on goal-driven interaction with reality. Sutton details why current LLMs cannot learn effectively on the job or adapt robustly because they lack an inherent goal structure allowing for surprise and corrective updates based on world events. This sets the stage for discussing the necessity of a new architecture for true general intelligence.
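The loop the summary describes (act, observe the consequence, update from the surprise) can be illustrated with a toy sketch. This is not Sutton's method or any architecture discussed in the interview; it is a minimal epsilon-greedy learner on a two-action problem, and the names `TwoArmedWorld`, `ExperientialAgent`, and `run`, along with the payoff probabilities, are hypothetical choices made for illustration. To keep it short, the "sensation" is collapsed into the reward signal alone.

```python
import random


class TwoArmedWorld:
    """Hypothetical toy world: action 1 pays off more often than action 0."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.payoff = {0: 0.2, 1: 0.8}  # assumed reward probabilities

    def step(self, action):
        # The world, not a static dataset, decides the consequence of an action.
        return 1.0 if self.rng.random() < self.payoff[action] else 0.0


class ExperientialAgent:
    """Keeps a value estimate per action and updates it after every single step."""

    def __init__(self, n_actions=2, step_size=0.1, epsilon=0.1, seed=1):
        self.values = [0.0] * n_actions
        self.step_size = step_size
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def act(self):
        # Mostly exploit the current estimates, occasionally try something else.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def learn(self, action, reward):
        # "Surprise" is the gap between what happened and what was predicted.
        error = reward - self.values[action]
        self.values[action] += self.step_size * error
        return error


def run(steps=2000):
    """Interleave acting and learning; there is no separate training phase."""
    world = TwoArmedWorld()
    agent = ExperientialAgent()
    total_reward = 0.0
    for _ in range(steps):
        action = agent.act()
        reward = world.step(action)
        agent.learn(action, reward)
        total_reward += reward
    return agent.values, total_reward / steps


if __name__ == "__main__":
    values, average_reward = run()
    print("value estimates:", values)
    print("average reward:", average_reward)
```

The point of the sketch is structural: the agent's estimates change only because its own actions produced outcomes that differed from its predictions, which is the corrective mechanism the summary says imitation from static text lacks.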
