How did a 27M Model even beat ChatGPT?
Top Comments (10)
Basically we rediscovered narrow AI in search of AGI
This is what Moore's law should be doing with AI: making models smaller and more efficient instead of throwing money at the problem and pretending that expanding is the way to go
LSTM -> transformer -> transformers with thinking (recursion in output tokens) -> HRM (recursion in internal state). We've almost come full circle.
Actually it turned out that there was nothing novel about the HRM. Ablation studies revealed that the hierarchical part makes no difference. Small models trained for specific tasks tend to outperform general purpose models. This is not a new discovery.
it turns out knowing ~100 physics equations is more efficient than fitting reality to a 100B parameter polynomial
I really hope that we get sub-100M-parameter models that are good at one programming language, essentially being able to toggle them like languages in IDEs while running on a 20-40 TOPS NPU in laptop processors.
You won't believe it, but a 37-parameter MLP beats GPT-5.1 at sin(x) calculations.
As hinted at in the video, TRMs don't seem to scale. So instead of making the TRM larger, it would be better to create a MoE with many TRMs as the experts. 1000 TRMs would be a 7B MoE, which changes the problem to one of how to decompose tasks completely (there was a recent paper about that, but I forgot the title).
oh no he got kidnapped at the end
Check out HubSpot's AI Decoded Guide: https://clickhubspot.com/c7a843