OpenAI o1's New Paradigm: Test-Time Compute Explained
Top Comments (10)
One of the chains of thought felt like doing an A* search on all possible answers
Your channel is like twitter but only the good part, I love it
OpenAI went from extremely secretive closed-source for profit to even more secretive closed-source for profit. Truly revolutionary change.
I don't understand why you're so insistent that using RL to learn reasoning can't cause new knowledge to be gained. You're implicitly assuming that if the model knows A and that A implies B then the model must already know B. But that's not true. The model knows the rules of chess, and these rules imply whatever the optimal strategy is, but it definitely doesn't know this optimal strategy. It may come to learn it (or approximations of it) through RL, though, as AlphaZero and similar did.
Let me know if you guys want a dive into the methodologies of TTC, there's a lot of new papers/implementations coming out every day lol (entropix is a cool one) Check out NVIDIA's suite of Training and Certification here: [NVIDIA Certification] https://nvda.ws/3XxkFyj [AI Learning Essential] https://nvda.ws/4gvD474 [Gen AI/LLM Learning Path] https://nvda.ws/4enwYE7 You can use the code “BYCLOUD” at checkout for 10% off!
Glad to see the original editing approach back.
Fun fact: I have spent 3-4 days trying to fix a single SQLite bug while I was debugging with AI
"Bart say the line!" *Sigh* "The bitter lesson strikes again"
kinda reminds me of how chess bots like Stockfish are able to view multiple potential outcomes to find the best move possible
Okay this explains why higher temp and top_p give better results sometimes 😮
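For readers unfamiliar with the two knobs mentioned in the last comment, here is a minimal sketch of temperature scaling followed by top-p (nucleus) filtering over a toy set of token scores. The function name `sample_top_p` and the example scores are made up for illustration; this is not how any particular model or API implements sampling, and it does not show whether higher values actually improve results. It only shows why raising them widens the set of continuations that can be explored.

```python
import math
import random


def sample_top_p(logits, temperature=1.0, top_p=1.0, rng=random):
    """Sample one token from `logits` (token -> raw score).

    Hypothetical helper for illustration: applies temperature scaling,
    then keeps only the smallest set of most likely tokens whose
    cumulative probability reaches `top_p` before drawing.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Temperature above 1 flattens the distribution, below 1 sharpens it.
    scaled = {tok: score / temperature for tok, score in logits.items()}

    # Numerically stable softmax over the scaled scores.
    peak = max(scaled.values())
    weights = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(weights.values())
    probs = sorted(
        ((tok, w / total) for tok, w in weights.items()),
        key=lambda kv: kv[1],
        reverse=True,
    )

    # Nucleus filtering: walk down the ranked tokens until top_p is covered.
    kept, cumulative = [], 0.0
    for tok, p in probs:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break

    tokens, kept_probs = zip(*kept)
    # random.choices treats the weights as relative, so no renormalising is needed.
    return rng.choices(tokens, weights=kept_probs, k=1)[0]


if __name__ == "__main__":
    toy_logits = {"a": 2.0, "b": 1.0, "c": 0.1}  # made-up scores
    rng = random.Random(0)
    for temp, p in [(0.5, 0.5), (1.0, 0.9), (2.0, 1.0)]:
        draws = [sample_top_p(toy_logits, temp, p, rng) for _ in range(1000)]
        counts = {tok: draws.count(tok) for tok in toy_logits}
        print(f"temperature={temp} top_p={p} -> {counts}")
```

With a low temperature and a small `top_p`, nearly every draw is the single most likely token; with higher values the less likely tokens start to appear, which gives a search or best-of-N procedure more distinct candidates to choose from.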