New AI Meta: Train LLMs To Explore On "Hard" Tokens [RLVR + Entropy]
Top Comments (10)
finally somebody using high model entropy to teach them to be more creative, good luck with the military btw
So this is fascinating. Training the model to say "but wait" and similar forces high-entropy tokens into the context, encouraging branches in the reasoning.
9:00 Making LLMs is basically how we learned to organize the Library of Babel.
This seems similar to the training principle of "explorative, then exploitive" that is utilized to avoid overfitting and local minima
Best comment section
Juicy research 🤤
Great explanation as usual! Super exciting developments! So cool that it tackles the repetition AND performance bottlenecks!
dang, now *that's* how you build a language model!
This is amazing! The data the models train on naturally reveals which sequences of words are part of the same "thought" and where different "thoughts" begin and end. These high-entropy tokens are basically delimiters for what we call "chunks" of information. NOW what we must do is delegate the low-entropy tokens to the smaller models and use the BIG BRAIN complex models only for the high-entropy tokens. This will maximize efficiency. Also, humans can only hold about 4-7 chunks of information when doing a task. If we can figure out how models can hold more than that, we will be golden.
Please make an AI agentic system twin of yourself to post daily