New AI Meta: Train LLMs To Explore On "Hard" Tokens [RLVR + Entropy]

2025-08-31 Science & Technology
23.4k
1.7k
83
bycloud
225.0k subscribers

Description

Get started with Strands Agents today: https://aws.amazon.com/blogs/opensource/introducing-strands-agents-1-0-production-ready-multi-agent-orchestration-made-simple/?trk=3fa00bba-d2bc-45e2-a2b0-f96bc12fd521&sc_channel=psm

In this video, I will be sharing how researchers train LLMs to "explore" during RL to improve performance via entropy.

My Newsletter https://mail.bycloud.ai/
my project: find, discover & explain AI research semantically https://findmypapers.ai/
My Patreon https://www.patreon.com/c/bycloud

Beyond the 80/20 Rule [Paper] https://arxiv.org/abs/2506.01939
Reasoning with Exploration [Paper] https://arxiv.org/abs/2506.14758

Try out my new fav place to learn how to code https://scrimba.com/?via=bycloudAI

This video is supported by the kind Patrons & YouTube Members: 🙏Nous Research, Chris LeDoux, Ben Shaener, DX Research Group, Poof N' Inu, Andrew Lescelius, Deagan, Robert Zawiasa, Ryszard Warzocha, Tobe2d, Louis Muk, Akkusativ, Kevin Tai, Mark Buckler, NO U, Tony Jimenez, Ângelo Fonseca, jiye, Anushka, Asad Dhamani, Binnie Yiu, Calvin Yan, Clayton Ford, Diego Silva, Etrotta, Gonzalo Fidalgo, Handenon, Hector, Jake Disco very, Michael Brenner, Nilly K, OlegWock, Daddy Wen, Shuhong Chen, Sid_Cipher, Stefan Lorenz, Sup, tantan assawade, Thipok Tham, Thomas Di Martino, Thomas Lin, Richárd Nagyfi, Paperboy, mika, Leo, Berhane-Meskel, Kadhai Pesalam, mayssam, Bill Mangrum, nyaa, Toru Mon

[Discord] https://discord.gg/NhJZGtH
[Twitter] https://twitter.com/bycloudai
[Patreon] https://www.patreon.com/bycloud
[Business Inquiries] [email protected]
[Profile & Banner Art] https://twitter.com/pygm7
[Video Editor] @Booga04
[Ko-fi] https://ko-fi.com/bycloudai
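
For readers new to the "entropy" idea the description mentions, here is a minimal sketch of how per-token entropy could be measured and used to flag the "hard" tokens. It is an illustration only, not the method from either linked paper: the function names `token_entropy` and `high_entropy_mask` are hypothetical, and the default `fraction=0.2` is an assumption borrowed from the "80/20" framing in the first paper's title.

```python
import math

def token_entropy(logits):
    """Shannon entropy (in nats) of softmax(logits) at one token position."""
    m = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def high_entropy_mask(per_token_logits, fraction=0.2):
    """Flag the `fraction` of positions with the highest entropy.

    `fraction` is an illustrative assumption, not a value confirmed by the source.
    Returns (mask, entropies), one entry per token position.
    """
    entropies = [token_entropy(l) for l in per_token_logits]
    k = max(1, round(fraction * len(entropies)))
    ranked = sorted(range(len(entropies)), key=lambda i: entropies[i], reverse=True)
    chosen = set(ranked[:k])
    return [i in chosen for i in range(len(entropies))], entropies

# Toy example: four confident positions and one uncertain one (index 2).
toy_logits = [
    [10.0, 0.0, 0.0, 0.0],
    [0.0, 10.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],   # uniform distribution: maximum entropy
    [0.0, 0.0, 10.0, 0.0],
    [0.0, 0.0, 0.0, 10.0],
]
mask, entropies = high_entropy_mask(toy_logits)
```

In an RL setup, a mask like this could be used to weight or restrict the training signal to the uncertain positions, where the model is effectively choosing between reasoning branches; how the papers actually do this is covered in the links above.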

Top Comments (10)

@gemstone7818 2025-08-31

finally somebody using high model entropy to teach them to be more creative, good luck with the military btw

123 2 replies
@6IGNITION9 2025-09-01

So this is fascinating. Training the model to say "but wait" and similar forces high-entropy tokens into the context, encouraging branches in the reasoning.

47
@Vlarkus 2025-08-31

9:00 Making LLMs is basically how we learned to organize the Library of Babel.

24 10 replies
@Shragenator 2025-08-31

This seems similar to the training principle of "explorative, then exploitive" that is utilized to avoid overfitting and local minima

21
@wenlamboMMD 2025-08-31

Best comment section

14
@technolus5742 2025-08-31

Juicy research 🤤

12
@DeadtomGCthe2nd 2025-08-31

Great explanation as usual! Super exciting developments! So cool that it tackles the repetition AND performance bottlenecks!

11
@arlogodfrey1508 2025-08-31

dang, now *that's* how you build a language model!

9
@vladyskaizen 2025-09-06

This is amazing! The data the models train on naturally reveals what sequences of words are part of the same "thought" and where different "thoughts" begin and end. These high entropy tokens are basically delimitations for what we call "chunks" of information. NOW what we must do is delegate the low entropy tokens to the smaller models and use the BIG BRAIN complex models only for the high entropy tokens. This will maximize efficiency. Also, humans can only hold about 4-7 chunks of information when doing a task. If we can figure out how models can hold more than that, we will be golden.

8 2 replies
@PrimeStackPro 2025-08-31

Please make an AI agentic system twin of yourself to post daily

5
