The RL Irony in LLMs
Top Comments (10)
Still years away from cheap ram huh?
Wait, you're telling me all that wasn't obvious from the moment LoRA was made? Also, thanks for making a vid on this research. I'm almost done writing my bachelor's thesis, where I use QLoRA to fine-tune base Llama 3.1 8B into an unhelpful assistant that does whatever it can to make it seem like it answered your question while providing as little actual help as possible, and this is going to be a fine addition to my bibliography.
RL is not only for generalization. It is good for making AI learn things where training data is not present
Maybe the problem with RL is that it only rewards the "verifiable reward." It doesn't reward exploration and creativity, and rewarding those was one of the key breakthroughs that let earlier game-playing RL agents clear more of the stages that required exploration.
13:28 "Regex fixer" 🤣
The quickest way to answer a question correctly is by already knowing the answer. With LLMs being trained on every written media in existence...
Start learning cyber security with TryHackMe: https://tryhackme.com/bycloud Use my code "BYCLOUD25" to get 25% off on annual subscription! fun fact: I wrote this video on a phone back when i was in the military lol
The LoRA part of this video was actually practically useful. Together with the sources and arguments, great work! Keep it up
5:58 what a choice
Really good quality and informative video, thank you!