Why can’t LLMs just LEARN the context window?
Top Comments (10)
Interesting method. I wonder if they could just train the weight updates into LoRAs and then keep an index/database of them for different subjects/conversations. It would probably be more efficient than saving the whole context, but still be searchable.
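A minimal sketch of that idea, assuming each subject's weight update is stored as a pair of low-rank factors `B` and `A` and looked up by a subject key. `AdapterIndex`, `store`, and `apply` are hypothetical names made up for illustration, not anything from the video or from a real library, and the matrices are tiny hand-written lists rather than real model weights.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two same-shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


class AdapterIndex:
    """Hypothetical store of low-rank updates keyed by subject."""

    def __init__(self):
        self._adapters = {}

    def store(self, subject, B, A):
        # Only the two small factors are kept, not the full update B @ A.
        self._adapters[subject] = (B, A)

    def apply(self, base_weight, subject):
        # Returns a new matrix W + B @ A; the base weights are left untouched.
        B, A = self._adapters[subject]
        return add(base_weight, matmul(B, A))


base = [[1.0, 0.0], [0.0, 1.0]]
index = AdapterIndex()
index.store("cooking", B=[[1.0], [0.0]], A=[[0.0, 2.0]])  # rank-1 update
patched = index.apply(base, "cooking")
```

A real index would need some way to search adapters by content rather than by an exact key; the dictionary lookup here only stands in for that.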
5:57 kind of reminds me of how our brain works at night to commit things to long-term memory, so the model could do this during periods when it isn't getting any input
It all goes in the direction of how our brains work, huh
I thought we would have done something like this long ago, but with short term and long term memory similar to the brain
Check out HubSpot's FREE 2026 Guide to AI Agents: https://clickhubspot.com/3972be
Probably need to assign importance weight to each token and then scale loss by that during remembering, so model learns relevant stuff, but doesn't over-index on trivial information.
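The loss scaling described here can be sketched as below, assuming per-token log-probabilities and a non-negative importance weight per token are already available. Where those weights come from is left open by the comment, so the values used are made up, and `weighted_token_loss` is a hypothetical helper name.

```python
import math


def weighted_token_loss(token_log_probs, importance):
    """Importance-weighted mean negative log-likelihood for one sequence.

    token_log_probs: log p(token_t | context) for each target token.
    importance: non-negative weight per token (assumed to be given).
    """
    if len(token_log_probs) != len(importance):
        raise ValueError("need one importance weight per token")
    total_weight = sum(importance)
    if total_weight <= 0:
        raise ValueError("importance weights must sum to a positive value")
    weighted_nll = sum(-lp * w for lp, w in zip(token_log_probs, importance))
    # Normalising by the total weight keeps the loss scale comparable
    # across sequences with different weight distributions.
    return weighted_nll / total_weight


log_probs = [math.log(0.9), math.log(0.5), math.log(0.1)]
uniform = weighted_token_loss(log_probs, [1.0, 1.0, 1.0])  # plain mean NLL
focused = weighted_token_loss(log_probs, [0.0, 0.0, 1.0])  # only the last token counts
```

With uniform weights this reduces to the usual mean token loss; putting all the weight on the poorly predicted last token makes the loss depend on that token alone.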
It's probably gonna go like: more flexible last layers, LoRAs, importance weighting to memorize only the important things (probably using attention), special harness for "recalling" information
There was the Light Mem paper 5 months ago, which no one seems to be talking about
It would be funny to see that kind of LLM have to take a short "nap" after every token batch to store its most recent memories.
Kimi's latest, Attention Residuals, is awesome. Do that next. The model can check past attention dynamically, hence it scales better. Accuracy improved, especially for complex tasks.