Why can’t LLMs just LEARN the context window?

2026-03-30 Science & Technology

30.9k

1.6k

125

bycloud

229.0k subscribers

Description

Check out HubSpot's FREE 2026 Guide to AI Agents: https://clickhubspot.com/3972be

In this video, I'll be breaking down a new approach to long-context LLMs called test-time training (TTT-E2E), where models store past context directly in their weights instead of relying on attention or KV caches. Kind of like meta-learning, but with gradient descent.

My latest project: Intuitive AI Academy. We just wrote a new piece on MoE! https://intuitiveai.academy/
Limited-time code "EARLY" for 40% off the yearly plan!

TTT-E2E
[Paper] https://arxiv.org/abs/2512.23675

Papers featured in the video
[Titans] https://arxiv.org/abs/2501.00663
[Kimi Linear] https://arxiv.org/abs/2510.26692

My Newsletter: https://mail.bycloud.ai/
My Patreon: https://www.patreon.com/c/bycloud
Try out my new fav place to learn how to code: https://scrimba.com/?via=bycloudAI

This video is supported by the kind Patrons & YouTube Members:
🙏Spam Maj, Alex, Chris LeDoux, DX Research Group, Poof N' Inu, Deagan, Robert Zawiasa, Ryszard Warzocha, Tobe2d, Louis Muk, Akkusativ, Kevin Tai, Mark Buckler, NO U, Tony Jimenez, Ângelo Fonseca, jiye, Anushka, Asad Dhamani, Binnie Yiu, Calvin Yan, Clayton Ford, Diego Silva, Etrotta, Gonzalo Fidalgo, Handenon, Hector, Jake Disco very, Michael Brenner, Nilly K, OlegWock, Daddy Wen, Shuhong Chen, Sid_Cipher, Stefan Lorenz, Sup, tantan assawade, Thipok Tham, Thomas Di Martino, Thomas Lin, Richárd Nagyfi, Paperboy, mika, Leo, Berhane-Meskel, Kadhai Pesalam, mayssam, Bill Mangrum, nyaa, Toru Mon, Lame Plane, Matej Macak, Len Mo, saylikhapekar, Zyansheep

[Discord] https://discord.gg/NhJZGtH
[Twitter] https://twitter.com/bycloudai
[Patreon] https://www.patreon.com/bycloud
[Business Inquiries] [email protected]
[Profile & Banner Art] https://twitter.com/pygm7
[Video Editor] @Booga04
[Ko-fi] https://ko-fi.com/bycloudai

#bycloud #bycloudai #learning at test time #test time learning #test time training #meta learning #llm meta learning #llm in context learning
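To make "storing past context in the weights" concrete, here is a toy Python sketch of the general idea. It is not the TTT-E2E recipe from the paper. It assumes a single linear memory matrix `W` that absorbs each context token as a (key, value) pair by taking one gradient step on the squared error ½‖W k − v‖², which gives the update W ← W − η (W k − v) kᵀ. The helper names `matvec`, `ttt_step`, and `absorb_context` are hypothetical. The reconstruction loss and the learning rate are likewise illustrative assumptions.

```python
"""Toy sketch of test-time training as memory (illustrative only).

Assumptions, not taken from the TTT-E2E paper:
- the memory is one linear map W (dim x dim), initialised to zeros;
- each context token is represented as a (key, value) vector pair;
- W takes one plain gradient step per token on 0.5 * ||W key - value||^2.
"""
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    """Read from memory: return W @ x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def ttt_step(w: Matrix, key: Sequence[float], value: Sequence[float], lr: float) -> float:
    """Write to memory with one SGD step, updating w in place.

    Returns the loss measured before the update."""
    err = [p - v for p, v in zip(matvec(w, key), value)]
    # d/dW of 0.5 * ||W k - v||^2 is the outer product err * k^T.
    for i, e in enumerate(err):
        for j, k in enumerate(key):
            w[i][j] -= lr * e * k
    return 0.5 * sum(e * e for e in err)


def absorb_context(
    pairs: Sequence[Tuple[Sequence[float], Sequence[float]]],
    dim: int,
    lr: float = 1.0,
) -> Matrix:
    """Stream over the context once, folding each token into the weights."""
    w = [[0.0] * dim for _ in range(dim)]
    for key, value in pairs:
        ttt_step(w, key, value, lr)
    return w


if __name__ == "__main__":
    context = [
        ([1.0, 0.0, 0.0], [1.0, 2.0, 3.0]),
        ([0.0, 1.0, 0.0], [-1.0, 0.5, 0.0]),
    ]
    memory = absorb_context(context, dim=3)
    # The context itself can now be discarded; recall comes from the weights.
    print(matvec(memory, [1.0, 0.0, 0.0]))  # -> [1.0, 2.0, 3.0]
    print(matvec(memory, [0.0, 1.0, 0.0]))  # -> [-1.0, 0.5, 0.0]
```

With orthonormal keys and `lr=1.0`, a single pass stores each value exactly. When keys overlap, later writes partially overwrite earlier ones. That interference is the trade-off any fixed-size weight memory has to manage.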

Top Comments (10)

@jibcot8541 2026-03-30

Interesting method. I wonder if they could just train the weight updates into LoRAs and then have an index/database of different subjects/conversations. It would probably be more efficient than saving the whole context but still be searchable.

93 12 replies

@rany615 2026-03-30

5:57 kind of reminds me of how our brain works at night to commit things to long-term memory, during a period when the model isn't getting any input

78 9 replies

@IDNKEK 2026-03-30

It all goes in the direction of how our brains work, huh

71 5 replies

@alkeryn1700 2026-03-30

next up "attention isn't something you need".

58 1 reply

@miniminerx 2026-03-30

I thought we would have done something like this long ago, but with short-term and long-term memory similar to the brain

35 3 replies

@bycloudAI 2026-03-30

Check out HubSpot's FREE 2026 Guide to AI Agents: https://clickhubspot.com/3972be

8 1 reply

@Rizhiy13 2026-03-31

We probably need to assign an importance weight to each token and then scale the loss by it during remembering, so the model learns the relevant stuff but doesn't over-index on trivial information.

@TheLiverX 2026-03-31

It's probably gonna go like: more flexible last layers, LoRAs, importance weighting to memorize only the important things (probably using attention), special harness for "recalling" information

@marleymomo9582 2026-04-01

Kimi's latest Attention Residuals is awesome. Do that next. The model can check past attention dynamically, hence it scales better. Accuracy improved, especially for complex tasks.

@dustin.odaffer 2026-04-19

Best channel for learning LLM mechanics! The practical examples, like the real ChatGPT UI, are excellent points of reference.
