LLM’s Billion Dollar Problem

2026-02-10 Science & Technology
44.6k
2.3k
256
bycloud
225.0k subscribers

Description

Check out Inngest and let your AI agents wear a harness now https://innge.st/yt-bycl-1

This video was originally going to be about the Linear Attention Saga that happened between June and November last year, but it turned out I needed quite some buildup to explain the significance of Linear Attention, the 1 million context window, and compute scaling. So it accidentally became a 17-minute video...

my latest project: Intuitive AI Academy https://intuitiveai.academy/
limited time code "NYNM" for 50% off forever (only 25 spots left!)

My Newsletter https://mail.bycloud.ai/
my project: find, discover & explain AI research semantically https://findmypapers.ai/
My Patreon https://www.patreon.com/c/bycloud

Sauces (will try to be in order of appearance)

GPT-OSS
[Code] https://github.com/openai/gpt-oss
[Paper] https://arxiv.org/abs/2508.10925 (the paper does not explicitly mention that it uses sliding window attention)

DeepSeek V3.2
[Paper] https://arxiv.org/pdf/2512.02556

DeepSeek V2 (Multi-head Latent Attention)
[Paper] https://arxiv.org/abs/2405.04434

Kimi K2.5
[Paper] https://arxiv.org/abs/2602.02276

MiniMax
[Text-01] https://arxiv.org/abs/2501.08313
[M1] https://arxiv.org/abs/2506.13585
[M2] https://huggingface.co/MiniMaxAI/MiniMax-M2
[Zhihu Blog] https://www.zhihu.com/question/1965302088260104295/answer/1966810157473335067

Qwen-3 Next
[Project Page] https://qwen.ai/blog?id=4074cca80393150c248e508aa62983f9cb7d27cd

Hunyuan T1
[Project Page] https://tencent.github.io/llm.hunyuan.T1/README_EN.html

Kimi Linear
[Paper] https://arxiv.org/abs/2510.26692

Context Arena (used for comparing long context performance)
[Project Page] https://contextarena.ai/

Gemini 3 Flash
[Blog] https://blog.google/products-and-platforms/products/gemini/gemini-3-flash/

Claude 4.6
[Blog] https://www.anthropic.com/news/claude-opus-4-6

Try out my new fav place to learn how to code https://scrimba.com/?via=bycloudAI

This video is supported by the kind Patrons & YouTube Members:
🙏Spam Maj, Alex, Chris LeDoux, DX Research Group, Poof N' Inu, Deagan, Robert Zawiasa, Ryszard Warzocha, Midwstmakr, Tobe2d, Louis Muk, Akkusativ, Kevin Tai, Mark Buckler, NO U, Tony Jimenez, Ângelo Fonseca, jiye, Anushka, Asad Dhamani, Binnie Yiu, Calvin Yan, Clayton Ford, Diego Silva, Etrotta, Gonzalo Fidalgo, Handenon, Hector, Jake Disco very, Michael Brenner, Nilly K, OlegWock, Daddy Wen, Shuhong Chen, Sid_Cipher, Stefan Lorenz, Sup, tantan assawade, Thipok Tham, Thomas Di Martino, Thomas Lin, Richárd Nagyfi, Paperboy, mika, Leo, Berhane-Meskel, Kadhai Pesalam, mayssam, Bill Mangrum, nyaa, Toru Mon, Lame Plane, Matej Macak, thechoephix

[Discord] https://discord.gg/NhJZGtH
[Twitter] https://twitter.com/bycloudai
[Patreon] https://www.patreon.com/bycloud
[Business Inquiries] [email protected]
[Music] @IraStoria
[Profile & Banner Art] https://twitter.com/pygm7
[Video Editor] Abhay and @Booga04
[Ko-fi] https://ko-fi.com/bycloudai

Top Comments (10)

@SirJerr 2026-02-11

Google was only offering that long context window for the first few weeks after Gemini 3 launch. They quietly rugpulled once the hype was past peak and the exodus from GPT was well under way. Now you only get 32K tokens even with Pro and it's just god awful. Reasoning performance tanked too after Google cut the precision of parameter datatypes to save more money. So, I don't think Google solved anything at all. Their secret sauce was being the biggest and most profitable player in the game, which allowed them to burn who knows how much cash to fool everyone into thinking they'd solved it, and switch over to their inferior product on 6-12 month plans or whatever, leading nVidia to abandon the 100B investment OpenAI was depending on to stay solvent. Honestly it looks like this "innovation" is innovative in the same way that Altman's ploy to buy up all the RAM wafer supplies to engineer the shortage, without actually using those wafers, was. Basically, a scam. So sick of this anti-consumer, anti-competitive, big business bullshit.

363 22 replies
@ThisIsAGoodUserNameToo 2026-02-10

video idea: LLM's billion watt problem

272 8 replies
@MrC0MPUT3R 2026-02-10

I took a shot for every 'linear' in this video. Hello from spirit realm.

239 4 replies
@DoctorMandible 2026-02-10

The best change I made to my AI-assisted coding workflow was to limit the project size. This is a known software practice of working in "modules" rather than a monolithic code base. Modules communicate with one another as separate apps. This limits your context needs dramatically.

237 43 replies
@alkeryn1700 2026-02-10

16:40 I think your conclusion about Google having solved it may be wrong. Don't forget they have their own custom hardware made specifically for inference.

79 3 replies
@frogg03_ 2026-02-10

We're back to LSTM 😭😭

71 5 replies
@johnsherby9130 2026-02-11

Context usage on frontier models is ridiculous these days. I like Gemini 3 pro but it’s genuinely incapable of basic tasks after maybe 10 minutes of conversation, it’s almost funny sometimes

16
@bycloudAI 2026-02-10

Check out Inngest and let your AI agents wear a harness now https://innge.st/yt-bycl-1

14 8 replies
@jonathonduhon9278 2026-02-14

"Google has solved it, guys!" Google: *Nervously tugs collar while issuing 100 year bonds*

10
@filipriecfilipriec3716 2026-02-10

Oh, that's why Flash 3 was trained as an Olympic ragebaiter. It's such a good model, and that is sooo annoying.

7
