The Most Clever Trick To Speedup LLMs

2026-04-01 Science & Technology
17.7k
1.1k
65
bycloud
225.0k subscribers

Description

Try out and get your free credits now on GenSpark AI, as well as unlimited use of AI Chat and AI Image in 2026 for paid users using this link: https://www.genspark.ai/?utm_source=yt&utm_campaign=bycloudAI

In this video, let us demystify one of the coolest tricks that researchers use to speed up LLMs, so that we can now use them at 2-3x the speed with no drawbacks at all!

My latest project: Intuitive AI Academy
We just wrote a new piece on Distillation & MoE! https://intuitiveai.academy/
Limited-time code "EARLY" for 40% off the yearly plan!

My Newsletter https://mail.bycloud.ai/
My Patreon https://www.patreon.com/c/bycloud

Speculative Decoding
[Paper 1] https://arxiv.org/pdf/2211.17192
[Paper 2] https://arxiv.org/pdf/2302.01318

Speculative Speculative Decoding
[Paper] https://arxiv.org/abs/2603.03251

Try out my new fav place to learn how to code https://scrimba.com/?via=bycloudAI

This video is supported by the kind Patrons & YouTube Members:
🙏Spam Maj, Alex, Chris LeDoux, DX Research Group, Poof N' Inu, Deagan, Robert Zawiasa, Ryszard Warzocha, Tobe2d, Louis Muk, Akkusativ, Kevin Tai, Mark Buckler, NO U, Tony Jimenez, Ângelo Fonseca, jiye, Anushka, Asad Dhamani, Binnie Yiu, Calvin Yan, Clayton Ford, Diego Silva, Etrotta, Gonzalo Fidalgo, Handenon, Hector, Jake Disco very, Michael Brenner, Nilly K, OlegWock, Daddy Wen, Shuhong Chen, Sid_Cipher, Stefan Lorenz, Sup, tantan assawade, Thipok Tham, Thomas Di Martino, Thomas Lin, Richárd Nagyfi, Paperboy, mika, Leo, Berhane-Meskel, Kadhai Pesalam, mayssam, Bill Mangrum, nyaa, Toru Mon, Lame Plane, Matej Macak, Len Mo, saylikhapekar, ZyanSheep

[Discord] https://discord.gg/NhJZGtH
[Twitter] https://twitter.com/bycloudai
[Patreon] https://www.patreon.com/bycloud
[Business Inquiries] [email protected]
[Profile & Banner Art] https://twitter.com/pygm7
[Video Editor]
[Ko-fi] https://ko-fi.com/bycloudai

Top Comments (10)

@dukemagus 2026-04-01

Even if it is an April fool's joke, there are already three startups founded and raising millions on funding rounds based on it

103 7 replies
@OperationDarkside 2026-04-01

Just wait for my speculative speculative speculative speculative decoding paper, where I cram increasingly smaller models into every crevice I can find.

93 1 replies
@phantoslayer9332 2026-04-01

THIS IS NOT AN APRIL FOOLS DAY JOKE, speculative speculative decoding is real and is valid.

57 2 replies
@SandTiger42 2026-04-01

This April Fools' Day joke is the worst. I was going to say "Joke's on you, I ONLY have a small model." Then I realized the joke's on me, because *I* do all the correcting on my 24B LLM chatbot. I am the big model fixing it every time. Speculative decoding, the slow human fleshbag way.

56
@parimalarenga92 2026-04-01

wen turbo quant video? , it's like jpeg but for token patterns

30
@nociza 2026-04-01

Waking up to this gift made my day

27
@StefanReich 2026-04-01

Man your use of memes is so on point

23
@wolfehtesrever 2026-04-01

I feel like there's a loop here that lets you keep shrinking the smaller model

9
@JorgetePanete 2026-04-01

"In how many decode speculations did we win?" "Just one."

9
@bycloudAI 2026-04-01

Check out and get your free credits now on GenSpark AI, as well as unlimited use of AI Chat and AI Image in 2026 for paid users using this link: https://www.genspark.ai/?utm_source=yt&utm_campaign=bycloudAI

7
