Why would anyone let LLMs predict 4 tokens at once? Multi-Token Prediction Explained

2025-05-27 Science & Technology

55.8k

2.5k

129

bycloud

229.0k subscribers

Description

Check out LTX Video 13B now and experience the latest video gen breakthrough: https://bit.ly/ltxvbycloud

My Newsletter https://mail.bycloud.ai/
my project: find, discover & explain AI research semantically https://findmypapers.ai/
My Patreon https://www.patreon.com/c/bycloud

Future Lens [Paper] https://arxiv.org/abs/2311.04897
Multi-Token Prediction [Paper] https://arxiv.org/abs/2404.19737
DeepSeek-V3 [First Paper] https://arxiv.org/abs/2412.19437 [Technical Paper] https://arxiv.org/abs/2505.09343

Try out my new fav place to learn how to code https://scrimba.com/?via=bycloudAI

This video is supported by the kind Patrons & YouTube Members: 🙏Nous Research, Chris LeDoux, Ben Shaener, DX Research Group, Poof N' Inu, Andrew Lescelius, Deagan, Robert Zawiasa, Ryszard Warzocha, Tobe2d, Louis Muk, Akkusativ, Kevin Tai, Mark Buckler, NO U, Tony Jimenez, Ângelo Fonseca, jiye, Anushka, Asad Dhamani, Binnie Yiu, Calvin Yan, Clayton Ford, Diego Silva, Etrotta, Gonzalo Fidalgo, Handenon, Hector, Jake Disco very, Michael Brenner, Nilly K, OlegWock, Daddy Wen, Shuhong Chen, Sid_Cipher, Stefan Lorenz, Sup, tantan assawade, Thipok Tham, Thomas Di Martino, Thomas Lin, Richárd Nagyfi, Paperboy, mika, Leo, Berhane-Meskel, Kadhai Pesalam, mayssam, Bill Mangrum, nyaa

[Discord] https://discord.gg/NhJZGtH
[Twitter] https://twitter.com/bycloudai
[Patreon] https://www.patreon.com/bycloud
[Business Inquiries] [email protected]
[Profile & Banner Art] https://twitter.com/pygm7
[Video Editor] @Booga04
[Ko-fi] https://ko-fi.com/bycloudai

#bycloud #bycloudai #next token prediction #multi-token prediction #multi-token prediction deepsek #multi-token prediction explained #MTP explained #deepseek MTP

Top Comments (10)

@abhijeet1472-handle 2025-05-27

Stable diffusion LLM

242 5 replies

@hjups 2025-05-27

Your origin paper is incorrect, although you did show / mention it later in the video. The first paper to propose MTP was "Fast Inference from Transformers via Speculative Decoding" (2023), where the entire model predicts N future tokens (context + N-1 * mask -> N). MTP modified this by training a series of smaller models (heads) to predict the next N tokens from a single input (context -> N). DeepSeek then turned these heads into a transformer model with causal masking (still context -> N), which essentially treats the prediction like an RNN. In all cases though, the prediction is speculative, and might be discarded using the same method from the paper I mentioned.

Furthermore, this technique has clearly been used by OpenAI since GPT 3.5, and is why the canvas editor in ChatGPT is so fast to make changes (they speculate on the previous canvas state - there's API documentation to support this). My guess is that this isn't being talked about much because it's orthogonal to other advancements, just like quantization is.

Also, you never completed the initial motivation. Can DeepSeek's MTP method correctly guess the number of words in a sentence? Probably only slightly better than the next-token prediction objective. The diffusion methods should solve that, but are susceptible to block generation artifacts.

Edit: I apologize if this comment came off as harsh, that wasn't my intention.

221 17 replies

@ihysc4370 2025-05-27

Bycloud and AI Explained are the GOATS.

65 4 replies

@diegoantoniorosariopalomin2206 2025-05-27

Even better than diffusion over tokens would be diffusion in a latent space, kind of like using a text autoencoder and doing diffusion in that space.

62 3 replies

@heys3th 2025-05-27

Can't wait for the Gemini diffusion video :D

@Jiftoo 2025-05-27

0:51 Diffusion isn't an architecture. In fact, diffusion LLMs still use transformers.

@simeonnnnn 2025-05-27

Now I'm even more excited for V4 and R2

23 2 replies

@bycloudAI 2025-05-26

Check out LTX Video 13B now and experience the latest video gen breakthrough: https://bit.ly/ltxvbycloud

21 1 replies

@JorgetePanete 2025-05-28

My favourite podcast channel (visuals are plain brainrot)

@AnotherFreakingDude 2025-05-27

That moment when Will Smith eating spaghetti is a Video Gen AI benchmark.
