1 Million Tiny Experts in an AI? Fine-Grained MoE Explained

2024-07-31 Science & Technology

54.6k

2.9k

168

bycloud

229.0k subscribers

Description

To try everything Brilliant has to offer, free, for a full 30 days, visit https://brilliant.org/bycloud/ . You’ll also get 20% off an annual premium subscription.

Mixture of Experts explained, well, re-explained. We are in the Fine-Grain era of Mixture of Experts and it's about to get even more interesting as we further scale it up.

This video was sponsored by Brilliant

Check out my newsletter: https://mail.bycloud.ai

Special thanks to LDJ for helping me with this video

Mixtral 8x7B Paper [Paper] https://arxiv.org/abs/2401.04088
Sparse MoE (2017) [Paper] https://arxiv.org/abs/1701.06538
Adaptive Mixtures of Local Experts (1991) [Paper] https://direct.mit.edu/neco/article-abstract/3/1/79/5560/Adaptive-Mixtures-of-Local-Experts?redirectedFrom=fulltext
Gshard [Paper] https://arxiv.org/pdf/2006.16668
Branch-Train Mix [Paper] https://arxiv.org/pdf/2403.07816
DeepSeek-MoE [Paper] https://arxiv.org/abs/2401.06066
MoWE (from the meme at 7:51) [Paper] https://arxiv.org/abs/2311.10768
Mixture of A Million Experts [Paper] https://web3.arxiv.org/abs/2407.04153

This video is supported by the kind Patrons & YouTube Members: 🙏Andrew Lescelius, alex j, Chris LeDoux, Alex Maurice, Miguilim, Deagan, FiFaŁ, Robert Zawiasa, Daddy Wen, Tony Jimenez, Panther Modern, Jake Disco, Demilson Quintao, Shuhong Chen, Hongbo Men, happi nyuu nyaa, Carol Lo, Mose Sakashita, Miguel, Bandera, Gennaro Schiano, gunwoo, Ravid Freedman, Mert Seftali, Mrityunjay, Richárd Nagyfi, Timo Steiner, Henrik G Sundt, projectAnthony, Brigham Hall, Kyle Hudson, Kalila, Jef Come, Jvari Williams, Tien Tien, BIll Mangrum, owned, Janne Kytölä, SO, Richárd Nagyfi, Hector, Drexon, Claxvii 177th, Inferencer, Michael Brenner, Akkusativ, Oleg Wock, FantomBloth

[Discord] https://discord.gg/NhJZGtH
[Twitter] https://twitter.com/bycloudai
[Patreon] https://www.patreon.com/bycloud
[Music] massobeats - daydream
[Profile & Banner Art] https://twitter.com/pygm7
[Video Editor] @Askejm

#bycloud #bycloudai #mixture of experts #MoE Explained #mixture of experts explained #mixture of a million experts #branch train mix #deepseek

Top Comments (10)

@pro100gameryt8 2024-07-31

Imagine assembling 1 million PhD students together to discuss someone's request like "write a poem about cooking eggs with c++". That's MoE irl

414 17 replies

@gemstone7818 2024-07-31

to some extent this seems closer to how brains work

163 10 replies

@randomlettersqzkebkw 2024-07-31

i see what you did there with "catastrophic forgetting" lmao 🤣

60 1 replies

@GeoMeridium 2024-08-02

It's crazy how Meta's 8B parameter Llama 3 model has nearly the same performance as the original GPT-4 with 1.8T parameters. That's a 225x reduction in compute in just 2 years.

55 1 replies

@sorakagodess 2024-08-01

The only thing in my mind is "MoE moe kyuuuuun!!!"

50 1 replies

@Quantum_Nebula 2024-08-01

Now I really am excited for a 800B model with fine-grained MoE to surface that I can run on basically any device.

16 1 replies

@bycloudAI 2024-07-31

To try everything Brilliant has to offer—free—for a full 30 days, visit https://brilliant.org/bycloud/ . You’ll also get 20% off an annual premium subscription! Like this comment if you wanna see more MoE related content, I have quite a good list for a video;)

11 2 replies

@AkysChannel 2024-08-01

This video format is GOLD 🏆 such specific and nerdy topics produced as memes 😄

@lazyalpaca7 2024-07-31

3:37 wasn't it just yesterday that they released their model 😭

@simeonnnnn 2024-07-31

Damn.. You blew my mind on the 1 million experts and Forever learning thing
