1 Million Tiny Experts in an AI? Fine-Grained MoE Explained

2024-07-31 Science & Technology
54.6k
2.9k
168
bycloud
225.0k subscribers

Description

To try everything Brilliant has to offer, free, for a full 30 days, visit https://brilliant.org/bycloud/ . You’ll also get 20% off an annual premium subscription.

Mixture of Experts explained, well, re-explained. We are in the fine-grained era of Mixture of Experts, and it's about to get even more interesting as we scale it up further.

This video was sponsored by Brilliant.

Check out my newsletter: https://mail.bycloud.ai

Special thanks to LDJ for helping me with this video.

Mixtral 8x7B Paper
[Paper] https://arxiv.org/abs/2401.04088

Sparse MoE (2017)
[Paper] https://arxiv.org/abs/1701.06538

Adaptive Mixtures of Local Experts (1991)
[Paper] https://direct.mit.edu/neco/article-abstract/3/1/79/5560/Adaptive-Mixtures-of-Local-Experts?redirectedFrom=fulltext

GShard
[Paper] https://arxiv.org/pdf/2006.16668

Branch-Train-MiX
[Paper] https://arxiv.org/pdf/2403.07816

DeepSeek-MoE
[Paper] https://arxiv.org/abs/2401.06066

MoWE (from the meme at 7:51)
[Paper] https://arxiv.org/abs/2311.10768

Mixture of A Million Experts
[Paper] https://web3.arxiv.org/abs/2407.04153

This video is supported by the kind Patrons & YouTube Members:
🙏Andrew Lescelius, alex j, Chris LeDoux, Alex Maurice, Miguilim, Deagan, FiFaŁ, Robert Zawiasa, Daddy Wen, Tony Jimenez, Panther Modern, Jake Disco, Demilson Quintao, Shuhong Chen, Hongbo Men, happi nyuu nyaa, Carol Lo, Mose Sakashita, Miguel, Bandera, Gennaro Schiano, gunwoo, Ravid Freedman, Mert Seftali, Mrityunjay, Richárd Nagyfi, Timo Steiner, Henrik G Sundt, projectAnthony, Brigham Hall, Kyle Hudson, Kalila, Jef Come, Jvari Williams, Tien Tien, BIll Mangrum, owned, Janne Kytölä, SO, Richárd Nagyfi, Hector, Drexon, Claxvii 177th, Inferencer, Michael Brenner, Akkusativ, Oleg Wock, FantomBloth

[Discord] https://discord.gg/NhJZGtH
[Twitter] https://twitter.com/bycloudai
[Patreon] https://www.patreon.com/bycloud

[Music] massobeats - daydream
[Profile & Banner Art] https://twitter.com/pygm7
[Video Editor] @Askejm

Top Comments (10)

@pro100gameryt8 2024-07-31

Imagine assembling 1 million PhD students together to discuss someone's request like "write a poem about cooking eggs with C++". That's MoE irl

411 17 replies
@gemstone7818 2024-07-31

to some extent this seems closer to how brains work

160 10 replies
@randomlettersqzkebkw 2024-07-31

i see what you did there with "catastrophic forgetting" lmao 🤣

60 1 replies
@GeoMeridium 2024-08-02

It's crazy how Meta's 8B parameter Llama 3 model has nearly the same performance as the original GPT-4 with 1.8T parameters. That's a 225x reduction in compute in just 2 years.

52 1 replies
@sorakagodess 2024-08-01

The only thing in my mind is "MoE moe kyuuuuun!!!"

50 1 replies
@Quantum_Nebula 2024-08-01

Now I really am excited for a 800B model with fine-grained MoE to surface that I can run on basically any device.

16 1 replies
@bycloudAI 2024-07-31

To try everything Brilliant has to offer—free—for a full 30 days, visit https://brilliant.org/bycloud/ . You’ll also get 20% off an annual premium subscription! Like this comment if you wanna see more MoE related content, I have quite a good list for a video;)

11 2 replies
@AkysChannel 2024-08-01

This video format is GOLD 🏆 such specific and nerdy topics produced as memes 😄

9
@lazyalpaca7 2024-07-31

3:37 wasn't it just yesterday that they released their model 😭

8
@Limofeus 2024-08-02

I'd imagine in a month someone will come with MoE responsible for choosing the best MoE to choose the best MoE out of billions of experts

2
