
The REAL AI Architecture That Unifies Vision & Language

2025-06-16 Science & Technology
44.4k
2.0k
115
bycloud
225.0k subscribers


Description

Get started now with open source & privacy focused password manager by Proton! https://proton.me/pass/bycloudai

In this video, we are going to dive into native MoE multimodal LLMs, the difference between early fusion and late fusion, and why a unified model is the way forward.

My Newsletter https://mail.bycloud.ai/
my project: find, discover & explain AI research semantically https://findmypapers.ai/
My Patreon https://www.patreon.com/c/bycloud

paper sauces
Chameleon: Mixed-Modal Early-Fusion Foundation Models [Paper] https://arxiv.org/abs/2405.09818
Scaling Laws for Native Multimodal Models [Paper] https://www.arxiv.org/abs/2504.07951
Scaling Pre-training to One Hundred Billion Data for Vision Language Models [Paper] https://arxiv.org/abs/2502.07617

Try out my new fav place to learn how to code https://scrimba.com/?via=bycloudAI

This video is supported by the kind Patrons & YouTube Members:
🙏Nous Research, Chris LeDoux, Ben Shaener, DX Research Group, Poof N' Inu, Andrew Lescelius, Deagan, Robert Zawiasa, Ryszard Warzocha, Tobe2d, Louis Muk, Akkusativ, Kevin Tai, Mark Buckler, NO U, Tony Jimenez, Ângelo Fonseca, jiye, Anushka, Asad Dhamani, Binnie Yiu, Calvin Yan, Clayton Ford, Diego Silva, Etrotta, Gonzalo Fidalgo, Handenon, Hector, Jake Disco very, Michael Brenner, Nilly K, OlegWock, Daddy Wen, Shuhong Chen, Sid_Cipher, Stefan Lorenz, Sup, tantan assawade, Thipok Tham, Thomas Di Martino, Thomas Lin, Richárd Nagyfi, Paperboy, mika, Leo, Berhane-Meskel, Kadhai Pesalam, mayssam, Bill Mangrum, nyaa

[Discord] https://discord.gg/NhJZGtH
[Twitter] https://twitter.com/bycloudai
[Patreon] https://www.patreon.com/bycloud
[Business Inquiries] [email protected]
[Profile & Banner Art] https://twitter.com/pygm7
[Video Editor] @Booga04
[Bitcoin (BTC)] 3JFMJQVGXNA2HJE5V9qCwLiqy6wHY9Vhdx
[Ethereum (ETH)] 0x3d784F55E0bE5f35c1566B2E014598C0f354f190
[Litecoin (LTC)] MGHnqALjyU2W6NuJSSW9fTWV4dcHfwHZd7
[Bitcoin Cash (BCH)] 1LkyGfzHxnSfqMF8tN7ZGDwUTyBB6vcii9
[Solana (SOL)] 6XyMCEdVhtxJQRjMKgUJaySL8cGoBPzzA2NPDMPfVkKN
[Ko-fi] https://ko-fi.com/bycloudai

Top Comments (10)

@fhub29 2025-06-16

almost as fast as the bots

92
@pancake-g1u 2025-06-16

Uploaded three minutes ago and there are already bots here. Wtf is happening with YouTube 😭

30 3 replies
@kodirovsshik 2025-06-17

6:31 I love the random memes all throughout the video

16
@bycloudAI 2025-06-16

Get started now with open source & privacy focused password manager by Proton! https://proton.me/pass/bycloudai

8 1 replies
@generalawareness101 2025-06-16

You thumbnail test like a beast.

8
@Fish-8332 2025-06-17

Bytedance’s sail came to the same conclusion. But I am very skeptical of both, we need to find better compressions for tokenizers. We need to compress redundant frames and focus on important frames.

6
@YT7mc 2025-06-16

8:49 because of how much knowledge images can help unlock knowledge

5
@theappleman005 2025-06-25

It's just fun to hear you say "Chris Ledouuuuux"

2
@minecraftermad 2025-06-19

llama 5 looks like it'll be nuts if they implement all these papers into one model.

0
@VSS63 2025-06-18

Great video, thank you for your time

0
