The REAL AI Architecture That Unifies Vision & Language

2025-06-16 Science & Technology

44.4k

2.0k

115

bycloud

229.0k subscribers

Description

Get started now with open source & privacy focused password manager by Proton! https://proton.me/pass/bycloudai

In this video, we are going to dive into native MoE multimodal LLMs, the difference between early fusion and late fusion, and why a unified model is the way forward.

My Newsletter https://mail.bycloud.ai/
my project: find, discover & explain AI research semantically https://findmypapers.ai/
My Patreon https://www.patreon.com/c/bycloud

paper sauces
Chameleon: Mixed-Modal Early-Fusion Foundation Models [Paper] https://arxiv.org/abs/2405.09818
Scaling Laws for Native Multimodal Models [Paper] https://www.arxiv.org/abs/2504.07951
Scaling Pre-training to One Hundred Billion Data for Vision Language Models [Paper] https://arxiv.org/abs/2502.07617

Try out my new fav place to learn how to code https://scrimba.com/?via=bycloudAI

This video is supported by the kind Patrons & YouTube Members: 🙏Nous Research, Chris LeDoux, Ben Shaener, DX Research Group, Poof N' Inu, Andrew Lescelius, Deagan, Robert Zawiasa, Ryszard Warzocha, Tobe2d, Louis Muk, Akkusativ, Kevin Tai, Mark Buckler, NO U, Tony Jimenez, Ângelo Fonseca, jiye, Anushka, Asad Dhamani, Binnie Yiu, Calvin Yan, Clayton Ford, Diego Silva, Etrotta, Gonzalo Fidalgo, Handenon, Hector, Jake Disco very, Michael Brenner, Nilly K, OlegWock, Daddy Wen, Shuhong Chen, Sid_Cipher, Stefan Lorenz, Sup, tantan assawade, Thipok Tham, Thomas Di Martino, Thomas Lin, Richárd Nagyfi, Paperboy, mika, Leo, Berhane-Meskel, Kadhai Pesalam, mayssam, Bill Mangrum, nyaa

[Discord] https://discord.gg/NhJZGtH
[Twitter] https://twitter.com/bycloudai
[Patreon] https://www.patreon.com/bycloud
[Business Inquiries] [email protected]
[Profile & Banner Art] https://twitter.com/pygm7
[Video Editor] @Booga04
[Bitcoin (BTC)] 3JFMJQVGXNA2HJE5V9qCwLiqy6wHY9Vhdx
[Ethereum (ETH)] 0x3d784F55E0bE5f35c1566B2E014598C0f354f190
[Litecoin (LTC)] MGHnqALjyU2W6NuJSSW9fTWV4dcHfwHZd7
[Bitcoin Cash (BCH)] 1LkyGfzHxnSfqMF8tN7ZGDwUTyBB6vcii9
[Solana (SOL)] 6XyMCEdVhtxJQRjMKgUJaySL8cGoBPzzA2NPDMPfVkKN
[Ko-fi] https://ko-fi.com/bycloudai

#bycloud #bycloudai #native multimodal llm #llm #native moe llm #Native Multimodal Models #Early-Fusion #late fusion

Top Comments (10)

@fhub29 2025-06-16

almost as fast as the bots

@pancake-g1u 2025-06-16

Uploaded three minutes ago and there are already bots here. Wtf is happening with YouTube 😭

30 3 replies

@kodirovsshik 2025-06-17

6:31 I love the random memes all throughout the video

@bycloudAI 2025-06-16

Get started now with open source & privacy focused password manager by Proton! https://proton.me/pass/bycloudai

8 1 replies

@Fish-8332 2025-06-17

Bytedance’s sail came to the same conclusion. But I am very skeptical of both, we need to find better compressions for tokenizers. We need to compress redundant frames and focus on important frames.

@YT7mc 2025-06-16

8:49 because of how much knowledge images can help unlock knowledge

@theappleman005 2025-06-25

It's just fun to hear you say "Chris Ledouuuuux"

@minecraftermad 2025-06-19

llama 5 looks like it'll be nuts if they implement all these papers into one model.

@VSS63 2025-06-18

Great video, thank you for your time

@MikeNugget 2025-06-17

All right, I think I get it, multimodal model specialist specialisation 😅

