
I built an AI Supercomputer... again (2TB RAM)

2025-12-20 Science & Technology
111.1k
5.4k
742
NetworkChuck
5.3m subscribers

Enabling High-Speed Local AI Clustering on Mac with Apple's RDMA Update

Discover how Apple's software update enabling low-latency RDMA over Thunderbolt 5 made local AI clustering practical, raising Llama 3.3 70B inference from about 5 tokens/sec to about 16 tokens/sec.

Short Summary

  • Apple introduced RDMA (Remote Direct Memory Access) over Thunderbolt 5 in a software update (macOS Tahoe 26.2).
  • This feature slashed inter-device latency from 300 microseconds down to 3 microseconds.
  • The improved connectivity unlocked Tensor Parallelism, making massive model clustering fast and viable.
  • Testing showed that clustering now speeds up inference (e.g., Llama 3.3 70B went from 5 to 16 tokens/sec).
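To see why a 100x latency cut matters so much more for tensor parallelism than for pipeline parallelism, here is a back-of-the-envelope latency model. It is a sketch under stated assumptions, not the video's method: it assumes Megatron-style tensor parallelism with two all-reduces per transformer layer, a ring all-reduce needing 2(N-1) latency-bound steps, pipeline parallelism with N-1 stage hand-offs, and an 80-layer 70B-class model. Bandwidth and compute are ignored, so the results are latency floors only.

```python
# Back-of-the-envelope latency model for one generated token on a 4-node cluster.
# Assumptions (illustrative, not measured in the video):
#   * tensor parallelism does two all-reduces per transformer layer
#     (one after attention, one after the MLP), Megatron-style;
#   * a ring all-reduce over N nodes takes 2 * (N - 1) latency-bound steps;
#   * pipeline parallelism hands activations across N - 1 stage boundaries;
#   * bandwidth and compute time are ignored, so these are latency floors only.


def tensor_parallel_overhead_s(layers: int, nodes: int, latency_s: float) -> float:
    """Seconds per token spent waiting on link latency under tensor parallelism."""
    allreduces = 2 * layers
    steps_per_allreduce = 2 * (nodes - 1)
    return allreduces * steps_per_allreduce * latency_s


def pipeline_overhead_s(nodes: int, latency_s: float) -> float:
    """Seconds per token spent waiting on link latency under pipeline parallelism."""
    return (nodes - 1) * latency_s


if __name__ == "__main__":
    LAYERS = 80  # Llama 3.3 70B has 80 transformer layers
    NODES = 4    # four Mac Studios
    for label, latency in [("before RDMA (~300 us)", 300e-6),
                           ("with RDMA (~3 us)", 3e-6)]:
        tp = tensor_parallel_overhead_s(LAYERS, NODES, latency)
        pp = pipeline_overhead_s(NODES, latency)
        print(f"{label}: tensor {tp * 1e3:.2f} ms/token "
              f"(caps speed at ~{1 / tp:.0f} tok/s), pipeline {pp * 1e3:.2f} ms/token")
```

Under these assumptions the tensor-parallel latency floor drops from roughly 288 ms to under 3 ms per token, while the pipeline floor stays below 1 ms either way. Pipeline parallelism was never latency-bound in this model, but during single-stream generation it keeps only one machine busy at a time; tensor parallelism keeps all four busy once the per-layer syncs become cheap.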

This breakdown covers the construction of a $50,000 local AI cluster built from four high-spec Mac Studios. The core question is whether clustering these machines makes sense for AI workloads, especially after an earlier attempt failed because of networking bottlenecks. The key takeaway is that Apple cut inter-device latency with RDMA, fundamentally changing the equation for distributed local AI inference.
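The difference between the two ways of splitting a model across machines can be shown with a toy example. This sketch uses plain Python lists as weight matrices and ordinary function calls as "nodes"; it does not use Exo, MLX, or any real networking. The row-wise sharding with a gather after every layer is a simplification of how real tensor-parallel frameworks split attention and MLP weights.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product: one linear layer with no bias."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def single_node(layers: List[Matrix], x: Vector) -> Vector:
    """Reference result: every layer runs on one machine."""
    for w in layers:
        x = matvec(w, x)
    return x


def pipeline_parallel(layers: List[Matrix], x: Vector, nodes: int) -> Tuple[Vector, int]:
    """Each node owns a contiguous block of whole layers.

    Only one node is busy at a time; activations are handed to the next
    node at each stage boundary, so a token costs (stages - 1) transfers.
    """
    per_node = -(-len(layers) // nodes)  # ceiling division
    stages = [layers[i:i + per_node] for i in range(0, len(layers), per_node)]
    for stage in stages:
        x = single_node(stage, x)
    return x, len(stages) - 1


def tensor_parallel(layers: List[Matrix], x: Vector, nodes: int) -> Tuple[Vector, int]:
    """Every layer's rows are sharded across all nodes.

    All nodes compute their shard of each layer simultaneously, but the
    partial outputs must be gathered before the next layer can start, so a
    token costs one synchronization per layer.
    """
    syncs = 0
    for w in layers:
        chunk = -(-len(w) // nodes)
        shards = [w[i:i + chunk] for i in range(0, len(w), chunk)]
        partials = [matvec(shard, x) for shard in shards]  # conceptually concurrent
        x = [v for part in partials for v in part]          # gather step
        syncs += 1
    return x, syncs


if __name__ == "__main__":
    layers = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.0], [0.0, 0.5]],
              [[1.0, 1.0], [1.0, -1.0]], [[2.0, 0.0], [0.0, 2.0]]]
    x = [1.0, 1.0]
    print("single node:", single_node(layers, x))
    print("pipeline (output, transfers):", pipeline_parallel(layers, x, nodes=2))
    print("tensor (output, syncs):", tensor_parallel(layers, x, nodes=2))
```

All three strategies produce the same output; what changes is how often the machines must talk. Tensor parallelism syncs once per layer, which is why its speed depends so heavily on link latency.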


Description

Hey… just try Twingate… you'll never look at VPN the same: https://ntck.co/twingate-networkchuck

I built another AI supercomputer with 4 Mac Studios... but this time it actually works. Earlier this year, I clustered 5 Mac Studios and it was 91% SLOWER. Everyone said clustering was stupid. But Apple just dropped a software update that changes everything: RDMA over Thunderbolt 5. Latency dropped from 300 microseconds to 3 microseconds. Now we're running trillion-parameter models locally at speeds that actually make sense.

🔥🔥 Join the NetworkChuck Academy!: https://ntck.co/NCAcademy

RESOURCES / LINKS:
  • Docs/walkthrough: https://github.com/theNetworkChuck/mac-studio-cluster
  • Exo Labs: https://github.com/exo-explore/exo
  • MLX (Apple's ML Framework): https://github.com/ml-explore/mlx
  • My First Cluster Video (the failure): https://youtu.be/Ju0ndy2kwlw
  • RDMA Networking Explained: https://youtu.be/fb69FyW2KLk

TIMESTAMPS:
  • 0:00 - The $50,000 AI Supercomputer
  • 0:53 - What Apple Changed
  • 3:05 - Connecting the Cluster
  • 4:17 - Pipeline vs Tensor Parallelism
  • 7:52 - RDMA: The 100x Latency Fix
  • 10:02 - Twingate (Sponsor)
  • 11:39 - Exo Labs is BACK
  • 14:42 - Single Node vs Cluster Testing
  • 17:58 - Qwen 3 Coder 480B Testing
  • 19:03 - Kimi K2 (1 Trillion Parameters)
  • 21:09 - Stacking Multiple Models
  • 25:22 - Real Apps: Open WebUI + Xcode
  • 27:57 - Final Thoughts
  • 28:47 - How MLX Makes This Possible

Sponsored by Twingate

THE SPECS:
  • 4x Mac Studio M3 Ultra (512GB RAM each)
  • 2TB unified memory / 320 GPU cores / 32TB storage
  • $50,000 (vs $780,000+ for equivalent NVIDIA H100s)

THE RESULTS:
  • Llama 3.3 70B: 16 tok/s (3x faster than before)
  • Kimi K2 (1T params): 28 tok/s
  • DeepSeek V3.1 671B: 27 tok/s
  • Qwen 3 Coder 480B: 40 tok/s

SUPPORT NETWORKCHUCK
  • 🎓🎓 Sign up for NetworkChuck Academy: https://ntck.co/NCAcademy
  • ☕☕ COFFEE and MERCH: https://ntck.co/coffee
  • 🌐🌐 Use the MOST SECURE Web Browser, NetworkChuck Cloud Browser: https://browser.networkchuck.com/
  • 🧠🧠 Use n8n, my favorite automation tool: https://ntck.co/n8n
  • 🆘🆘 NEED HELP?? Join the Discord Server: https://discord.gg/networkchuck
  • STUDY WITH ME on Twitch: https://bit.ly/nc_twitch

READY TO LEARN??
  • Sign up for NetworkChuck Academy: https://ntck.co/NCAcademy
  • Get your CCNA: https://bit.ly/nc-ccna

FOLLOW ME EVERYWHERE
  • Instagram: https://www.instagram.com/networkchuck/
  • Twitter: https://twitter.com/networkchuck
  • Facebook: https://www.facebook.com/NetworkChuck/
  • Join the Discord server: http://bit.ly/nc-discord

Do you want to know how I draw on the screen?? Go to https://ntck.co/EpicPen and use code NetworkChuck to get 20% off!!

Clustering works now. Thank Apple and Exo Labs.

TAGS: mac studio cluster, ai supercomputer, local ai, rdma, exo labs, apple silicon, m3 ultra, unified memory, tensor parallelism, llm, kimi k2, deepseek, llama, mlx, thunderbolt 5, home lab ai, self hosted ai, 2tb ram, gpu cluster, apple ai
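The specs above (4 x 512 GB = 2,048 GB, marketed as 2TB) raise the obvious question of which models need more than one machine. This is a rough sketch that counts weight memory only. The bit widths are example quantizations, not the ones used in the video, and the `USABLE_FRACTION` headroom for KV cache, activations, and macOS itself is a hypothetical value.

```python
import math

# Rough check of how many 512 GB Mac Studios a model's weights need.
# Assumptions: weights dominate memory; KV cache, activations and macOS
# are covered by a hypothetical USABLE_FRACTION headroom; 1 GB = 1e9 bytes;
# the 4-bit / 8-bit widths are examples, not the video's quantizations.

RAM_PER_NODE_GB = 512
USABLE_FRACTION = 0.75  # hypothetical headroom, tune for your setup


def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB: billions of parameters x bits / 8."""
    return params_billion * bits_per_weight / 8


def nodes_needed(params_billion: float, bits_per_weight: float) -> int:
    """Smallest number of nodes whose usable memory holds the weights."""
    usable_per_node = RAM_PER_NODE_GB * USABLE_FRACTION
    return math.ceil(weights_gb(params_billion, bits_per_weight) / usable_per_node)


if __name__ == "__main__":
    print(f"Cluster total: {4 * RAM_PER_NODE_GB} GB")
    models = {"Llama 3.3 70B": 70, "Qwen 3 Coder 480B": 480,
              "DeepSeek V3.1 671B": 671, "Kimi K2 (1T)": 1000}
    for name, params in models.items():
        for bits in (4, 8):
            print(f"{name} @ {bits}-bit: ~{weights_gb(params, bits):.0f} GB "
                  f"-> {nodes_needed(params, bits)} node(s)")
```

Under these assumptions a 70B model fits comfortably on one machine, while a 1T-parameter model at 8-bit needs about 1 TB for weights alone and therefore three nodes. That is the class of workload a 2TB cluster makes possible.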

Top Comments (10)

@Klondike_Bar28 2025-12-20

So many YouTubers got sent 4 Macs this week

2.0k 102 replies
@userinit5064 2025-12-20

This guy is the reason why we're in a RAM shortage lol

1.4k 26 replies
@drcrankenstein 2025-12-20

Chuck's Wife: "Stop hugging your computers, and come to dinner!"

371 5 replies
@powerxcode5333 2025-12-22

He has RAM get him!!!

347 5 replies
@Christiskingjesusislord00 2025-12-20

That's at least 5 cents extra on my future RAM purchase.

289 7 replies
@NJPL7836 2025-12-20

Show generative audio, images and video to see the efficiency of the cluster.

182 7 replies
@MyDigitalHub 2026-01-14

Dude is going on a RAMpage

138 2 replies
@NetworkChuck 2025-12-20

Hey…just try Twingate….you'll never look at VPN the same: https://ntck.co/twingate-networkchuck I built another AI supercomputer with 4 Mac Studios... but this time it actually works. Earlier this year, I clustered 5 Mac Studios and it was 91% SLOWER. Everyone said clustering was stupid. But Apple just dropped a software update that changes everything - RDMA over Thunderbolt 5. Latency dropped from 300 microseconds to 3 microseconds. Now we're running trillion-parameter models locally at speeds that actually make sense. 🔥🔥Join the NetworkChuck Academy!: https://ntck.co/NCAcademy RESOURCES / LINKS: Docs/walkthrough: https://github.com/theNetworkChuck/mac-studio-cluster Exo Labs: https://github.com/exo-explore/exo MLX (Apple's ML Framework): https://github.com/ml-explore/mlx My First Cluster Video (the failure): https://youtu.be/Ju0ndy2kwlw RDMA Networking Explained: https://youtu.be/fb69FyW2KLk

121 40 replies
@ImYouKnowWho 2026-01-10

The end was really kind. Thank you

85 2 replies
@jupiterisalegend 2026-01-19

If he gets hold of one more RAM stick, I'm gonna literally crash out

5
