1-Bit LLM: The Most Efficient LLM Possible?

2025-06-18 Science & Technology

349.0k

16.2k

724

bycloud

229.0k subscribers

Description

Download Tanka today https://www.tanka.ai and enjoy 3 months of free Premium! You can also get $20 / team for each referral.

I've been planning a BitNet video for the longest time, and the release of BitNet b1.58 2B4T gave me the perfect chance to brief you on the history of 1-bit LLMs! Fun fact: the major BitNet research is mostly done by the same researchers.

My Newsletter https://mail.bycloud.ai/
My project: find, discover & explain AI research semantically https://findmypapers.ai/
My Patreon https://www.patreon.com/c/bycloud

Quantifying the Capabilities of LLMs across Scale and Precision
[Paper] https://arxiv.org/abs/2405.03146v2
BitNet: Scaling 1-bit Transformers for Large Language Models
[Paper] https://arxiv.org/abs/2310.11453v1
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
[Paper] https://arxiv.org/abs/2402.17764v1
BitNet a4.8: 4-bit Activations for 1-bit LLMs
[Paper] https://arxiv.org/abs/2411.04965v1
Efficient Construction of Model Family through Progressive Training Using Model Expansion
[Paper] https://arxiv.org/abs/2504.00623v1
BitNet b1.58 2B4T Technical Report
[Paper] https://arxiv.org/abs/2504.12285
[Web Demo] https://bitnet-demo.azurewebsites.net/
[HuggingFace] https://huggingface.co/microsoft/bitnet-b1.58-2B-4T
[Code] https://github.com/microsoft/BitNet

[Additional Recs]
T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge https://arxiv.org/abs/2407.00088v2
FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation https://arxiv.org/abs/2407.07093v1
Matmul or No Matmul in the Era of 1-bit LLMs https://arxiv.org/abs/2408.11939v2
1-bit AI Infra: Part 1.1, Fast and Lossless BitNet b1.58 Inference on CPUs https://arxiv.org/abs/2410.16144v2
Bitnet.cpp: Efficient Edge Inference for Ternary LLMs https://arxiv.org/abs/2502.11880v1
Continual Quantization-Aware Pre-Training: When to transition from 16-bit to 1.58-bit pre-training for BitNet language models? https://arxiv.org/abs/2502.11895v1
(NEW!) BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs https://arxiv.org/abs/2504.18415
(NEW!) BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation https://arxiv.org/abs/2506.07530

Try out my new fav place to learn how to code https://scrimba.com/?via=bycloudAI

This video is supported by the kind Patrons & YouTube Members: 🙏Nous Research, Chris LeDoux, Ben Shaener, DX Research Group, Poof N' Inu, Andrew Lescelius, Deagan, Robert Zawiasa, Ryszard Warzocha, Tobe2d, Louis Muk, Akkusativ, Kevin Tai, Mark Buckler, NO U, Tony Jimenez, Ângelo Fonseca, jiye, Anushka, Asad Dhamani, Binnie Yiu, Calvin Yan, Clayton Ford, Diego Silva, Etrotta, Gonzalo Fidalgo, Handenon, Hector, Jake Disco very, Michael Brenner, Nilly K, OlegWock, Daddy Wen, Shuhong Chen, Sid_Cipher, Stefan Lorenz, Sup, tantan assawade, Thipok Tham, Thomas Di Martino, Thomas Lin, Richárd Nagyfi, Paperboy, mika, Leo, Berhane-Meskel, Kadhai Pesalam, mayssam, Bill Mangrum, nyaa

[Discord] https://discord.gg/NhJZGtH
[Twitter] https://twitter.com/bycloudai
[Patreon] https://www.patreon.com/bycloud
[Business Inquiries] [email protected]
[Profile & Banner Art] https://twitter.com/pygm7
[Video Editor] Abhay
[Ko-fi] https://ko-fi.com/bycloudai

#bycloud #bycloudai #1 bit llm #bitnet #bitnet explained #bitnet b1.58 2b4t #bitnet model #bitnet llm

Top Comments (10)

@steffenaltmeier6602 2025-06-18

I really hope that bitnets become the future. The sheer efficiency gains and the possibility of avoiding multiplications could be truly insane for everyone trying to run models locally.

1.9k 35 replies

@Ewoof 2025-06-18

I attempted to replicate their paper on a smaller scale, and what I discovered is that 1-bit or 1.58-bit LLMs are really nothing by themselves. They still require full-precision weights during training and act like any other LLM. What sets them apart, however, is the level of optimization they enable, which simply isn't feasible with 4-bit or 8-bit models. The challenge is that these optimizations require custom kernel modifications, as mainstream frameworks like PyTorch and TensorFlow don't natively support them. There are currently no widely available frameworks that fully exploit the benefits of 1-bit quantization, since existing tools heavily prioritize GPUs over CPUs. As a result, unless these implementations are incorporated into standard libraries, it's nearly impossible to fully leverage 1-bit LLMs unless you use BitNet's version, which, while powerful, is notoriously difficult to set up properly due to its extensive dependencies.

831 32 replies

@el_saltamontes 2025-06-21

I don't think I've ever seen more sponsored ads in a YouTube video

578 18 replies

@primee_lion 2025-06-18

8:03 it runs at 1.67x faster, not 66x faster

564 14 replies

@ParitoshTripathiOfficial 2025-06-20

SponsorBlock was invented for such videos

466 4 replies

@cbuchner1 2025-06-18

My understanding is that training a bitnet still requires a full precision set of weights for the gradient descent to work. So I doubt the claimed 20x energy savings during training.

297 4 replies

@samehedi 2025-06-18

Uuhhh, that video is nice! The first time I ran a 103B in 1-bit, my mind was completely blown. It worked and my IT brain could not comprehend it. I still can't. It should not work, but it clearly does. Mind completely blown. And going from matrix multiplication to addition... damn, never thought about that. That's big news! 😲

155 20 replies

@cagedgandalf3472 2025-06-19

I am more of an embedded systems/robotics type of engineer and this is amazing news. Imagine what kind of complex AI you can fit in your Arduino. I still remember when I had memory problems in Arduino as a high school student trying to store audio. Now, the same student could use AI.

94 6 replies

@bycloudAI 2025-06-18

Download Tanka today https://www.tanka.ai and enjoy 3 months of free Premium! You can also get $20 / team for each referral.

73 6 replies

@giorgos1794 2025-06-25

6 ad plugs in a 14-minute video. That's really great!
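
Several of the comments above mention that ternary weights let a model swap multiplications for additions. The following is a minimal sketch of that idea in plain Python, not the BitNet implementation. It assumes the absmean-style rounding described in the BitNet b1.58 paper (divide by the mean absolute weight, round, clip to -1, 0, +1); the function names `absmean_ternarize` and `ternary_dot` and the sample numbers are hypothetical, chosen only for illustration.

```python
def absmean_ternarize(weights, eps=1e-8):
    """Quantize float weights to {-1, 0, +1} plus one shared float scale.

    Assumption: scale = mean(|w|), then round and clip, as a rough
    stand-in for the absmean scheme in the BitNet b1.58 paper.
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    ternary = [max(-1, min(1, round(w / (scale + eps)))) for w in weights]
    return ternary, scale


def ternary_dot(ternary, scale, activations):
    """Dot product with ternary weights: the loop only adds or subtracts."""
    acc = 0.0
    for t, x in zip(ternary, activations):
        if t == 1:
            acc += x
        elif t == -1:
            acc -= x
        # t == 0: the weight is skipped entirely
    return acc * scale  # a single multiplication per output restores the scale


# Hypothetical full-precision weights and activations.
weights = [0.9, -0.05, -1.2, 0.4]
activations = [1.0, 2.0, 3.0, 4.0]

ternary, scale = absmean_ternarize(weights)
approx = ternary_dot(ternary, scale, activations)

# Reference: the same quantized weights, but using ordinary multiplications.
reference = sum((t * scale) * x for t, x in zip(ternary, activations))
```

The add-only loop and the multiply-based reference agree on the quantized weights; neither is expected to match the original full-precision dot product exactly. As two commenters point out, training still keeps the full-precision `weights` around, and the ternary copy is derived from them for the forward pass.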
