
1-Bit LLM: The Most Efficient LLM Possible?

2025-06-18 Science & Technology
349.0k
16.2k
724
bycloud
225.0k subscribers


Description

Download Tanka today https://www.tanka.ai and enjoy 3 months of free Premium! You can also get $20 / team for each referral.

I've been planning a BitNet video for the longest time, and the release of BitNet b1.58 2B4T gave me the perfect chance to brief you on the history of 1-bit LLMs! Fun fact: the major BitNet research is mostly done by the same researchers.

My Newsletter https://mail.bycloud.ai/
My project: find, discover & explain AI research semantically https://findmypapers.ai/
My Patreon https://www.patreon.com/c/bycloud

Quantifying the Capabilities of LLMs across Scale and Precision
[Paper] https://arxiv.org/abs/2405.03146v2

BitNet: Scaling 1-bit Transformers for Large Language Models
[Paper] https://arxiv.org/abs/2310.11453v1

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
[Paper] https://arxiv.org/abs/2402.17764v1

BitNet a4.8: 4-bit Activations for 1-bit LLMs
[Paper] https://arxiv.org/abs/2411.04965v1

Efficient Construction of Model Family through Progressive Training Using Model Expansion
[Paper] https://arxiv.org/abs/2504.00623v1

BitNet b1.58 2B4T Technical Report
[Paper] https://arxiv.org/abs/2504.12285
[Web Demo] https://bitnet-demo.azurewebsites.net/
[HuggingFace] https://huggingface.co/microsoft/bitnet-b1.58-2B-4T
[Code] https://github.com/microsoft/BitNet

[Additional Recs]
T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge
https://arxiv.org/abs/2407.00088v2
FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation
https://arxiv.org/abs/2407.07093v1
Matmul or No Matmul in the Era of 1-bit LLMs
https://arxiv.org/abs/2408.11939v2
1-bit AI Infra: Part 1.1, Fast and Lossless BitNet b1.58 Inference on CPUs
https://arxiv.org/abs/2410.16144v2
Bitnet.cpp: Efficient Edge Inference for Ternary LLMs
https://arxiv.org/abs/2502.11880v1
Continual Quantization-Aware Pre-Training: When to transition from 16-bit to 1.58-bit pre-training for BitNet language models?
https://arxiv.org/abs/2502.11895v1
(NEW!) BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs
https://arxiv.org/abs/2504.18415
(NEW!) BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation
https://arxiv.org/abs/2506.07530

Try out my new fav place to learn how to code https://scrimba.com/?via=bycloudAI

This video is supported by the kind Patrons & YouTube Members:
🙏Nous Research, Chris LeDoux, Ben Shaener, DX Research Group, Poof N' Inu, Andrew Lescelius, Deagan, Robert Zawiasa, Ryszard Warzocha, Tobe2d, Louis Muk, Akkusativ, Kevin Tai, Mark Buckler, NO U, Tony Jimenez, Ângelo Fonseca, jiye, Anushka, Asad Dhamani, Binnie Yiu, Calvin Yan, Clayton Ford, Diego Silva, Etrotta, Gonzalo Fidalgo, Handenon, Hector, Jake Disco very, Michael Brenner, Nilly K, OlegWock, Daddy Wen, Shuhong Chen, Sid_Cipher, Stefan Lorenz, Sup, tantan assawade, Thipok Tham, Thomas Di Martino, Thomas Lin, Richárd Nagyfi, Paperboy, mika, Leo, Berhane-Meskel, Kadhai Pesalam, mayssam, Bill Mangrum, nyaa

[Discord] https://discord.gg/NhJZGtH
[Twitter] https://twitter.com/bycloudai
[Patreon] https://www.patreon.com/bycloud
[Business Inquiries] [email protected]
[Profile & Banner Art] https://twitter.com/pygm7
[Video Editor] Abhay
[Ko-fi] https://ko-fi.com/bycloudai

Top Comments (10)

@Ewoof 2025-06-18

I attempted to replicate their paper on a smaller scale, and what I discovered is that a 1-bit or 1.58-bit LLM is really nothing by itself. It still requires full-precision weights during training and acts like any other LLM. What sets it apart, however, is the level of optimization it enables, which simply isn't feasible with 4-bit or 8-bit models. The challenge is that these optimizations require custom kernel modifications, as mainstream frameworks like PyTorch and TensorFlow don't natively support them. There are currently no widely available frameworks that fully exploit the benefits of 1-bit quantization, since existing tools heavily prioritize GPUs over CPUs. As a result, until these implementations are incorporated into standard libraries, it's nearly impossible to fully leverage 1-bit LLMs without using BitNet's version, which, while powerful, is notoriously difficult to set up properly due to its extensive dependencies.

818 32 replies
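To make the "1.58-bit" idea in the comment above concrete, here is a minimal sketch of ternary weight quantization in plain Python. It assumes the absmean scheme described in the BitNet b1.58 paper (divide by the mean absolute weight, round, clip to -1, 0, +1). The function names and the sample weights are hypothetical, and this is an illustration rather than the code used by the BitNet authors or the commenter.

```python
import math

def absmean_ternary(weights, eps=1e-8):
    """Quantize floats to {-1, 0, +1} plus one shared scale (absmean scheme, assumed)."""
    # The scale is the mean absolute value of the full-precision weights.
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    # Round each scaled weight to the nearest integer, then clip to [-1, 1].
    ternary = [max(-1, min(1, round(w / scale))) for w in weights]
    return ternary, scale

def dequantize(ternary, scale):
    """Approximate the original weights from the ternary codes."""
    return [t * scale for t in ternary]

# Hypothetical full-precision ("latent") weights kept during training.
latent = [0.42, -0.03, -0.91, 0.10, 0.55, -0.27]

ternary, scale = absmean_ternary(latent)
approx = dequantize(ternary, scale)

# Three possible values per weight carry log2(3) bits of information.
bits_per_weight = math.log2(3)

print(ternary)
print(round(scale, 4), round(bits_per_weight, 3))
```

The name "1.58-bit" comes from log2(3): each weight takes one of three values. The speedups the comment mentions depend on kernels that exploit this structure (multiplying by -1, 0, or +1 reduces to add, skip, or subtract), which this sketch does not attempt.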
@el_saltamontes 2025-06-21

I don't think I've ever seen more sponsored ads in a YouTube video

554 18 replies
@primee_lion 2025-06-18

8:03 it runs 1.67x faster, not 66x faster

552 14 replies
@ParitoshTripathiOfficial 2025-06-20

SponsorBlock was invented for such videos

433 4 replies
@cbuchner1 2025-06-18

My understanding is that training a bitnet still requires a full precision set of weights for the gradient descent to work. So I doubt the claimed 20x energy savings during training.

293 4 replies
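The point about needing full-precision weights can be illustrated with a toy example in plain Python. It assumes a single weight, a quantization scale fixed at 1, and a constant hypothetical update of 0.01 per step. It only shows why small gradient steps are lost if the ternary value is updated directly, and says nothing about the energy figures.

```python
def quantize(w):
    """Round to the nearest integer and clip to {-1, 0, +1} (scale fixed at 1)."""
    return max(-1, min(1, round(w)))

step = 0.01    # hypothetical learning rate * gradient, held constant for the demo
n_steps = 100

# (a) Update the ternary weight directly and re-quantize after every step:
# each small update is rounded away, so the weight never moves.
direct = 0
for _ in range(n_steps):
    direct = quantize(direct + step)

# (b) Accumulate updates in a full-precision latent weight and quantize
# only when the weight is used in the forward pass.
latent = 0.0
for _ in range(n_steps):
    latent += step
forward_weight = quantize(latent)

print(direct, forward_weight)
```

In variant (a) the weight stays at 0, while in variant (b) the accumulated latent value eventually crosses the rounding threshold and the ternary weight flips to 1. This is the usual reason quantization-aware training keeps a higher-precision copy of the weights.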
@Anoyzify 2025-06-22

20% of this video feels like ads

160 4 replies
@cagedgandalf3472 2025-06-19

I am more of an embedded systems/robotics type of engineer and this is amazing news. Imagine what kind of complex AI you can fit in your Arduino. I still remember when I had memory problems in Arduino as a high school student trying to store audio. Now, the same student could use AI.

93 7 replies
@bycloudAI 2025-06-18

Download Tanka today https://www.tanka.ai and enjoy 3 months of free Premium! You can also get $20 / team for each referral

70 6 replies
@giorgos1794 2025-06-25

6 ad plugs in a 14-minute video. That's really great!

24
@smb1397 2026-03-26

"Can you make an LLM with 1 bit?" Proceeds to train it on a minuscule body of text.

2
