100B Models on Your CPU

Lior⚡ (@LiorOnAI) announced on April 19, 2025, that Microsoft has open-sourced bitnet.cpp, an inference framework for 1-bit LLMs. It can run models as large as 100B parameters on local CPUs with no GPU, with speedups of up to 6.17x and energy reductions of up to 82.2% (the upper-bound figures Microsoft reports for x86 CPUs). The framework supports 1.58-bit variants of models such as Llama3, Falcon3, and BitNet. An attached screenshot shows the command running a 100B-parameter model on an Apple machine with 12 threads, at a speed of 0.8 tokens/sec.
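For context, a quickstart along the lines of the project's README looked roughly like this at the time of the announcement (repo layout, script names, and flags may have changed since; the Hugging Face model below is the 8B example from the README, where "100B-tokens" refers to training tokens, not parameters):

```shell
# Clone with submodules (bitnet.cpp builds on the llama.cpp codebase)
git clone --recursive https://github.com/microsoft/BitNet.git
cd BitNet
pip install -r requirements.txt

# Download a 1.58-bit model and build the CPU inference kernels
python setup_env.py --hf-repo HF1BitLLM/Llama3-8B-1.58-100B-tokens -q i2_s

# Run inference locally on the CPU
python run_inference.py \
  -m models/Llama3-8B-1.58-100B-tokens/ggml-model-i2_s.gguf \
  -p "Alan Turing was" -n 64
```

The 100B-parameter demo in the screenshot uses the same `run_inference.py` entry point, just with a larger model file.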

https://github.com/microsoft/BitNet
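The efficiency gains come from ternary weights: in BitNet b1.58, every weight is quantized to {-1, 0, +1}, so matrix multiplication reduces to additions and subtractions. A minimal NumPy sketch of the absmean quantization scheme described in the BitNet b1.58 paper (illustrative only, not the repo's implementation):

```python
import numpy as np

def absmean_quantize(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Quantize a weight tensor to ternary {-1, 0, +1} with one
    per-tensor scale, per the absmean scheme of BitNet b1.58."""
    scale = float(np.abs(w).mean()) + 1e-8   # per-tensor absmean scale
    q = np.clip(np.round(w / scale), -1, 1)  # round, then clip to ternary
    return q.astype(np.int8), scale

# Usage: quantize a random weight matrix and check the value set
w = np.random.randn(64, 64)
q, s = absmean_quantize(w)
print(sorted(np.unique(q)))  # subset of [-1, 0, 1]
```

At inference time `x @ (q * s)` can be computed as sign-dependent accumulation of `x` followed by one multiply by `s`, which is what makes multiply-free CPU kernels possible.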