100B Models on Your CPU

Lior⚡ (@LiorOnAI) announced on April 19, 2025, that Microsoft has open-sourced bitnet.cpp, an inference framework for 1-bit LLMs. It can run models as large as 100B parameters on local CPUs with no GPU, with speedups of up to 6.17x and energy reductions of up to 82.2% (the upper-bound figures Microsoft reports for x86 CPUs). The framework supports 1.58-bit variants of models such as Llama3, Falcon3, and BitNet. An attached screenshot shows the command running a 100B-parameter model on an Apple machine with 12 threads, at a speed of 0.8 tokens/sec.
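For context, a quickstart along the lines of the project's README looked roughly like this at the time of the announcement (repo layout, script names, and flags may have changed since; the Hugging Face model below is the 8B example from the README, where "100B-tokens" refers to training tokens, not parameters):

```shell
# Clone with submodules (bitnet.cpp builds on the llama.cpp codebase)
git clone --recursive https://github.com/microsoft/BitNet.git
cd BitNet
pip install -r requirements.txt

# Download a 1.58-bit model and build the CPU inference kernels
python setup_env.py --hf-repo HF1BitLLM/Llama3-8B-1.58-100B-tokens -q i2_s

# Run inference locally on the CPU
python run_inference.py \
  -m models/Llama3-8B-1.58-100B-tokens/ggml-model-i2_s.gguf \
  -p "Alan Turing was" -n 64
```

The 100B-parameter demo in the screenshot uses the same `run_inference.py` entry point, just with a larger model file.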

https://github.com/microsoft/BitNet
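The efficiency gains come from ternary weights: in BitNet b1.58, every weight is quantized to {-1, 0, +1}, so matrix multiplication reduces to additions and subtractions. A minimal NumPy sketch of the absmean quantization scheme described in the BitNet b1.58 paper (illustrative only, not the repo's implementation):

```python
import numpy as np

def absmean_quantize(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Quantize a weight tensor to ternary {-1, 0, +1} with one
    per-tensor scale, per the absmean scheme of BitNet b1.58."""
    scale = float(np.abs(w).mean()) + 1e-8   # per-tensor absmean scale
    q = np.clip(np.round(w / scale), -1, 1)  # round, then clip to ternary
    return q.astype(np.int8), scale

# Usage: quantize a random weight matrix and check the value set
w = np.random.randn(64, 64)
q, s = absmean_quantize(w)
print(sorted(np.unique(q)))  # subset of [-1, 0, 1]
```

At inference time `x @ (q * s)` can be computed as sign-dependent accumulation of `x` followed by one multiply by `s`, which is what makes multiply-free CPU kernels possible.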