Matt Kappes’ Post

Theoretical Biologist 〡 Scientific Advisor

The Era of 1-bit LLMs: “Every single parameter (or weight) of the LLM is ternary {-1, 0, 1}. It matches the full-precision (i.e., FP16 or BF16) Transformer LLM with the same model size and training tokens in terms of both perplexity and end-task performance, while being significantly more cost-effective in terms of latency, memory, throughput, and energy consumption.”
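For a concrete sense of what “every weight is ternary” means, here is a minimal NumPy sketch of the absmean-style quantization the paper describes: scale a weight matrix by its mean absolute value, then round each entry to the nearest value in {-1, 0, +1}. The function name and per-matrix scale are illustrative assumptions on my part, not the authors’ code, and BitNet b1.58 also quantizes activations, which is not shown here.

```python
import numpy as np

def absmean_ternary_quantize(W: np.ndarray, eps: float = 1e-8):
    """Sketch: map an FP weight matrix to ternary {-1, 0, +1} plus one scale.
    The original weights are approximated as scale * W_ternary."""
    gamma = np.abs(W).mean()                        # mean absolute value of the matrix
    W_scaled = W / (gamma + eps)                    # normalize so typical magnitudes are ~1
    W_ternary = np.clip(np.round(W_scaled), -1, 1)  # round, then clip into {-1, 0, +1}
    return W_ternary.astype(np.int8), gamma

# Example: quantize a random FP32 matrix; entries of W_t are only -1, 0, or +1
W = np.random.randn(4, 6).astype(np.float32)
W_t, scale = absmean_ternary_quantize(W)
print(W_t)
print(scale)
```

Because the quantized weights take only three values, the multiplications in a matrix product reduce to additions and subtractions with a single rescale at the end, which is where the latency, memory, and energy savings come from.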

arXiv:2402.17764 (https://arxiv.org/abs/2402.17764)
