Hey everyone, I stumbled upon a super cool paper recently: 'The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits'. It's a game-changer! Picture this: instead of storing LLM weights in 16-bit floating point, we constrain them to the ternary values {-1, 0, +1}. Boom: big wins in energy, latency, and memory, while (per the paper) matching the full-precision model's performance. It's like upgrading your AI to turbo mode!

So we convert the weights to ternary with a quantization function (the paper frames this as a Pareto improvement over FP16 LLMs). That basically means each weight has only three possible values instead of a gazillion.

Then, when we compute a layer's output, output = Sum(weights * input) + bias, there's no real multiplication left: multiplying by -1, 0, or +1 is just negating, skipping, or adding the input. The matrix multiply becomes an addition-only process. Who knew you could get essentially the same result with so much less hardware? See the sketch below, then check the results in the paper.

Paper link: https://lnkd.in/eu6jpzuY
#LLM
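To make the idea concrete, here's a minimal NumPy sketch (my own illustration, not the paper's code): an absmean-style quantizer that rounds weights into {-1, 0, +1}, and a matrix-vector product that then needs only additions and subtractions. Function names, shapes, and the epsilon value are assumptions for the demo.

```python
import numpy as np

def absmean_ternary_quantize(W, eps=1e-5):
    # Scale by the mean absolute value of the weight matrix, then
    # round and clip each entry to the nearest value in {-1, 0, +1}.
    # (Mirrors the absmean quantization described in the paper;
    # exact implementation details may differ.)
    gamma = np.mean(np.abs(W)) + eps
    return np.clip(np.round(W / gamma), -1, 1), gamma

def ternary_matvec(W_ternary, x, bias=None):
    # With weights restricted to {-1, 0, +1}, each output element is
    # just a running sum of +x[j], -x[j], or nothing -- no multiplies.
    out = np.zeros(W_ternary.shape[0], dtype=x.dtype)
    for i in range(W_ternary.shape[0]):
        acc = 0.0
        for j in range(W_ternary.shape[1]):
            w = W_ternary[i, j]
            if w == 1:
                acc += x[j]   # add the input element
            elif w == -1:
                acc -= x[j]   # subtract the input element
            # w == 0: skip it entirely
        out[i] = acc
    if bias is not None:
        out += bias
    return out

# Quick demo with random weights (hypothetical shapes)
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))
x = rng.normal(size=8)
W_t, gamma = absmean_ternary_quantize(W)
print(W_t)                     # entries are only -1, 0, or +1
print(ternary_matvec(W_t, x))  # computed with additions/subtractions only
```

In a real kernel the ternary weights would be packed into ~1.58 bits each and the loop replaced by vectorized adds, but the point stands: the quantization trades a tiny bit of precision for hardware that never has to multiply.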