xynova’s Post

👀 Cerebras running big models at instant speeds is something to watch. With the company having just filed for an IPO, and inference-time compute being a proven path to push AI models beyond parameter size and data availability, their situation is intriguing. Their ahead-of-its-time "bigger is better" approach to hardware design could play out very interestingly over the next couple of years (AI Inference Chip Market Size And Forecast: https://lnkd.in/gC6gh9Sc).

YouTube: WSE architecture: https://lnkd.in/gaqxht5j

A few highlights from their numbers (see the back-of-envelope sketch below):

🧮 Loads massive models, up to 24 trillion parameters, on a single system
💻 900,000 cores for distributed processing of model weights
💾 2.4 petabytes of high-performance memory (MemoryX)
⚡ 7,000x more memory bandwidth than leading GPUs
🚀 Higher speed during both training and inference
🌱 Lower power consumption and reduced infrastructure complexity
💰 Competitive pricing: Cerebras Inference at 10¢ per million tokens - https://lnkd.in/gYpEBvUt

World record: Meta's Llama 3.1-405B clocking 969 tokens per second with 240 ms latency - https://lnkd.in/e4WFqh49. This new world record means that scientists can now complete two years' worth of GPU-based simulation work every single day.
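Taking the post's own figures at face value, here is a quick back-of-envelope sketch of what 969 tokens/s, 240 ms first-token latency, and 10¢ per million tokens translate to for a long generation. Only those three numbers come from the post; the 100,000-token budget is an arbitrary example of an inference-time-compute workload, not a quoted benchmark.

```python
# Back-of-envelope check of the throughput and pricing figures quoted above.
# The three constants are the post's numbers; everything else is illustrative.

RECORD_TPS = 969               # Llama 3.1-405B tokens/second (post's figure)
FIRST_TOKEN_LATENCY_S = 0.240  # 240 ms time to first token (post's figure)
PRICE_PER_M_TOKENS = 0.10      # Cerebras Inference: 10 cents per million tokens

def time_to_generate(n_tokens: int,
                     tps: float = RECORD_TPS,
                     ttft: float = FIRST_TOKEN_LATENCY_S) -> float:
    """Seconds to stream n_tokens at a steady rate after the first token."""
    return ttft + n_tokens / tps

def cost_usd(n_tokens: int,
             price_per_m: float = PRICE_PER_M_TOKENS) -> float:
    """Dollar cost of n_tokens at a flat per-million-token price."""
    return n_tokens / 1_000_000 * price_per_m

if __name__ == "__main__":
    n = 100_000  # hypothetical long chain-of-thought / reasoning budget
    print(f"{n:,} tokens in ~{time_to_generate(n):.1f}s for ~${cost_usd(n):.2f}")
    # -> 100,000 tokens in ~103.4s for ~$0.01
```

The point of the arithmetic: at these rates, a six-figure token budget for inference-time compute costs about a cent and finishes in under two minutes, which is what makes the "scaling inference instead of parameters" angle interesting here.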
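For anyone who wants to poke at the pricing link above: Cerebras Inference advertises an OpenAI-compatible API, so a minimal client sketch looks like the following. The base URL and model id are assumptions to verify against Cerebras's current docs, and CEREBRAS_API_KEY is a placeholder environment variable, not something from the post.

```python
# Minimal sketch of calling Cerebras Inference via the OpenAI-compatible API.
# ASSUMPTIONS: base_url and the model id follow Cerebras's published docs;
# verify both, and export CEREBRAS_API_KEY, before running.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",   # assumed OpenAI-compatible endpoint
    api_key=os.environ["CEREBRAS_API_KEY"],  # placeholder key variable
)

resp = client.chat.completions.create(
    model="llama3.1-405b",  # hypothetical model id; check the model list
    messages=[{"role": "user",
               "content": "Summarize wafer-scale inference in one line."}],
    stream=True,  # streaming makes the per-token speed visible
)
for chunk in resp:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```

Streaming is the natural mode here, since the headline claim is per-token generation speed rather than batch throughput.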

Cerebras Co-Founder Deconstructs Blackwell GPU Delay

