👀 Cerebras running big models at instant speeds is something to watch. With the company having just filed for an IPO, and inference-time compute now a proven path to push AI models beyond parameter size and data availability, its position is intriguing. Its ahead-of-its-time "bigger is better" approach to hardware design could play out very interestingly over the next couple of years.
AI Inference Chip Market Size And Forecast: https://lnkd.in/gC6gh9Sc
YouTube, WSE architecture: https://lnkd.in/gaqxht5j
A few highlights from their numbers:
🧮 Load massive models, up to 24 trillion parameters, on one chip
💻 900,000 cores for distributed model-weight processing
💾 2.4 petabytes of high-performance memory (MemoryX)
⚡ 7,000x more memory bandwidth than leading GPUs
🚀 Higher speed during both training and inference
🌱 Lower power consumption and reduced infrastructure complexity
💰 Competitive pricing: Cerebras Inference at 10¢ per million tokens - https://lnkd.in/gYpEBvUt
World record: Meta's Llama 3.1-405B clocking 969 tokens per second at 240ms latency - https://lnkd.in/e4WFqh49. Per Cerebras, this new world record means scientists can complete two years' worth of GPU-based simulation work every single day.
-
GPU-as-a-Service Blog Series: An Introduction 🗓️ Jul 9, 2024
The Gen AI industry's rapid growth has propelled the GPU market, with NVIDIA at the forefront. Enterprises and startups now prefer renting GPUs over buying dedicated hardware, creating a booming "GPU-as-a-Service" (GPUaaS) market. The industry is expected to grow 16x to $80B over the next decade, driven by the need for flexible, dynamic GPU consumption models. Sequoia Capital recently likened the GPU capex buildout to the historic railroad industry: "build the railroad and hope they will come." At Aarna Networks, we recognize the complexities and opportunities in this emerging field. Over the next few weeks, we'll share insights on how AI Cloud providers can navigate the evolving GPUaaS landscape. This blog series aims to help enterprises, startups, data center companies, and AI Cloud providers make informed decisions. Stay tuned for our expert perspectives and practical advice! 🌐 #GPUaaS #CloudInfrastructure #AI #TechInnovation #NVIDIA #AarnaNetworks
-
💥 One week out from #nvidiagtc! I've rarely seen so many old friends and colleagues from so many different industries attending a single conference. I'll be there all four days, so I decided to check out some of the talks ahead of time and pick my top 5. Featuring speakers from Berkeley Lab, Tomorrow.io, U.S. Department of State, Emerson Collective, and of course NVIDIA, this is what I came up with:
Huge Ensembles of Weather Extremes using NVIDIA's Fourier Forecasting Neural Network (FourCastNet): https://lnkd.in/gaKPij7q
Large-Scale Graph GNN Training Accelerated With cuGraph: https://lnkd.in/gS6j3AVQ
Bridging the Compute Divide to Mitigate Climate Risk: https://lnkd.in/g6KZEXpM
Heat Transfer Modeling for EV Batteries: https://lnkd.in/gxRFfT8Q
Global Strategies: Startups, Venture Capital, and Climate Change Solutions: https://lnkd.in/geQdDAiS
NVIDIA #GTC2024 Conference Session Catalog (nvidia.com)
-
Super 8 is so proud to be mentioned in the keynote conclusion by NVIDIA CEO Jensen Huang. Thanks, NVIDIA, and thank you, Jensen, for continuously highlighting why Taiwan is so pivotal to the AI industry. Next, we delve into the realm of AI applications and software, where innovation meets practicality. Super 8 is destined to be one of the future pioneers. Together, we are not just shaping the future; we are creating it. Check out the video highlight here.
✂️ Nvidia Computex 2024 Keynote Highlight (youtube.com)
-
Etched has gone and built their own ASIC with the 'Transformer' (the T in GPT) baked in, which they claim outperforms Nvidia's H100 and B100 by 20x. They have raised $120M so far. Source: etched.com. It is a genius move to build a chip around the very architecture we use to train LLMs. Going from 30-40% FLOPS utilization to 90% is just amazing to see; think of the resources saved: time, energy, heat, cooling.
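A quick back-of-the-envelope check (my arithmetic, not Etched's published numbers) shows why the utilization jump alone cannot account for a 20x claim, and where the rest would have to come from:

```python
# Toy arithmetic: how much of a 20x speedup can utilization alone explain?
peak_util_gpu = 0.35    # midpoint of the 30-40% FLOPS utilization quoted above
peak_util_asic = 0.90   # the claimed utilization for the transformer ASIC

gain_from_utilization = peak_util_asic / peak_util_gpu
print(f"utilization gain alone: {gain_from_utilization:.1f}x")          # ~2.6x
print(f"remaining factor to reach 20x: {20 / gain_from_utilization:.1f}x")  # ~7.8x
# The remaining ~7.8x would have to come from specialization itself:
# fixed-function transformer dataflow instead of general-purpose cores.
```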
-
Ready for the next step? Check out part 2 of our series "Unlocking AI & ML Metal Performance with QBO Kubernetes Engine (QKE)" to learn how to install Kubeflow with Nvidia GPU Operator support 💡 https://lnkd.in/gAGyTuW5 📝 If you missed part one from last week, make sure to catch up before diving into this session. #QBO #Kubernetes #Nvidia #CloudComputing #QKE #AI #ML
QKE + Kubeflow + NVIDIA GPU Operator + Kubernetes-in-Docker + Cgroups v2 - Part 2 (youtube.com)
-
At Voice of Reason AI, where we focus on interfaceless personal aids for individuals with memory and brain disorders, inference speed is not just a luxury—it’s a necessity. Our AI may remain silent most of the time, yet it must continuously listen, understand, and analyze context so it can respond instantly when help is needed. Think of it like the cymbal player in an orchestra, silent for long stretches but ready to deliver a perfectly timed crash. For us, it's essential that our AI “cymbal” sounds precisely when a user needs guidance. This makes Cerebras' advancements incredibly exciting, as they could enable us to create real-time, ambient aids that seamlessly enhance daily life without unnecessary delays. Such speed is crucial for AI that doesn’t merely react but anticipates needs in real-time, helping individuals maintain independence and dignity.
Cerebras Inference runs Llama 3.1-70B at an astounding 2,100 tokens per second. It’s 16x faster than the fastest GPU solution. That’s 3x faster since our launch just 2 months ago. We can’t wait to help our partners push the boundaries of what’s next. Try it today: https://chat.cerebras.ai/
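For readers who want to check these throughput numbers themselves, here is a minimal sketch that times generation against Cerebras' OpenAI-compatible chat endpoint. The base_url and model id below are assumptions drawn from Cerebras' public docs, so verify both before use; note also that wall-clock timing includes network latency, so it will undershoot server-side tokens/s.

```python
# Minimal throughput check against an OpenAI-compatible endpoint.
# ASSUMPTIONS: base_url and model id follow Cerebras' docs; confirm
# both (and supply a real API key) before running.
import time
from openai import OpenAI  # pip install openai

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",  # assumed endpoint
    api_key="YOUR_API_KEY",
)

start = time.perf_counter()
resp = client.chat.completions.create(
    model="llama3.1-70b",  # assumed model id
    messages=[{"role": "user", "content": "Summarize wafer-scale inference in 3 sentences."}],
)
elapsed = time.perf_counter() - start

generated = resp.usage.completion_tokens
print(f"{generated} tokens in {elapsed:.2f}s -> {generated / elapsed:.0f} tokens/s")
```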
-
🚀 𝗥𝗲𝗮𝗹-𝗧𝗶𝗺𝗲 𝗘𝗻𝗱-𝘁𝗼-𝗘𝗻𝗱 𝗢𝗯𝗷𝗲𝗰𝘁 𝗗𝗲𝘁𝗲𝗰𝘁𝗶𝗼𝗻: 𝗬𝗢𝗟𝗢𝘃𝟭𝟬 𝘃𝘀 𝗬𝗢𝗟𝗢𝘃𝟴 🚀
The YOLOvX team has compared the speed of two popular object detection models: YOLOv10, the latest version of the YOLO architecture, and YOLOv8, its predecessor.
Speed comparison (NVIDIA RTX 3060, per image at shape (1, 3, 384, 640)):
𝐘𝐎𝐋𝐎𝐯𝟏𝟎: 2.0ms preprocess, 13.4ms inference, 1.3ms postprocess
𝐘𝐎𝐋𝐎𝐯𝟖: 2.0ms preprocess, 8.9ms inference, 1.7ms postprocess
𝗞𝗲𝘆 𝗵𝗶𝗴𝗵𝗹𝗶𝗴𝗵𝘁𝘀 𝗼𝗳 𝗬𝗢𝗟𝗢𝘃𝟭𝟬 😍
✅ NMS-free training: improved performance and reduced latency
✅ Spatial-channel decoupled downsampling for ops efficiency
✅ New compact inverted block (CIB)
✅ Holistic design: components optimized for efficiency and capability
🚀 Awesome work by the Tsinghua University team 👏
While these early experiments suggest YOLOv8n has a per-image speed advantage over YOLOv10n, more rigorous benchmarking is needed to validate whether this holds across conditions; a sketch of such a benchmark follows below. We look forward to doing additional research to better understand the trade-offs between these models for real-time applications. Stay tuned for more exciting developments and breakthroughs on the horizon! ✨
---- Like this post? Follow Muhammad Ehsan, press "like," and hit the 🔔 on my profile and/or share with your network.
#computervision #ai #objectdetection #yolov8 #yolov10 #innovation #technology #machinelearning #deeplearning #realtime #datascience #research #development #benchmarking #futuretech
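As a rough way to reproduce this kind of per-image comparison, here is a minimal sketch using the ultralytics Python package. The weight filenames are assumptions (recent ultralytics releases resolve and auto-download both YOLOv8 and YOLOv10 checkpoints), and a serious benchmark would average over hundreds of images after warmup rather than timing a single frame.

```python
# Minimal per-image speed comparison sketch using ultralytics.
# ASSUMPTION: an ultralytics version recent enough to ship YOLOv10.
from ultralytics import YOLO  # pip install ultralytics

IMAGE = "https://ultralytics.com/images/bus.jpg"  # sample image

for weights in ("yolov8n.pt", "yolov10n.pt"):
    model = YOLO(weights)                            # auto-downloads the checkpoint
    model.predict(IMAGE, imgsz=640, verbose=False)   # warmup pass
    results = model.predict(IMAGE, imgsz=640, verbose=False)
    speed = results[0].speed  # ms dict: preprocess / inference / postprocess
    total_ms = sum(speed.values())
    print(f"{weights}: {speed} -> ~{1000 / total_ms:.0f} FPS per image")
```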
-
𝐍𝐯𝐢𝐝𝐢𝐚 𝐥𝐚𝐮𝐧𝐜𝐡𝐞𝐬 𝐍𝐞𝐦𝐨𝐭𝐫𝐨𝐧, 𝐚 𝟕𝟎𝐁 𝐦𝐨𝐝𝐞𝐥 𝐭𝐡𝐚𝐭 𝐨𝐮𝐭𝐩𝐞𝐫𝐟𝐨𝐫𝐦𝐬 𝐆𝐏𝐓-𝟒𝐨 𝐚𝐧𝐝 𝐂𝐥𝐚𝐮𝐝𝐞 𝟑.𝟓 𝐒𝐨𝐧𝐧𝐞𝐭
Its advanced architecture and training methodology make it lightweight compared to GPT-4o mini and Meta's Llama models. The 𝐋𝐥𝐚𝐦𝐚 𝟑.𝟏 𝐍𝐞𝐦𝐨𝐭𝐫𝐨𝐧-𝟕𝟎𝐁 model builds on the 𝐋𝐥𝐚𝐦𝐚 𝟑.𝟏 transformer architecture, was trained from Llama-3.1-70B-Instruct using RLHF (specifically REINFORCE), and is ready for commercial use. Its 70 billion parameters allow it to process and generate human-like responses that are coherent and fluent.
On alignment benchmarks, the model has achieved top scores: Arena Hard (85.0), AlpacaEval 2 LC (57.6), and GPT-4-Turbo MT-Bench (8.98). The new Nemotron model is a testament to the fact that smaller, more efficient models can compete with, or even outshine, some of the industry leaders.
𝐇𝐞𝐫𝐞'𝐬 𝐰𝐡𝐞𝐫𝐞 𝐲𝐨𝐮 𝐜𝐚𝐧 𝐭𝐫𝐲 𝐢𝐭 𝐟𝐨𝐫 𝐟𝐫𝐞𝐞:
1) https://lnkd.in/d9EWRDMY
2) https://lnkd.in/dvAMSgw6
On Reddit, 𝐮𝐬𝐞𝐫 /𝐮/𝐭𝐡𝐢𝐜𝐤_𝐥𝐥𝐚𝐤𝐞𝟔𝟗𝟗𝟎 wrote a great comment on this release: "NVIDIA is in such an envious position. Make the open-source models so good that all the for-profits have to order more chips to train increasingly complex models to distinguish their models to justify charging for access, and even if they don't, people still need to buy hardware to run your free model. As long as they stay on top of the custom chips for model performance and invest enough into the neuromorphic chip future, they really can't lose."
#NvidiaNemotron #GenerativeAI #LLMs #Nemotron70B #AIModels #MachineLearning #Transformers #RLHF #NeuralNetworks #AIResearch #NaturalLanguageProcessing #OpenSourceAI #AIForBusiness #ArtificialIntelligence #NeuralComputing #TechNews #AIInnovation #DeepLearning #G42
nvidia/Llama-3.1-Nemotron-70B-Instruct-HF - HuggingChat (huggingface.co)
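For anyone who prefers running the weights locally instead of the hosted demos, here is a minimal sketch using Hugging Face transformers with the model id from the card above. Note the caveat in the comments: 70B parameters is far beyond a single consumer GPU, and device_map="auto" simply shards the model across whatever hardware accelerate finds.

```python
# Minimal local-inference sketch for the Nemotron 70B instruct model.
# Requires: pip install transformers accelerate torch
# CAVEAT: 70B parameters in bf16 is roughly 140 GB of weights; expect
# to shard across several GPUs or swap in a quantized variant.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Llama-3.1-Nemotron-70B-Instruct-HF"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "How many r's are in 'strawberry'?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```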
-
The 'problem' with having a GPU-native CFD solver is there aren't a lot of benchmarks because we're right on the cutting edge of technology. Everything published is AI-oriented, which is fine, but we don't use tensor cores at all. I've worked (well, tried to work) with several other HPC server vendors to benchmark Dotmatics M-Star Simulations and one of them literally told me they "didn't need the business right now". What! Maybe when the hype of wizard cat pictures fades away.... Jason Chen at Exxact Corporation did the exact opposite -- "What tests do you want to see before you buy? Tell me more about your use cases." He's going to throw 10x L40S into my new server to make sure everything works great before it's at my shop. That's customer service! And these guys offer the liquid-cooled 1U Grace Hopper Superchip by NVIDIA. Check them out: https://lnkd.in/eEAWYg4X
-
🚀 20 petaflops
🚀 208 billion transistors
🚀 able to train 1-trillion-parameter models
🚀 built on a 4nm process
All in one GPU? #nvidia CEO Jensen Huang presented the new Blackwell GPU architecture yesterday at GTC. It introduces a new FP4 data type for inference, allowing faster computation on smaller data packets and much quicker delivery of results. Crazy performance compared to my master's thesis, where I worked with a supercomputer cluster of 32 CPUs (but to be fair, that was in 2006 😊). The future is now; it will exponentially increase the power and possibilities of simulations. Be curious! #stammtischtalk #ai
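To make the FP4 mention concrete: a 4-bit float encodes only 16 distinct values, trading precision for a 4x memory-footprint reduction versus FP16. Below is a toy sketch (my illustration, not NVIDIA's implementation) that rounds a tensor to the magnitudes commonly described for the E2M1 layout; real systems add a per-block scale factor to recover dynamic range.

```python
# Toy FP4 (E2M1) illustration -- not NVIDIA's implementation.
# ASSUMPTION: the E2M1 layout with representable magnitudes
# {0, 0.5, 1, 1.5, 2, 3, 4, 6}, as commonly described for FP4.
import numpy as np

FP4_LEVELS = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_fp4(x: np.ndarray) -> np.ndarray:
    """Round each value to the nearest representable FP4 magnitude, keeping sign."""
    sign = np.sign(x)
    idx = np.argmin(np.abs(np.abs(x)[..., None] - FP4_LEVELS), axis=-1)
    return sign * FP4_LEVELS[idx]

weights = np.array([0.07, -0.8, 1.2, -2.6, 5.1], dtype=np.float32)
print("fp32:", weights)
print("fp4 :", quantize_fp4(weights))  # 4 bits/value vs 32 for fp32
```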