**Exciting News in AI and Computing: NVIDIA's Blackwell Architecture Unveiled!**

We're witnessing a monumental leap in AI capabilities with NVIDIA's introduction of the Blackwell architecture. This cutting-edge technology is set to revolutionize generative AI and accelerated computing. Here's what you need to know:

🚀 **Blackwell GPUs**: At the heart of the architecture are the Blackwell GPUs, each packed with **208 billion transistors** and built on a custom **TSMC 4NP process**. Each GPU combines **two reticle-limited dies** connected by a **10 TB/s chip-to-chip interconnect**, functioning as a single unified GPU.

💡 **GB200 NVL72**: The flagship system, the GB200 NVL72, is a marvel of engineering. It connects **36 dual-GPU Grace Blackwell "superchips"** (totaling **72 GPUs**) and **36 Grace CPUs** in a liquid-cooled, rack-scale design. This configuration acts as a single massive GPU, delivering **30X faster real-time inference** for trillion-parameter LLMs and **13.5 TB of HBM3e memory**.

🧠 **Second-Generation Transformer Engine**: The Transformer Engine is enhanced with custom Blackwell Tensor Cores, enabling **4-bit floating point (FP4) AI** and fine-grain **micro-tensor scaling**. Coupled with NVIDIA TensorRT-LLM and the NeMo Framework, it accelerates both inference and training for LLMs and MoE models.

🔒 **Security**: NVIDIA Confidential Computing provides hardware-based security for sensitive data and AI models.

🔗 **NVLink and NVLink Switch**: The fifth-generation NVLink interconnect can scale up to **576 GPUs**, with the NVL72 configuration forming a **72-GPU NVLink domain**. The NVLink Switch Chip delivers an astounding **130 TB/s of GPU bandwidth** and supports NVIDIA SHARP™ FP8 in-network computing.

💲 **Pricing**: The GB200 NVL72 cabinet, with its 72 GPUs, is reportedly priced at around **$3 million**, reflecting the unparalleled performance and advanced technology NVIDIA brings to the table.

NVIDIA's Blackwell architecture is a game-changer for the AI and computing industries, offering unprecedented performance and efficiency. It's a testament to NVIDIA's commitment to innovation and leadership in the field.

#NVIDIA #BlackwellArchitecture #AI #Computing #Innovation #Technology
Anton Dubov’s Post
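A quick back-of-the-envelope check of how the rack-level figures above relate to the per-GPU numbers quoted elsewhere in this feed. The 1.8 TB/s per-GPU NVLink figure is taken from the B200 post further down; all values are vendor marketing numbers rather than independent measurements.

```python
# Back-of-the-envelope check of the rack-level figures above. The 13.5 TB HBM3e,
# 72 GPUs and 130 TB/s numbers come from this post; the 1.8 TB/s per-GPU NVLink
# figure is quoted in the B200 post further down. All are vendor marketing numbers.
total_hbm_tb = 13.5
gpus_per_rack = 72

hbm_per_gpu_gb = total_hbm_tb * 1000 / gpus_per_rack
print(f"HBM3e per GPU: ~{hbm_per_gpu_gb:.0f} GB")  # ~188 GB, consistent with the ~192 GB per-GPU figure cited below

nvlink_per_gpu_tbps = 1.8
print(f"Aggregate NVLink bandwidth: ~{nvlink_per_gpu_tbps * gpus_per_rack:.0f} TB/s")  # ~130 TB/s, matching the NVLink Switch figure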
More Relevant Posts
-
🚀 Revolutionizing AI: NVIDIA's Blackwell Architecture 🚀

Named in honor of distinguished mathematician David H. Blackwell, the Blackwell GPU architecture meets the soaring computational and bandwidth demands of modern AI workloads. As AI models grow exponentially in size and complexity, advanced computational capability and memory capacity become critical. The Blackwell GPU architecture addresses these needs with a substantial leap in performance and efficiency.

#### The B200 Tensor GPU: A Technological Marvel
--> Transistor count & manufacturing: With 208 billion transistors, the B200 Tensor GPU is built on TSMC's 4NP manufacturing process, maximizing density and efficiency.
--> Performance: Delivering approximately 20 PetaFLOPS of AI performance at FP4 precision, the B200 is a powerhouse:
--> 4x the training performance of the previous-generation Hopper GPU.
--> 30x its inference performance.
--> 25x better energy efficiency.

#### Dual-Die Configuration: Unmatched Power
The B200's dual-die design merges two of the largest manufacturable dies into a single GPU via the NVIDIA High Bandwidth Interface (NV-HBI) fabric, supporting an impressive 10 TB/s of bandwidth between the dies.

#### Memory & Bandwidth
--> 192 GB of HBM3e memory.
--> Over 8 TB/s of peak memory bandwidth.
--> 1.8 TB/s of NVLink bandwidth.
--> Significant gains over the H100, providing the resources to handle large-scale AI models and complex computations more effectively.

#### NVIDIA's 5th Generation NVLink: Doubling the Performance
--> Supports up to 576 GPUs.
--> Provides 3.6 TFLOPS of in-network computing.
--> Facilitates tensor reductions and combinations directly within the network fabric, optimizing computational tasks and enhancing scalability.

🚀 GB200 NVL72 Cluster: The Future of AI Infrastructure 🚀
NVIDIA's GB200 NVL72 cluster consolidates multiple GB200-powered systems into a single, liquid-cooled rack, connecting:
--> 36 GB200 Superchips (36 Grace CPUs and 72 Blackwell GPUs).
--> NVIDIA BlueField-3 data processing units for enhanced cloud network acceleration and security.

#### Performance Metrics
--> 30x improvement in real-time trillion-parameter LLM inference over the H100.
--> 25x reduction in total cost of ownership.
--> 25x less energy for equivalent GPU counts.

#### Why Blackwell Outshines Conventional GPUs
Consider an illustrative large-scale AI workload (the arithmetic is worked through in the sketch below):
--> Standard GPU: processes 1 trillion parameters in 5 years, consuming 1000 MWh.
--> Blackwell GPU: processes 1 trillion parameters in 6 months, consuming only 40 MWh.

📈 By those figures, that's roughly a 10x faster time-to-result and 25x better performance per watt.

#AI #MachineLearning #GPUs #NVIDIA #Innovation #HighPerformanceComputing #BlackwellGPU #GTC2024 #TechInnovation #FutureOfAI
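The workload comparison above uses the post's own illustrative figures (5 years / 1000 MWh vs 6 months / 40 MWh), not measured benchmarks; a minimal sketch of how the ratios fall out of those numbers:

```python
# The 5-year / 1000 MWh vs 6-month / 40 MWh figures are the post's own
# illustrative numbers, not measurements; this just works out the ratios.
standard_time_months, standard_energy_mwh = 5 * 12, 1000
blackwell_time_months, blackwell_energy_mwh = 6, 40

speedup = standard_time_months / blackwell_time_months          # 10x faster time-to-result
energy_reduction = standard_energy_mwh / blackwell_energy_mwh   # 25x less energy for the same work

# Same amount of work divided by the energy used, so performance per watt
# scales with the energy reduction: ~25x.
print(f"Speedup: {speedup:.0f}x, energy reduction: {energy_reduction:.0f}x")
```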
-
👋 Jan now supports NVIDIA's TensorRT-LLM in addition to llama.cpp, making Jan multi-engine and ultra-fast for users with NVIDIA GPUs. We've benchmarked TensorRT-LLM on consumer-grade GPUs, and it shows pretty incredible speedups (30-70%) on the same hardware.

First off, what is TensorRT-LLM? Running AI models like Llama 3 and Mistral requires compiling the models down to "hardware language". This job is done by inference engines, more commonly referred to as "backends". Note: "inference" is a fancy way of saying "we get your LLM to generate a reply".

Popular inference engines include:
- llama.cpp (most popular, dominates desktop AI)
- MLX (from Apple)
- TensorRT (from NVIDIA)

TensorRT-LLM is NVIDIA's relatively new and (somewhat) open source inference engine, which uses NVIDIA's proprietary optimizations beyond the open source cuBLAS library. It works by optimizing and compiling the model specifically for your GPU, tuning things at the CUDA level to take full advantage of every bit of hardware:
- CUDA cores
- Tensor cores
- VRAM
- Memory bandwidth
https://lnkd.in/gQ9e-QwX

TensorRT-LLM takes a different approach from llama.cpp, which dominates desktop inference with a "compile once, run anywhere" approach. A good analogy is C++ vs. Java:
- C++ (TensorRT-LLM): blazing fast, but runs only on the machine for which it was compiled
- Java (llama.cpp): a single file that can run cross-platform

Both approaches are needed for open source AI to flourish. So 👋 Jan supports both!

We benchmarked TensorRT-LLM on consumer-grade devices and got Mistral 7B up to:
- 170 tokens/s on desktop GPUs (e.g. 4090s, 3090s)
- 51 tokens/s on laptop GPUs (e.g. 4070)

TensorRT-LLM was 30-70% faster than llama.cpp on the same hardware, and at least 500% faster than just using the CPU 😂

Interestingly, we found that TensorRT-LLM didn't use much more in the way of resources, contrary to its reputation for needing beefy hardware to run:
- Used 10% more VRAM (marginal)
- Used… less RAM???
Note: our RAM measurements were highly iffy, and we'd love it if anyone has better ideas on how to measure it.
https://lnkd.in/gpUk3pht

Jan still ships with our much-beloved llama.cpp as the default inference engine (shout out to Georgi Gerganov and the ggml team). TensorRT-LLM is available as an extension, which downloads additional dependencies. We've also compiled a few models, and will make more available soon.

Read the full benchmark: https://lnkd.in/gzd-fHT8

Special thanks to Aslı Sabancı Demiröz, Annamalai Chockalingam, Jordan Dodge from NVIDIA, and Georgi Gerganov from llama.cpp for feedback, review and suggestions.
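For readers who want to reproduce a rough tokens-per-second number locally, here is a minimal sketch of the kind of measurement involved. It assumes the llama-cpp-python bindings and a hypothetical local GGUF file name; it is not the Jan team's actual benchmark harness, which is described in the linked post.

```python
# A rough sketch of measuring decode throughput (tokens/s) for a local model.
# Assumptions: the llama-cpp-python bindings and a hypothetical GGUF file name;
# this is not the Jan team's actual benchmark harness (see their linked post).
import time
from llama_cpp import Llama  # pip install llama-cpp-python

llm = Llama(
    model_path="mistral-7b-instruct.Q4_K_M.gguf",  # hypothetical local model file
    n_gpu_layers=-1,                               # offload all layers to the GPU if available
)

start = time.perf_counter()
out = llm("Explain what an inference engine does.", max_tokens=256)
elapsed = time.perf_counter() - start

generated = out["usage"]["completion_tokens"]
print(f"{generated} tokens in {elapsed:.2f}s -> {generated / elapsed:.1f} tokens/s")
```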
-
It's always about the use case: either you find it, or someone else does and you're positioned to take advantage of it. NVIDIA went from general-purpose GPU to AI GPU.

A startup, a researcher, or anyone who wants to turn an idea, technology or knowledge into a product that solves a real problem in the market has to find the use case. What is it for? What problem does it solve?

GPUs have been on the market for a long time, and they used to fill a niche where their special design let them solve a very specific type of problem. A GPU differs from a CPU, in essence, in that it is a vector processor: it applies the same operation at the same time to a whole set of values (a vector), rather than to a single value. CPUs are general-purpose processors and can be used to solve any problem; how efficiently they do so is another matter. GPUs are not naturally general purpose, since they need problems where their vector architecture makes sense.

Well, an image on a screen is a set of rows of values indicating the color of each pixel. That's why the first major successful use case for NVIDIA was the huge growth of the video game market: a video game has to constantly re-render the screen to display the game graphics, and the GPU handles that much faster than a CPU.

NVIDIA has always tried to be seen as a way to solve many more problems, and they have always communicated this, creating tools to apply the GPU to different use cases. NVIDIA did not know it, nor did they create the use case, but by being positioned in the right place, the mega killer use case appeared: generative AI with Large Language Models (LLMs), used by ChatGPT and others. An LLM is, in essence, a set of rows of probabilities, i.e. vectors. And who was there to handle vector computing better than anyone else? That's right, NVIDIA.

From niche use cases and non-negligible success in video games to being, today, the third-largest technology company in the world. NVIDIA is, in fact, by far the company capturing the most value in the world from AI.

This has led to the reasonable and expected change: NVIDIA is no longer so interested in being perceived as a solution applicable to various problems. The "general-purpose GPU you can do many things with" has become, in their own conferences, "GPU for AI, faster, better". Why? Because they have finally found the indisputable use case.

That's how important the use case is. I hope it helps you to focus your value proposition, startup or technology transfer. Find below one of the slides I use to explain to researchers the cycle from science to market, with the use case as the first step.

#entrepreneurship #innovation #AI
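A tiny illustration of the "same operation over a whole vector" idea described above, using NumPy as a stand-in for what GPU hardware does natively; purely for intuition, not NVIDIA code.

```python
# The "same operation applied to a whole vector at once" idea from the post,
# illustrated with NumPy as a stand-in for what GPU hardware does natively.
import numpy as np

pixels = np.random.rand(1_000_000)   # e.g. brightness values for one frame

# Scalar, CPU-style thinking: one value at a time.
brightened_loop = [min(p * 1.2, 1.0) for p in pixels]

# Vector-style thinking: one operation over the whole array at once.
brightened_vec = np.minimum(pixels * 1.2, 1.0)

assert np.allclose(brightened_loop, brightened_vec)
```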
-
https://lnkd.in/eXR_F_zh

GTC—NVIDIA today announced its next-generation AI supercomputer — the NVIDIA DGX SuperPOD™ powered by NVIDIA GB200 Grace Blackwell Superchips — for processing trillion-parameter models with constant uptime for superscale generative AI training and inference workloads.

Featuring a new, highly efficient, liquid-cooled rack-scale architecture, the new DGX SuperPOD is built with NVIDIA DGX™ GB200 systems and provides 11.5 exaflops of AI supercomputing at FP4 precision and 240 terabytes of fast memory — scaling to more with additional racks.

Each DGX GB200 system features 36 NVIDIA GB200 Superchips — which include 36 NVIDIA Grace CPUs and 72 NVIDIA Blackwell GPUs — connected as one supercomputer via fifth-generation NVIDIA NVLink®. GB200 Superchips deliver up to a 30x performance increase compared to the NVIDIA H100 Tensor Core GPU for large language model inference workloads.

“NVIDIA DGX AI supercomputers are the factories of the AI industrial revolution,” said Jensen Huang, founder and CEO of NVIDIA. “The new DGX SuperPOD combines the latest advancements in NVIDIA accelerated computing, networking and software to enable every company, industry and country to refine and generate their own AI.”

The Grace Blackwell-powered DGX SuperPOD features eight or more DGX GB200 systems and can scale to tens of thousands of GB200 Superchips connected via NVIDIA Quantum InfiniBand. For a massive shared memory space to power next-generation AI models, customers can deploy a configuration that connects the 576 Blackwell GPUs in eight DGX GB200 systems via NVLink.
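Assuming the 11.5-exaflops figure refers to the base eight-system configuration described in the announcement, the per-GPU number lines up with the roughly 20 PFLOPS FP4 B200 figure quoted elsewhere in this feed:

```python
# Assumes the 11.5 exaflops figure refers to the base eight-system SuperPOD
# described above; all inputs are NVIDIA's published marketing numbers.
systems = 8
gpus_per_system = 72
total_gpus = systems * gpus_per_system                 # 576 GPUs

superpod_fp4_exaflops = 11.5
per_gpu_pflops = superpod_fp4_exaflops * 1000 / total_gpus
print(f"{total_gpus} GPUs -> ~{per_gpu_pflops:.0f} PFLOPS FP4 each")  # ~20 PFLOPS, matching the B200 figure
```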
-
Highlighting: "A single 8xSohu server is said to equal the performance of 160 H100 GPUs, meaning data processing centers can save both on initial and operational costs if the Sohu meets expectations."

-----

Etched comes at Nvidia creatively by focusing on transformer models. Could the Sohu chip reduce the need for Nvidia A100 and H100 chips?

-----

TomsHardware: "Sohu AI chip claimed to run models 20x faster and cheaper than Nvidia H100 GPUs. Startup Etched has created this LLM-tuned transformer ASIC." (Jowi Morales) (June 26, 2024)

"Etched, a startup that builds transformer-focused chips, just announced Sohu, an application-specific integrated circuit (ASIC) that claims to beat Nvidia’s H100 in terms of AI LLM inference. A single 8xSohu server is said to equal the performance of 160 H100 GPUs, meaning data processing centers can save both on initial and operational costs if the Sohu meets expectations.

According to the company, current AI accelerators, whether CPUs or GPUs, are designed to work with different AI architectures. These differing frameworks and designs mean hardware must be able to support various models, like convolution neural networks, long short-term memory networks, state space models, and so on. Because these models are tuned to different architectures, most current AI chips allocate a large portion of their computing power to programmability.

Most large language models (LLMs) use matrix multiplication for the majority of their compute tasks, and Etched estimated that Nvidia’s H100 GPUs use only 3.3% of their transistors for this key task. This means the remaining 96.7% of the silicon is used for other tasks, which are still essential for general-purpose AI chips.

Etched made a huge bet on transformers a couple of years ago when it started the Sohu project. This chip bakes the transformer architecture into the hardware, allowing it to allocate more transistors to AI compute. We can liken this to processors and graphics cards: let’s say current AI chips are CPUs, which can do many different things, and the transformer model is like the graphics demands of a game title. Sure, the CPU can still process these graphics demands, but it won’t do it as fast or as efficiently as a GPU. A GPU that’s specialized in processing visuals makes graphics rendering faster and more efficient.

This is what Etched did with Sohu. Instead of making a chip that can accommodate every single AI architecture, it built one that only works with transformer models. The company’s gamble now looks like it is about to pay off, big time. Sohu’s launch could threaten Nvidia’s leadership in the AI space, especially if companies that exclusively use transformer models move to Sohu. After all, efficiency is the key to winning the AI race, and anyone who can run these models on the fastest, most affordable hardware will take the lead."

TomsHardware: https://lnkd.in/g2ZGiU-z

#ai #cloud #aicloud #cloudai #cloudgpu #genai #transformermodel
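The per-chip ratio implied by the quoted claim, which ties the "20x" headline to the 8xSohu-vs-160-H100 comparison (a vendor claim, not an independent benchmark):

```python
# Per-chip ratio implied by Etched's claim that one 8xSohu server matches 160 H100s.
sohu_chips_per_server, h100_equivalent = 8, 160
print(f"~{h100_equivalent / sohu_chips_per_server:.0f}x H100 per Sohu chip (vendor claim)")  # 20x
```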
Sohu AI chip claimed to run models 20x faster and cheaper than Nvidia H100 GPUs
tomshardware.com
-
I'm an enthusiastic fan of NVIDIA.

NVIDIA unveils Blackwell architecture to power next GenAI wave
By Ryan Daws - AI News

NVIDIA has announced its next-generation Blackwell GPU architecture, designed to usher in a new era of accelerated computing and enable organisations to build and run real-time generative AI on trillion-parameter large language models. The Blackwell platform promises up to 25 times lower cost and energy consumption compared to its predecessor: the Hopper architecture.

Named after pioneering mathematician and statistician David Harold Blackwell, the new GPU architecture introduces six transformative technologies.

“Generative AI is the defining technology of our time. Blackwell is the engine to power this new industrial revolution,” said Jensen Huang, Founder and CEO of NVIDIA. “Working with the most dynamic companies in the world, we will realise the promise of AI for every industry.”

The key innovations in Blackwell include the world’s most powerful chip with 208 billion transistors, a second-generation Transformer Engine to support double the compute and model sizes, fifth-generation NVLink interconnect for high-speed multi-GPU communication, and advanced engines for reliability, security, and data decompression.

Central to Blackwell is the NVIDIA GB200 Grace Blackwell Superchip, which combines two B200 Tensor Core GPUs with a Grace CPU over an ultra-fast 900GB/s NVLink interconnect. Multiple GB200 Superchips can be combined into systems like the liquid-cooled GB200 NVL72 platform with up to 72 Blackwell GPUs and 36 Grace CPUs, offering 1.4 exaflops of AI performance.
NVIDIA unveils Blackwell architecture to power next GenAI wave
https://www.artificialintelligence-news.com
-
Chinese GPU maker Moore Threads' MTLink fabric tech challenges Nvidia's NVLink and can now scale to 10,000 GPUs for AI clusters, for a total of 1,280,000 tensor cores. More GPUs, more performance.

Moore Threads has upgraded its KUAE data center server for AI, enabling clusters of up to 10,000 GPUs. The KUAE data center servers integrate eight MTT S4000 GPUs interconnected using the proprietary MTLink technology, designed specifically for training and running large language models (LLMs). These GPUs are based on the MUSA architecture and feature 128 tensor cores and 48 GB of GDDR6 memory with 768 GB/s of bandwidth. A 10,000-GPU cluster wields 1,280,000 tensor cores, but the actual performance is unknown, as performance scaling depends on numerous factors.

Moore Threads recently completed a financing round, raising up to 2.5 billion yuan (approximately US$343.7 million). This influx of funds is expected to support its ambitious expansion plans and technology advancements. However, without access to the advanced process technologies offered by TSMC, Intel Foundry, and Samsung Foundry, the firm faces numerous challenges on the path to developing next-gen GPUs.
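The cluster-level totals implied by the per-GPU specs quoted above (vendor figures, with scaling efficiency unknown as the post notes):

```python
# Cluster-level totals implied by the per-GPU specs above (vendor figures).
gpus = 10_000
tensor_cores_per_gpu = 128
gddr6_per_gpu_gb = 48

print(f"Tensor cores: {gpus * tensor_cores_per_gpu:,}")          # 1,280,000
print(f"Total GDDR6:  {gpus * gddr6_per_gpu_gb / 1000:.0f} TB")   # 480 TB
```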
Chinese GPU maker Moore Threads' MTLink fabric tech challenges Nvidia's NVLink, can now scale to 10,000 GPUs for AI clusters
tomshardware.com
-
An excellent explainer by my CSIRO colleague Conrad Sanderson, which demystifies the world of GPUs and AI accelerator chips. If you're serious about AI strategy, this is an important part of the puzzle. To make serious AI you need the hardware - and that means GPUs. So what's a GPU, and why does it matter for your organisation's AI journey? Read this to find out.
What is a GPU? An expert explains the chips powering the AI boom, and why they’re worth trillions
theconversation.com
-
Nvidia’s latest chip promises to boost AI’s speed and energy efficiency.

What’s new: The market leader in AI chips announced the B100 and B200 graphics processing units (GPUs) designed to eclipse its in-demand H100 and H200 chips. The company will also offer systems that integrate two, eight, and 72 chips.

How it works: The new chips are based on Blackwell, an updated chip architecture specialized for training and inferencing transformer models. Compared to Nvidia’s earlier Hopper architecture, used by H-series chips, Blackwell features hardware and firmware upgrades intended to cut the energy required for model training and inference.

Training a 1.8-trillion-parameter model (the estimated size of OpenAI’s GPT-4 and Beijing Academy of Artificial Intelligence’s WuDao) would require 2,000 Blackwell GPUs using 4 megawatts of electricity, compared to 8,000 Hopper GPUs using 15 megawatts, the company said.

Blackwell includes a second-generation Transformer Engine. While the first generation used 8 bits to process each neuron in a neural network, the new version can use as few as 4 bits, potentially doubling compute bandwidth.

A dedicated engine devoted to reliability, availability, and serviceability monitors the chip to identify potential faults. Nvidia hopes the engine can reduce compute times by minimizing chip downtime.

Nvidia doesn’t make it easy to compare the B200 with rival AMD’s top offering, the MI300X.

Price and availability: The B200 will cost between $30,000 and $40,000, similar to the going rate for H100s today, Nvidia CEO Jensen Huang told CNBC. Nvidia did not specify when the chip would be available. Google, Amazon, and Microsoft stated intentions to offer Blackwell GPUs to their cloud customers.

Behind the news: Demand for the H100 chip has been so intense that the chip has been difficult to find, driving some users to adopt alternatives such as AMD’s MI300X. Moreover, in 2022, the U.S. restricted the export of H100s and other advanced chips to China. The B200 also falls under the ban.

Why it matters: Nvidia holds about 80 percent of the market for specialized AI chips. The new chips are primed to enable developers to continue pushing AI’s boundaries, training multi-trillion-parameter models and running more instances at once.

We’re thinking: Cathie Wood, author of ARK Invest’s “Big Ideas 2024” report, estimated that training costs are falling at a very rapid 75 percent annually, around half due to algorithmic improvements and half due to compute hardware improvements. Nvidia’s progress paints an optimistic picture of further gains. It also signals the difficulty of trying to use model training to build a moat around a business. It’s not easy to maintain a lead if you spend $100 million on training and next year a competitor can replicate the effort for $25 million.

Andrew Ng, [email protected]
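Reducing the quoted figures to simple ratios; the GPU counts and megawatt numbers are NVIDIA's own estimates for a 1.8-trillion-parameter training run, and the 75%-per-year cost decline is ARK Invest's estimate:

```python
# The quoted training-footprint figures reduced to per-GPU power and simple
# ratios; GPU counts and megawatts are NVIDIA's estimates for a 1.8T-parameter
# training run, and the 75%/year cost decline is ARK Invest's estimate.
hopper_gpus, hopper_mw = 8000, 15
blackwell_gpus, blackwell_mw = 2000, 4

print(f"Hopper:    {hopper_mw * 1000 / hopper_gpus:.2f} kW per GPU")        # ~1.88 kW
print(f"Blackwell: {blackwell_mw * 1000 / blackwell_gpus:.2f} kW per GPU")  # ~2.00 kW
print(f"{hopper_gpus / blackwell_gpus:.0f}x fewer GPUs, "
      f"{hopper_mw / blackwell_mw:.2f}x less power overall")                # 4x fewer, 3.75x less

cost_now = 100e6  # e.g. a $100M training run
print(f"Same run next year at a 75%/yr cost decline: ${cost_now * 0.25 / 1e6:.0f}M")
```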
-
🚨 BREAKING: NVIDIA - the global leader in AI computing - has just become the world's MOST VALUABLE company, and its explosive growth is directly related to recent AI advancements. Here's what you need to know:

👾 NVIDIA was founded on April 5, 1993, by Jensen Huang, Chris Malachowsky, and Curtis Priem. Their goal was to bring 3D graphics to the gaming & multimedia markets.

👾 The name NVIDIA is a combination of two terms: "invidia," the Latin word for envy, and the acronym NV (short for “next vision”), which was how the co-founders used to name their documents.

👾 In 1999, it invented the GPU, the graphics processing unit, which reshaped the computing industry. For those not familiar, a GPU is: "an electronic circuit that can perform mathematical calculations at high speed. Computing tasks like graphics rendering, machine learning (ML), and video editing require the application of similar mathematical operations on a large dataset. A GPU’s design allows it to perform the same operation on multiple data values in parallel. This increases its processing efficiency for many compute-intensive tasks." - Amazon Web Services (AWS)

👾 On the relationship between GPUs and AI development: "GPUs are the dominant computing platform for accelerating machine learning workloads, and most (if not all) of the biggest models over the last five years have been trained on GPUs … [they have] thereby centrally contributed to the recent progress in AI" - Epoch AI

👾 On NVIDIA's recent AI advancements, according to Stanford University's AI Index Report, in 2023, "a team at Nvidia discovered a novel approach to improving the chips that power AI systems: use AI systems to design better chips. They were able to train a reinforcement learning agent to design chip circuits that are smaller, faster, and more efficient than the circuits designed by electronic design automation tools (EDAs). One of Nvidia’s latest categories of chips, the Hopper GPU architecture, has over 13,000 instances of AI-designed circuits." - Stanford Institute for Human-Centered Artificial Intelligence (HAI), AI Index Report 2023

👾 According to some finance experts, this is just the BEGINNING of NVIDIA's growth, as the pace of AI development is accelerating: "This quickening pace of innovation implies that rivals probably won't have time to challenge Nvidia's dominant position in the AI-capable graphics processing unit (GPU) space. While competitors like Advanced Micro Devices and Intel are aiming to cut into Nvidia's dominant market share, the window of opportunity is closing." - George Budwell, Yahoo Finance, 2024

👾 Never miss my updates: subscribe to my newsletter

#AI #NVIDIA #AIcomputing
Co-Founder of Altrosyn and Director at CDTECH | Inventor | Manufacturer
The unveiling of NVIDIA's Blackwell architecture marks a significant milestone in AI and computing, showcasing the remarkable advancements in GPU technology. With its revolutionary design featuring reticle-limited dies and a high-bandwidth chip-to-chip interconnect, Blackwell sets a new standard for performance and scalability in the industry. However, amidst this rapid evolution, how do you foresee Blackwell shaping the future landscape of AI applications, particularly in domains like natural language processing and computer vision, where massive-scale models are increasingly prevalent?