NVIDIA just blew everyone away with its latest chip: the GH200 Grace Hopper Superchip. 🔥 This is the hardware that's making it happen, the chip that just smashed the MLPerf Inference v4.1 benchmarks.

For the tech nerds like me, here's why it matters: the GH200 isn't your average chip. It combines the Grace CPU and Hopper GPU over a 900 GB/s NVLink-C2C interconnect (roughly seven times the bandwidth of a PCIe Gen 5 x16 link), which means no more bottleneck between CPU and GPU. This architecture means more performance, lower latency, and a lot more headroom to tackle the biggest challenges in AI. In NVIDIA's published results it outperformed even the best two-socket CPU-only setups by up to 22x on critical AI benchmarks. And it's real-time ready, with less than a 5% performance drop under live latency constraints, while CPU-only systems dropped by up to 55%. That's a big win for anyone deploying production-grade AI solutions. 🏆

But it doesn't stop there. In simple terms, the GH200 NVL2 takes two of these superchips and links them together, doubling their strength: 8 petaflops of AI performance, which basically means a mind-boggling number of calculations per second. It's built for next-gen AI workloads like large language models (LLMs) and high-performance computing (HPC). Companies like HPE and Oracle Cloud Infrastructure are already on board, integrating this tech into their server designs.

This is the real deal when it comes to scaling AI. If your business is leaning into AI, this kind of performance leap is going to be a massive enabler. If not, why not?
-
https://lnkd.in/e7QXuT_M Everyone get ready for the computing power fight! Just a handful of weeks ago, Nvidia sat alone at the apex of the computing power hierarchy. Now Tencent has upgraded its own computing infrastructure to not only rival but, it hopes, topple Nvidia, by cutting costs, offering additional services, and working around the US export ban. Tencent wants to muscle in despite the restricted access to advanced processors, such as Nvidia's, imposed by stringent US export controls. How does it get around Nvidia and cut costs? "By upgrading its network, Tencent has not only boosted the communication process but also managed to cut costs, a win-win for the firm." LLMs (large language models) are fast becoming a service offering for the industry, and Tencent is capturing its share of the market. "Tencent is actively promoting its in-house developed LLMs for enterprise applications, and it additionally offers services that assist other companies in developing their own AI models." What about the US export ban? Companies are bypassing the restricted infrastructure and moving into offering services that are not subject to the ban. "Tencent recently made its Hunyuan LLM lite version free and reduced prices for standard versions....Tencent has significantly upgraded its HPC network, known as Xingmai, enhancing its AI capabilities by up to 60% in network communications and 20% in LLM training." The computing power war has brought Tencent speed, but it has also reduced costs, enabled new offerings such as the free Hunyuan LLM lite version and cheaper standard versions, and provided clever ways around the US export ban.
-
As someone deeply involved in the world of data infrastructure and AI, I can say the response to our recently unveiled vision for the future of intelligent data infrastructure has been overwhelming. The conversations I've had with customers and analysts highlight an eagerness to see the next wave of innovation in this space. While it's important to look ahead and anticipate what's next, it's equally crucial to appreciate the remarkable progress we've already made and how rapidly our products are evolving to support AI initiatives.

One significant example of that accelerating innovation is the benchmarking results we achieved with NVIDIA Magnum IO GPUDirect Storage (GDS). GDS plays a crucial role in AI workloads by letting GPUs read and write storage directly, bypassing the CPU. Last year we shared GDS metrics for our previous-generation AFF; now, in 2024, I'm delighted to announce a performance boost of over 2x with our new A90 and the latest NetApp ONTAP software!

So why is this important for enterprise customers? Imagine the possibilities when leveraging NVIDIA GPUs for your AI workloads. A data transfer rate of 351 GiB/s from a 4-system A90 cluster opens up new horizons for AI applications that demand high-performance computing, from training deep learning models to running complex simulations. These benchmark results demonstrate the immense potential and scalability of our solutions.

NetApp's commitment to simplicity and scalability is evident in our offerings. With enhancements to our NFS solutions and advancements in frontend and backend networking, you can achieve optimal performance without specialized training or over-provisioning. Our comprehensive ONTAP data management OS empowers you to operate seamlessly on premises and in the cloud, eliminating data silos and enabling a truly unified AI infrastructure.

The implications of these benchmarking results go beyond numbers on a chart: they validate our current AI capabilities and offer a glimpse of what's to come. As we continue to push the boundaries of innovation, it's the real-world outcomes our customers achieve that truly matter. We are grateful for the trust and partnership of our customers, and we can't wait to embark on the next phase of this incredible journey together! #NetApp #IntelligentDataInfrastructure #AI
Simplifying AI performance for enterprise AI | NetApp Blog
netapp.com
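Under the hood, GDS is exposed through NVIDIA's cuFile API. Here is a minimal host-side sketch of a direct storage-to-GPU read; it is illustrative only (error handling is omitted, the file path is hypothetical, and NetApp's benchmark stack is of course not shown):

```cpp
#include <cuda_runtime.h>   // CUDA runtime (cudaMalloc / cudaFree)
#include <cufile.h>         // cuFile: the GPUDirect Storage API
#include <fcntl.h>
#include <unistd.h>

int main() {
  const size_t size = 1 << 20;  // read 1 MiB
  // Hypothetical dataset path; GDS requires O_DIRECT file access.
  int fd = open("/mnt/dataset/shard.bin", O_RDONLY | O_DIRECT);

  cuFileDriverOpen();  // bring up the GDS driver

  CUfileDescr_t desc = {};
  desc.handle.fd = fd;
  desc.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
  CUfileHandle_t fh;
  cuFileHandleRegister(&fh, &desc);  // register the file with cuFile

  void* devPtr = nullptr;
  cudaMalloc(&devPtr, size);
  cuFileBufRegister(devPtr, size, 0);  // pin the GPU buffer for DMA

  // The key step: storage DMAs straight into GPU memory,
  // with no CPU bounce buffer in the path.
  ssize_t n = cuFileRead(fh, devPtr, size, /*file_offset=*/0, /*devPtr_offset=*/0);

  cuFileBufDeregister(devPtr);
  cudaFree(devPtr);
  cuFileHandleDeregister(fh);
  cuFileDriverClose();
  close(fd);
  return n < 0;  // nonzero exit on a failed read
}
```

The point of the pattern is the single cuFileRead call: data moves directly from storage into GPU memory without staging through host RAM, which is the path throughput numbers like the above measure.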
-
🚀 NVIDIA Unveils Blackwell Chip at GTC 2024. During his GTC 2024 keynote, NVIDIA CEO Jensen Huang introduced the Blackwell chip, marking a significant advancement in AI computing. Compared to its predecessor, the H100 chip from 2022, Blackwell offers a remarkable 20 petaflops of AI performance, giving AI companies powerful processing capabilities for complex projects and advancing artificial intelligence. NVIDIA also introduced an entire server, the GB200 NVL72, combining 72 Blackwell GPUs and other NVIDIA parts, designed to train AI models. Amazon, Google, Microsoft, and Oracle will sell access to the GB200 through cloud services. The GB200 itself pairs two B200 Blackwell GPUs with one Arm-based Grace CPU, and NVIDIA said Amazon Web Services (AWS) would build a server cluster with 20,000 GB200 chips. 🤯 NVIDIA said the system can deploy a 27-trillion-parameter model. That's much larger than even the biggest models, such as GPT-4, which reportedly has 1.7 trillion parameters. (For scale: 27 trillion parameters at 16-bit precision is roughly 54 TB of weights alone, far beyond any single GPU's memory.) Many artificial intelligence researchers believe bigger models with more parameters and data could unlock new capabilities. 🔥 NVIDIA didn't provide a cost for the new GB200 or the systems it's used in. However, the Hopper-based H100 chip costs between $25,000 and $40,000 per chip, with whole systems costing as much as $200,000, according to analyst estimates. 🤑 #ai #nvidia #gtc #gpt #gpt4 #aws #gpu #cloudcomputing
-
In the dynamic world of artificial intelligence (AI) and machine learning (ML), the quest for more robust and efficient computing solutions is relentless. NVIDIA, a trailblazer in this tech revolution, has recently lifted the curtain on its latest marvel in AI computing: the Blackwell AI chip. This pivotal innovation is poised to redefine the realms of possibility within AI, machine learning, and far beyond. Key highlights:
- The Blackwell B200 GPU features a second-generation Transformer Engine supporting the FP4 data type, doubling throughput for 4-bit-precision AI models (see the toy quantizer sketched below). 🧠
- New connection types, NVLink 5 and 800 Gb/s networking hardware, enhance GPU communication efficiency. ♻️
- The HGX B200 server board connects eight B200 GPUs via NVLink, supporting x86-based generative AI platforms with networking speeds up to 400 Gb/s. 🚀
- The Blackwell B200 GPU is energy-efficient, with NVIDIA claiming up to 25x lower cost and energy consumption than the H100: training a 1.8-trillion-parameter model would take just 2,000 GPUs and four megawatts of power. ⚡️
- Major cloud providers like Amazon, Google, Microsoft, and Oracle plan to offer Blackwell-powered instances, supported by NVIDIA's AI Enterprise software platform for production-grade AI. ☁️
You can read more about it at https://lnkd.in/d6FDYS78 and watch NVIDIA's video to explore the full scope of what Blackwell offers to the future of computing and AI: https://lnkd.in/dSPw-ii7
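For intuition on that first bullet: FP4 is a 4-bit floating-point format (commonly E2M1: one sign bit, two exponent bits, one mantissa bit), so each value takes half the bits of FP8. The sketch below is a toy block quantizer in that spirit; it is an illustration under simplifying assumptions, not NVIDIA's actual Transformer Engine pipeline, and the per-block max-based scaling is my own choice:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// The 8 non-negative magnitudes an FP4 (E2M1) value can take;
// one sign bit supplies the negative half of the code space.
static const float FP4_LEVELS[8] = {0.0f, 0.5f, 1.0f, 1.5f, 2.0f, 3.0f, 4.0f, 6.0f};

// Quantize a block of weights to 4-bit codes with one shared scale,
// chosen so the block's largest magnitude maps onto 6.0 (the top FP4 level).
std::vector<uint8_t> quantize_fp4_block(const std::vector<float>& w, float& scale) {
  float amax = 0.0f;
  for (float v : w) amax = std::max(amax, std::fabs(v));
  scale = (amax > 0.0f) ? amax / 6.0f : 1.0f;

  std::vector<uint8_t> codes;
  codes.reserve(w.size());
  for (float v : w) {
    float t = std::fabs(v) / scale;
    int best = 0;  // index of the nearest representable magnitude
    for (int i = 1; i < 8; ++i)
      if (std::fabs(FP4_LEVELS[i] - t) < std::fabs(FP4_LEVELS[best] - t)) best = i;
    codes.push_back(static_cast<uint8_t>((v < 0.0f ? 0x8 : 0x0) | best));
  }
  return codes;  // 4 bits per weight: half the footprint of FP8
}

// Dequantize: the inverse mapping applied at compute time.
float dequantize_fp4(uint8_t code, float scale) {
  float mag = FP4_LEVELS[code & 0x7] * scale;
  return (code & 0x8) ? -mag : mag;
}
```

Halving the bits per weight doubles how many values move through the same memory bandwidth and math units per cycle, which is where the claimed 2x throughput over FP8 comes from.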
-
Accelsius was recently featured in an article by Dylan Martin of CRN discussing the rapid rise of the liquid cooling industry alongside "the frenzied development of generative AI offerings." READ HERE: https://lnkd.in/eCZCDy67 In the article, CEO Josh Claman underlines the need to think several generations ahead when planning for cooling. As CPU and GPU chips go from 200 to 400 to 1,000 to 2,000 watts per socket, Claman says, you don't want customers coming back to say, "You sold us this solution three years ago, and it already can't cool my latest server chips." Claman also notes that liquid cooling's opportunities don't end with AI: it can also be used in traditional data centers to "lower carbon emissions, reduce energy allocated to cooling, and, as a result, save on operating costs." "After [customers] get used to liquid cooling and realize there's just a huge economic savings, then I think it's going to be widely adopted," Claman declared. Special thanks to Dylan Martin and CRN for featuring us in this article! And if you'd like to learn more about our NeuCool technology, and why it's designed to last for multiple generations of upcoming high-powered chips, be sure to check out our website: https://accelsius.com/
With AI Chips Getting Hotter, Liquid Cooling Vendors See Big Channel Play
crn.com
-
Nvidia's AI chip dominance is being targeted by Google, Intel, and Arm. The UXL Foundation project wants to eliminate the proprietary software barriers keeping developers locked into using Nvidia's AI tech. Major tech companies are attempting to eliminate the software advantages that have helped Nvidia dominate the artificial intelligence market. According to Reuters, a group formed by Intel, Google, Arm, Qualcomm, Samsung, and other tech companies is developing an open-source software suite that keeps AI developers from being locked into Nvidia's proprietary tech, allowing their code to run on any machine and with any chip. The group, called the Unified Acceleration Foundation (UXL), told Reuters that technical details for the project should reach a "mature" state by the second half of this year, though a final release target wasn't given. The project currently builds on the oneAPI open standard Intel developed to keep requirements like specific coding languages, code bases, and other tools from tying developers to a specific architecture, such as Nvidia's CUDA platform. Nvidia became the first chipmaker to hit a $2 trillion market capitalization last month, having experienced rapid growth after focusing on hardware for powering AI models, like its H100 and upcoming H200 GPUs. Those Nvidia chips, which lock developers into using Nvidia's CUDA architecture, are superior to anything currently produced by other chipmakers, but the explosive demand has caused scarcity while rival companies continue developing their own alternatives. During the company's 2023 Computex keynote, Nvidia CEO Jensen Huang said that four million developers were using the CUDA computing model. While UXL says the project will initially aim to open up options for AI apps and high-performance computing applications, the group plans to eventually support Nvidia's hardware and code, too. UXL is seeking aid from additional chipmakers and from cloud-computing companies like Microsoft and Amazon to ensure the solution can be deployed on any chip or hardware. Microsoft, which is notably not part of the UXL coalition, was rumored to have teamed up with AMD last year to develop alternative AI chips that could challenge Nvidia's effective monopoly over the industry. theverge.com
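To make the lock-in point concrete, here is a minimal SYCL kernel; SYCL is the open C++ programming model underlying Intel's oneAPI. This sketch is illustrative rather than UXL's actual deliverable: the same source can be compiled for Intel, AMD, or NVIDIA GPUs (or a plain CPU) depending on the compiler backend, whereas a CUDA kernel targets only NVIDIA hardware.

```cpp
#include <sycl/sycl.hpp>
#include <cstdio>
#include <vector>

int main() {
  // The queue binds to whatever accelerator the runtime finds:
  // an Intel GPU, an NVIDIA GPU (via a CUDA backend), an AMD GPU, or the CPU.
  sycl::queue q;

  const size_t n = 1024;
  std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);
  {
    sycl::buffer<float, 1> A(a.data(), sycl::range<1>(n));
    sycl::buffer<float, 1> B(b.data(), sycl::range<1>(n));
    sycl::buffer<float, 1> C(c.data(), sycl::range<1>(n));

    q.submit([&](sycl::handler& h) {
      sycl::accessor xa(A, h, sycl::read_only);
      sycl::accessor xb(B, h, sycl::read_only);
      sycl::accessor xc(C, h, sycl::write_only, sycl::no_init);
      // The kernel body is plain C++: no vendor-specific language extensions.
      h.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) { xc[i] = xa[i] + xb[i]; });
    });
  }  // buffers go out of scope: results are copied back into c

  std::printf("c[0] = %.1f on %s\n", c[0],
              q.get_device().get_info<sycl::info::device::name>().c_str());
  return 0;
}
```

That portability is the substance of UXL's pitch: the vendor choice collapses into a compiler flag rather than a rewrite of the kernel code.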
-
Amazon puts its #trainium2 chips into #AWS https://lnkd.in/gVQF56ja Trainium2 is made on TSMC's N5 (5nm) node, similar to NVIDIA's Hopper H100 platform, and has similar specs. However, given that Amazon won't have to pay the "Nvidia tax" (the gross margins Nvidia commands on the H100 platform), the cost to AWS will be lower, and these instances will likely be available much cheaper on a performance-per-dollar basis. Google's Tensor processors are similar, too. Nvidia's Blackwell and H200 are coming in 2025, as is AMD's MI355X; Trainium3 will also arrive in 2025, on TSMC's N3 (3nm) node. One problem with non-Nvidia accelerators has been software: CUDA has been the main technology giving Nvidia its edge, and hardly any accelerator designer has been able to match it. Amazon is working with Anthropic on the software side of its Trainium design and will likely have an edge over the other efforts as a result.
Amazon reveals next-gen AI silicon, turns Trainium2 loose
theregister.com
-
NVIDIA's Blackwell chips have reportedly been delayed by three months or more due to design flaws found unusually late in production. Data centers that ordered the Blackwell line may have to wait until 2025. Learn more here: https://hubs.li/Q02NmZ6P0 #ZutaCore #liquidcooling #datacenters #ai #hpc #SustainableTech #HyperCool #AIComputing #sustainability #zeroemissions
Nvidia Reportedly Delays Launch of Next-Gen AI Chips Amid Design Flaw
hpcwire.com
-
The IBM Power10 chipset is a game-changer. More processing Power means fewer cores are needed. More processing Power means leveraging AI to its fullest potential and maximizing core capacity, thereby maximizing your data center and business economics. Meridian IT is your partner of choice for designing the ultimate Power on-prem, IBM Cloud, or hybrid environment. Get the best value and performance for your data center spend. Yes, I'm talking to all of you x86ers out there as well. S/4HANA rocks on IBM Power! And let's not forget VMware. Yikes, those are Powerful propositions and unbeatable combinations... #MeridianIT #IBMPower10 #IBMCloud
To #GPU, or not to GPU. That is the question. But did you know there's an alternative approach to #ArtificialIntelligence inferencing that can bypass the need to offload data, and the expensive accelerators that take a year to procure? IBM Cloud's #Power10 provides major performance gains at a fraction of the cost. While GPUs like NVIDIA's L4 and H100 are designed for this task, they come with trade-offs in price, energy, and throughput. The Power10 #MMA, on the other hand, provides an on-chip accelerator built specifically for the matrix multiply operation, along with large memory and high memory bandwidth. It also offers high parallelism via SMT, which improves the cost-performance envelope for inferencing operations. For traditional machine learning, GPUs often provide little acceleration of the inference process, so most of these models are inferred on general-purpose CPUs. The Power10 MMA, however, can accelerate specific ML models, and its larger memory, higher memory bandwidth, and higher parallelism drive greater throughput and can improve latency in specific cases. Learn more about how IBM Cloud's Power10 chip can enhance the speed of AI inferencing: https://lnkd.in/exUXSDvk #AI #CloudIsPower
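For the curious, the MMA unit is programmable directly from C/C++ via compiler built-ins. The sketch below is a minimal illustration using the built-ins GCC documents for POWER10 (assumed toolchain: GCC 10+ with -mcpu=power10); it shows the basic accumulate step a GEMM kernel would loop over, not IBM's actual inference software stack:

```cpp
// Illustrative sketch; build assumption: g++ -mcpu=power10 -O2 mma_sketch.cpp
#include <altivec.h>

// Rank-1 update on the on-chip matrix unit: acc = a * b^T for two
// 4-float vectors, producing a 4x4 fp32 tile in one accumulator register.
void outer_product_4x4(float out[16], vector float a, vector float b) {
  __vector_quad acc;                      // a 512-bit MMA accumulator
  __builtin_mma_xxsetaccz(&acc);          // zero the accumulator tile
  __builtin_mma_xvf32gerpp(&acc,          // fp32 GER: multiply, then accumulate
                           (vector unsigned char)a,
                           (vector unsigned char)b);

  vector float rows[4];
  __builtin_mma_disassemble_acc(rows, &acc);  // move the tile back to vector regs
  for (int i = 0; i < 4; ++i)
    vec_xst(rows[i], 0, &out[4 * i]);         // store each 4-float row
}
```

A full matrix multiply tiles its operands and loops this GER (general rank-1 update) over the shared dimension, keeping the hot accumulators on-chip; in practice frameworks reach the MMA through optimized math libraries rather than hand-written intrinsics.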
Co-Founder of Altrosyn and Director at CDTECH | Inventor | Manufacturer
On a deeper level, this signifies a paradigm shift in AI hardware design. The NVLink-C2C interconnect truly dismantles the CPU-GPU barrier, enabling seamless data flow. What are your thoughts on how this architecture will impact the development of truly decentralized AI models?