Bert Verrycken’s Post

ASIC | hwaccelerators | let's connect

The magic is the speed of the chip-to-chip connection.

Deedy Das

Menlo Ventures | Investing in AI and Infra!

NVIDIA's $7B Mellanox acquisition was actually one of tech's most strategic deals ever. This is the untold story of the most important company in AI that most people haven't heard of.

Most people think NVIDIA = GPUs. But modern AI training is actually a networking problem. A single 80GB A100 can hold only around 40B fp16 parameters, and far fewer once gradients and optimizer state are counted. Training large models requires splitting them across hundreds of GPUs.

Enter Mellanox. They pioneered RDMA (Remote Direct Memory Access), which lets GPUs directly access memory on other machines with almost no CPU overhead. Before RDMA, moving data between GPUs was a massive bottleneck.

The secret sauce is Mellanox's InfiniBand. Where an Ethernet switch hop costs roughly 200-400ns, InfiniBand does ~100ns. For distributed AI training, where GPUs constantly sync gradients, this 2-3x latency difference is massive.

Mellanox didn't just do hardware. Their GPUDirect RDMA software stack lets GPUs talk directly to network cards, bypassing the CPU and system memory. This cuts latency roughly another 30% vs traditional networking stacks.

NVIDIA's master stroke: integrating Mellanox's ConnectX NICs directly into their DGX AI systems. The full stack - GPUs, NICs, switches, drivers - all optimized together. No one else can match this vertical integration.

The numbers are staggering:
- HDR InfiniBand: 200Gb/s per port
- Quantum-2 switch: 400Gb/s per port
- End-to-end latency: ~100ns
- GPU memory bandwidth matching: ~900GB/s

Why it matters: training SOTA-scale models requires:
- 1000s of GPUs
- Petabytes of data movement
- Sub-millisecond latency requirements

Without Mellanox tech, it would take literally months longer.

The competition is playing catch-up:
- Intel killed OmniPath
- Broadcom/Ethernet still has higher latency
- Cloud providers are mostly stuck with RoCE

NVIDIA owns the premium AI networking stack.

Looking ahead: CXL + Mellanox tech will enable even tighter GPU-NIC integration. We'll see dedicated AI networks with sub-50ns latency and Tb/s bandwidth. The networking advantage compounds.

In the AI arms race, networking is the silent kingmaker. NVIDIA saw this early. The Mellanox deal wasn't about current revenue - it was about controlling the foundational tech for training next-gen AI.

Next time you hear about a new large language model breakthrough, remember: the GPUs get the glory, but Mellanox's networking makes it possible. Sometimes the most important tech is invisible.
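
A quick back-of-the-envelope behind the "one GPU can't hold the model" point, assuming standard rule-of-thumb costs of 2 bytes per fp16 weight and roughly 16 bytes per parameter for mixed-precision Adam training (fp16 weight and gradient, fp32 master copy, two optimizer moments). These constants are assumptions for illustration, not figures from the post:

```python
# Rough memory arithmetic for an 80 GB A100 (rule-of-thumb byte counts, assumed).
GPU_MEM_BYTES = 80e9

FP16_WEIGHT_BYTES = 2                    # weights only, inference-style
ADAM_TRAIN_BYTES = 2 + 2 + 4 + 4 + 4     # fp16 weight + grad, fp32 master + 2 moments

print(f"fp16 weights only:   ~{GPU_MEM_BYTES / FP16_WEIGHT_BYTES / 1e9:.0f}B parameters")
print(f"full training state: ~{GPU_MEM_BYTES / ADAM_TRAIN_BYTES / 1e9:.0f}B parameters")
# -> ~40B parameters if only the weights have to fit, ~5B if all training
#    state lives on one card, which is why big models are sharded across many
#    GPUs and why the fabric between them matters so much.
```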

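And a minimal sketch of the gradient sync itself, assuming a PyTorch + NCCL setup launched with torchrun; on machines with ConnectX NICs and an InfiniBand fabric, NCCL carries this all-reduce over (GPUDirect) RDMA when available. The function name and tensor size below are illustrative, not from the post:

```python
import os

import torch
import torch.distributed as dist


def sync_gradients_once():
    # torchrun sets RANK, WORLD_SIZE, LOCAL_RANK and the rendezvous variables.
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

    # Stand-in for one rank's gradient shard: ~256 MB of fp16 values.
    grads = torch.randn(128 * 1024 * 1024, dtype=torch.float16, device="cuda")

    # Every training step ends in a collective like this; over InfiniBand with
    # GPUDirect RDMA the data moves NIC-to-GPU without bouncing through the CPU.
    dist.all_reduce(grads, op=dist.ReduceOp.SUM)
    torch.cuda.synchronize()

    dist.destroy_process_group()


if __name__ == "__main__":
    sync_gradients_once()
```

Run with e.g. `torchrun --nproc_per_node=8 sync_sketch.py` on a single node (plus the usual rendezvous flags for multi-node jobs); this collective is what the InfiniBand latency and bandwidth figures above actually govern.
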
Axel Kloth

Founder & CEO at Abacus Semiconductor Corporation & Venture Partner at Pegasus Tech Ventures


Correct. Now take out the PCIe RC and device part to REALLY speed it up, and you have a winner. Abacus Semiconductor Corporation does exactly that (and some more...).

Mostly agree. Nvidia made a wise acquisition. They were probably spared an unwise acquisition when the SoftBank Arm deal was derailed: it would have defocused the company on integration while Arm's customers rioted! I put the Mellanox deal right up there with Marvell/Galileo. A bunch of smart, practical-minded Israeli engineers integrate fairly easily with Silicon Valley companies.

