Bert Verrycken’s Post

ASIC | hwaccelerators | let's connect

The magic is the speed of the chip-to-chip connection.

Deedy Das

Menlo Ventures | Investing in AI and Infra!

NVIDIA's $7B Mellanox acquisition was actually one of tech's most strategic deals ever. This is the untold story of the most important company in AI that most people haven't heard of.

Most people think NVIDIA = GPUs. But modern AI training is actually a networking problem. A single 80GB A100 can hold only around 40B fp16 parameters, and far fewer once gradients and optimizer state are counted. Training large models requires splitting them across hundreds of GPUs.

Enter Mellanox. They pioneered RDMA (Remote Direct Memory Access), which lets GPUs directly access memory on other machines with almost no CPU overhead. Before RDMA, moving data between GPUs was a massive bottleneck.

The secret sauce is Mellanox's InfiniBand. Where an Ethernet switch hop costs roughly 200-400ns, InfiniBand does ~100ns. For distributed AI training, where GPUs constantly sync gradients, this 2-3x latency difference is massive.

Mellanox didn't just do hardware. Their GPUDirect RDMA software stack lets GPUs talk directly to network cards, bypassing the CPU and system memory. This cuts latency roughly another 30% vs traditional networking stacks.

NVIDIA's master stroke: integrating Mellanox's ConnectX NICs directly into their DGX AI systems. The full stack - GPUs, NICs, switches, drivers - all optimized together. No one else can match this vertical integration.

The numbers are staggering:
- HDR InfiniBand: 200Gb/s per port
- Quantum-2 switch: 400Gb/s per port
- End-to-end latency: ~100ns
- GPU memory bandwidth matching: ~900GB/s

Why it matters: training SOTA-scale models requires:
- 1000s of GPUs
- Petabytes of data movement
- Sub-millisecond latency requirements

Without Mellanox tech, it would take literally months longer.

The competition is playing catch-up:
- Intel killed OmniPath
- Broadcom/Ethernet still has higher latency
- Cloud providers are mostly stuck with RoCE

NVIDIA owns the premium AI networking stack.

Looking ahead: CXL + Mellanox tech will enable even tighter GPU-NIC integration. We'll see dedicated AI networks with sub-50ns latency and Tb/s bandwidth. The networking advantage compounds.

In the AI arms race, networking is the silent kingmaker. NVIDIA saw this early. The Mellanox deal wasn't about current revenue - it was about controlling the foundational tech for training next-gen AI.

Next time you hear about a new large language model breakthrough, remember: the GPUs get the glory, but Mellanox's networking makes it possible. Sometimes the most important tech is invisible.
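
A quick back-of-the-envelope behind the "one GPU can't hold the model" point, assuming standard rule-of-thumb costs of 2 bytes per fp16 weight and roughly 16 bytes per parameter for mixed-precision Adam training (fp16 weight and gradient, fp32 master copy, two optimizer moments). These constants are assumptions for illustration, not figures from the post:

```python
# Rough memory arithmetic for an 80 GB A100 (rule-of-thumb byte counts, assumed).
GPU_MEM_BYTES = 80e9

FP16_WEIGHT_BYTES = 2                    # weights only, inference-style
ADAM_TRAIN_BYTES = 2 + 2 + 4 + 4 + 4     # fp16 weight + grad, fp32 master + 2 moments

print(f"fp16 weights only:   ~{GPU_MEM_BYTES / FP16_WEIGHT_BYTES / 1e9:.0f}B parameters")
print(f"full training state: ~{GPU_MEM_BYTES / ADAM_TRAIN_BYTES / 1e9:.0f}B parameters")
# -> ~40B parameters if only the weights have to fit, ~5B if all training
#    state lives on one card, which is why big models are sharded across many
#    GPUs and why the fabric between them matters so much.
```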

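And a minimal sketch of the gradient sync itself, assuming a PyTorch + NCCL setup launched with torchrun; on machines with ConnectX NICs and an InfiniBand fabric, NCCL carries this all-reduce over (GPUDirect) RDMA when available. The function name and tensor size below are illustrative, not from the post:

```python
import os

import torch
import torch.distributed as dist


def sync_gradients_once():
    # torchrun sets RANK, WORLD_SIZE, LOCAL_RANK and the rendezvous variables.
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

    # Stand-in for one rank's gradient shard: ~256 MB of fp16 values.
    grads = torch.randn(128 * 1024 * 1024, dtype=torch.float16, device="cuda")

    # Every training step ends in a collective like this; over InfiniBand with
    # GPUDirect RDMA the data moves NIC-to-GPU without bouncing through the CPU.
    dist.all_reduce(grads, op=dist.ReduceOp.SUM)
    torch.cuda.synchronize()

    dist.destroy_process_group()


if __name__ == "__main__":
    sync_gradients_once()
```

Run with e.g. `torchrun --nproc_per_node=8 sync_sketch.py` on a single node (plus the usual rendezvous flags for multi-node jobs); this collective is what the InfiniBand latency and bandwidth figures above actually govern.
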
Axel Kloth

Founder & CEO at Abacus Semiconductor Corporation & Venture Partner at Pegasus Tech Ventures


Correct. Now take out the PCIe RC and device part to REALLY speed it up, and you have a winner. Abacus Semiconductor Corporation does exactly that (and some more...).

Mostly agree. Nvidia made a wise acquisition. They were probably spared an unwise acquisition when the SoftBank Arm deal was derailed: it would have defocused the company on integration while Arm's customers rioted! I put the Mellanox deal right up there with Marvell/Galileo. A bunch of smart, practical-minded Israeli engineers integrate fairly easily with Silicon Valley companies.

