What Developers Need to Know about Intel’s 2024 Server Chips

Intel's new chips are engineering marvels, but software developers will need to recompile and rewrite software to exploit their performance benefits.

Nov 10th, 2023 3:00am by Agam Shah

Feature image via Intel.

Intel’s server chips currently coming off the factory floors are engineering marvels, with some of the most radical chip overhauls in decades.

The new chips, which will ship next year, are shape-shifters: They can run conventional applications such as databases and operating systems, but can switch over to AI applications, which are predicated on vector processors and arithmetic units.

Software developers will need to recompile and rewrite software to exploit the performance benefits.

The duality of those chips adds more choice for developers, but also creates confusion over which processor to choose for conventional applications (which use CPUs) and AI (which need accelerators). The on-chip AI features are a luxury; if developers can exploit them, it’s only gravy for Intel.

Granite Rapids and Sierra Forest

The upcoming server chips, code-named Granite Rapids and Sierra Forest, also bring mainstream support for new technologies like Compute Express Link (CXL) interconnect, which can convert a data center into one giant computer.

The new server chips will support existing x86 applications but may require some tweaks in code and recompilation. At the same time, the growing complexity of chips may further lock buyers into Intel hardware, which already dominates data centers.

Recompiling code should be a breeze if developers have a history of compiling code to x86 hardware, Intel’s Ronak Singhal, chief architect of Xeon, told The New Stack.

Recompiling should also bring performance benefits by adapting code to the new features on the chips, and Singhal contrasted that effort with porting to a different architecture. “Specifically, when you’re comparing to something like an ARM competitor, you have to go and migrate all of that software to an ARM environment,” Singhal said.

Major cloud providers are changing data center designs for AI, with more network bandwidth and separate memory and storage tiers. The new server chips are designed for those installations, and that should narrow the number of programmers who code directly to resource allocation on hardware.

Some new technologies in Granite Rapids chips are designed to scale up performance in tandem across CPUs and accelerators. As a result, there are more fine-grained layers and complexity to the provisioning and sharing of accelerators, memory and storage.

CXL

CXL gives servers faster access to more memory and storage over high-speed connections. Intel’s current server chips support early versions of CXL, but the technology will go mainstream with Granite Rapids next year.

Typically, servers rely on internal memory, but CXL has superfast bandwidth that allows the creation of storage and memory tiers outside servers. That means a server can use memory stored in a box that may be physically distant.

Researchers are still trying to unpack the implications of CXL’s ability to use remote memory and cache, and how applications will perform.

Researchers from the University of Illinois at Urbana-Champaign published a paper in October benchmarking the performance of Nginx and Redis on CXL. The research also noted the promise of physical CXL memory by comparing it to emulated CXL memory systems.

Customers are looking at CXL memory for different reasons, which could include memory expansion, bandwidth or getting cheaper memory outside of standard DDR5, Singhal said.

Intel is building in CXL software that blurs the lines between direct-attached memory and other tiers of attached memory. The software lets the hardware manage data placement under the covers and make sure that the right data is in the right place, Singhal said.

“Software doesn’t need to be aware of the different tiers — this notion of software-transparent memory management with CXL can be used in some cases,” Singhal said.
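For software that does want to see the tiers, one common arrangement on Linux is for far memory such as CXL to show up as a NUMA node that has memory but no CPUs. The following is a minimal sketch under that assumption, not something Intel described: it scans sysfs-style `cpulist` files and reports the nodes with no CPUs. The function name and the `root` parameter are hypothetical, and an empty `cpulist` alone does not prove a node is CXL-backed.

```python
from pathlib import Path


def memory_only_nodes(root="/sys/devices/system/node"):
    """Return NUMA node IDs under `root` whose cpulist file is empty.

    Assumption: a node with no CPUs may be a far-memory tier such as
    CXL; this check cannot confirm what hardware backs the node.
    """
    nodes = []
    for node_dir in Path(root).glob("node[0-9]*"):
        cpulist = node_dir / "cpulist"
        if not cpulist.is_file():
            continue
        # An empty cpulist means no CPUs are attached to this node.
        if cpulist.read_text().strip() == "":
            nodes.append(int(node_dir.name[len("node"):]))
    return sorted(nodes)
```

Calling `memory_only_nodes()` on a conventional two-socket server would typically return an empty list, since every node there has CPUs attached.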

Granite Rapids will have more processing cores than the current generation chips, code-named Sapphire Rapids. It will be composed of chiplets, an approach in which server silicon is built by piecing together modular blocks of processors, memory and storage.

For developers, there are many things to consider as the new chips get closer to release.

The biggest change in Granite Rapids is the APX instruction set extension, which doubles the number of general-purpose registers from 16 to 32. With more registers, compilers can keep more variables in registers, cutting the loads and stores needed to save and restore values and reducing reliance on memory.
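A toy model shows why the register count matters. It is not Intel's or any compiler's actual allocator: each variable is live over an interval of instructions, and any value that cannot get a register at the point of peak demand has to be spilled to memory. The function names and the sample intervals are made up for illustration.

```python
def max_live(intervals):
    """Peak number of simultaneously live variables.

    Each interval is (start, end), live for start <= t < end.
    """
    events = []
    for start, end in intervals:
        events.append((start, 1))
        events.append((end, -1))
    # Tuples sort so that, at the same time step, a range that ends (-1)
    # is processed before a range that starts (+1).
    events.sort()
    live = peak = 0
    for _, delta in events:
        live += delta
        peak = max(peak, live)
    return peak


def spills_needed(intervals, num_registers):
    """Lower bound on values that must sit in memory at peak pressure."""
    return max(0, max_live(intervals) - num_registers)


# Hypothetical hot loop that keeps 24 values live at once.
loop = [(0, 10)] * 24
print(spills_needed(loop, 16))  # 8
print(spills_needed(loop, 32))  # 0
```

In this model the same loop needs eight spills with 16 registers and none with 32, which is the kind of incremental gain a recompile is meant to pick up.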

Coders will need to recompile applications to get incremental performance boosts, which matters in critical applications. Intel is working to mainstream APX-related tools in the open source ecosystem.

Intel pushed some APX developer tools to GCC over the last two months, and is doing the same for LLVM. Intel is building APX support into its oneAPI parallel programming framework and is working with Microsoft and other companies on APX support.

Intel is also activating features for developers who need more time to fine-tune applications to the new chips.

The chip maker is slowly shifting to an on-demand purchase model in which customers will pay only for the chip features they need. For example, customers can pay a fee to turn on an on-chip accelerator like AMX, which is designed for AI acceleration, when they need inferencing features.

Software development cycles are long, and a software stack may not initially be ready for AMX. Customers do not have to pay for AMX until the software has been tested.
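Because an accelerator like AMX may not be available on every machine, software usually checks for it at run time and falls back to another code path. Here is a minimal sketch, assuming a Linux host where the kernel lists CPU features on the `flags` line of `/proc/cpuinfo`. The function names and the path labels are hypothetical, and whether a feature that has not been activated is reported this way is an assumption.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the set of feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def pick_inference_path(flags):
    """Choose a code path; the returned labels are illustrative only."""
    # amx_tile and amx_int8 are flag names the Linux kernel reports for AMX.
    if {"amx_tile", "amx_int8"} <= flags:
        return "amx"
    if "avx2" in flags:
        return "avx2"
    return "scalar"


# Made-up sample text standing in for the contents of /proc/cpuinfo.
sample = "processor : 0\nflags : fpu sse2 avx2 amx_tile amx_int8\n"
print(pick_inference_path(parse_cpu_flags(sample)))  # amx
```

On a real system the text would come from reading `/proc/cpuinfo`. A stack that is not yet tuned for AMX can keep returning the fallback path until testing is complete.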

Intel is also pairing its discrete Gaudi3 AI accelerator with the Granite Rapids chip, which is similar to how Nvidia paired its Grace CPU with its red-hot H100 GPU. Intel may ultimately take a traditional server-client approach to offload some of the AI inferencing on Gaudi to the AMX accelerator.

The chip maker’s oneAPI provides the tools to optimize code for its chips and has libraries for TensorFlow, PyTorch and its own distribution of Python. Intel is also a big advocate for SYCL, an open programming model it pitches as an alternative to proprietary code such as CUDA, with migration tools that convert such programs to work on a wide range of CPUs and accelerators.

The security features on Granite Rapids remain a mystery, but Singhal said the chip will include Trust Domain Extensions (TDX), a confidential computing feature that allows only authorized users to access programs or code. The authentication feature keeps code secure.

Intel is also shipping a low-power chip called Sierra Forest, which will come with 288 cores. The chip has power-efficient cores and will compete with low-power ARM-based chips developed by Ampere and Amazon. Sierra Forest will also have many of the features in Granite Rapids, including APX and AVX2, but is designed more for web applications.

The goal for Sierra Forest is to prevent x86 customers from defecting to ARM.

“Having this portfolio of Granite Rapids and Sierra Forest, our value proposition to them is saying ‘you want to build your infrastructure based on x86, whether it is your legacy software or new software… developing it to run on x86 going forward,’” Singhal said.

Agam Shah has covered enterprise IT for more than a decade. Outside of machine learning, hardware and chips, he's also interested in martial arts and Russia.