Data centers need an upgraded dashboard to guide their journey to greater energy efficiency, one that shows progress running real-world applications.
The formula for energy efficiency is simple: work done divided by energy used. Applying it to data centers calls for unpacking some details.
Today’s most widely used gauge, power usage effectiveness (PUE), is the ratio of the total energy a facility consumes to the energy its computing infrastructure uses. Over the last 17 years, PUE has driven the most efficient operators closer to the ideal value of 1.0, where almost no energy is wasted on processes like power conversion and cooling.
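As a rough illustration of how PUE is computed, here is a minimal Python sketch. The function names and the annual energy figures are hypothetical, chosen only to make the arithmetic easy to follow; they do not describe any real facility.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power usage effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


def overhead_fraction(pue_value: float) -> float:
    """Share of total facility energy that never reaches the IT equipment."""
    return 1.0 - 1.0 / pue_value


# Hypothetical annual figures (assumptions, not measured data).
example_pue = pue(total_facility_kwh=1_200_000, it_equipment_kwh=1_000_000)
example_overhead = overhead_fraction(example_pue)

print(f"PUE: {example_pue:.2f}")
print(f"Energy spent on overhead such as cooling and power conversion: {example_overhead:.1%}")
```

Note that nothing in this calculation says what the IT equipment accomplished with its share of the energy.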
Finding the Next Metrics
PUE served data centers well during the rise of cloud computing, and it will continue to be useful. But it’s insufficient in today’s generative AI era, when workloads and the systems running them have changed dramatically.
That’s because PUE doesn’t measure the useful output of a data center, only the energy that it consumes. That would be like measuring the amount of gas an engine uses without noting how far the car has gone.
Many standards exist for data center efficiency. A 2017 paper lists nearly three dozen of them, several focused on specific targets such as cooling, water use, security and cost.
Understanding What’s Watts
When it comes to energy efficiency, the computer industry has a long and somewhat unfortunate history of describing systems and the processors they use in terms of power, typically in watts. It’s a worthwhile metric, but many fail to realize that watts only measure input power at a point in time, not the actual energy computers use or how efficiently they use it.
So, when modern systems and processors report rising input power levels in watts, that doesn’t mean they’re less energy efficient. In fact, they’re often much more efficient in the amount of work they do with the amount of energy they use.
Modern data center metrics should focus on energy, which engineers measure in kilowatt-hours or joules. The key is how much useful work a data center does with that energy.
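To make the difference between power and energy concrete, consider a small Python sketch. It assumes two hypothetical systems that run the same job at a constant average power draw; the wattages and runtimes are invented for illustration and are not benchmark results.

```python
JOULES_PER_KWH = 3_600_000  # 1 kWh = 1,000 W x 3,600 s


def energy_joules(avg_power_watts: float, runtime_seconds: float) -> float:
    """Energy is power multiplied by time (assuming constant average power)."""
    return avg_power_watts * runtime_seconds


def joules_to_kwh(joules: float) -> float:
    return joules / JOULES_PER_KWH


# Hypothetical systems completing the same job (assumed numbers).
older_joules = energy_joules(avg_power_watts=400, runtime_seconds=3600)
newer_joules = energy_joules(avg_power_watts=700, runtime_seconds=900)

# Treat the finished job as one unit of work: efficiency = work / energy.
older_efficiency = 1 / older_joules
newer_efficiency = 1 / newer_joules

print(f"Older system: {joules_to_kwh(older_joules):.3f} kWh per job")
print(f"Newer system: {joules_to_kwh(newer_joules):.3f} kWh per job")
print(f"Efficiency gain: {newer_efficiency / older_efficiency:.2f}x")
```

In this made-up case, the newer system draws nearly twice the power but finishes four times sooner, so it uses less than half the energy for the same work.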
Reworking What We Call Work
Here again, the industry has a practice of measuring in abstract terms, like processor instructions or math calculations. So, MIPS (millions of instructions per second) and FLOPS (floating point operations per second) are widely quoted.
Only computer scientists care how many of these low-level jobs their system can handle. Users would prefer to know how much real work their systems put out, but defining useful work is somewhat subjective.
Data centers focused on AI may rely on the MLPerf benchmarks. Supercomputing centers tackling scientific research typically use additional measures of work. Commercial data centers focused on streaming media may want others.
The resulting suite of applications must be allowed to evolve over time to reflect the state of the art and the most relevant use cases. For example, the last MLPerf round added tests using two generative AI models that didn’t even exist five years ago.
A Gauge for Accelerated Computing
Ideally, any new benchmarks should measure advances in accelerated computing. This combination of parallel processing hardware, software and methods is running applications dramatically faster and more efficiently than CPUs across many modern workloads.
For example, on scientific applications, the Perlmutter supercomputer at the National Energy Research Scientific Computing Center demonstrated an average of 5x gains in energy efficiency using accelerated computing. That’s why it’s among the 39 systems in the top 50 of the Green500 list, including the No. 1 system, that use NVIDIA GPUs.
Companies across many industries share similar results. For example, PayPal improved real-time fraud detection by 10% and lowered server energy consumption nearly 8x with accelerated computing.
The gains are growing with each new generation of GPU hardware and software.
In a recent report, Stanford University’s Human-Centered AI group estimated GPU performance “has increased roughly 7,000 times” since 2003, and price per performance is “5,600 times greater.”
Two Experts Weigh In
Experts see the need for a new energy-efficiency metric, too.
With today’s data centers achieving scores around 1.2 PUE, the metric “has run its course,” said Christian Belady, a data center engineer who had the original idea for PUE. “It improved data center efficiency when things were bad, but two decades later, they’re better, and we need to focus on other metrics more relevant to today’s problems.”
Looking forward, “the holy grail is a performance metric. You can’t compare different workloads directly, but if you segment by workloads, I think there is a better likelihood for success,” said Belady, who continues to work on initiatives driving data center sustainability.
Jonathan Koomey, a researcher and author on computer efficiency and sustainability, agreed.
“To make good decisions about efficiency, data center operators need a suite of benchmarks that measure the energy implications of today’s most widely used AI workloads,” said Koomey.
“Tokens per joule is a great example of what one element of such a suite might be,” Koomey added. “Companies will need to engage in open discussions, share information on the nuances of their own workloads and experiments, and agree to realistic test procedures to ensure these metrics accurately characterize energy use for hardware running real-world applications.”
“Finally, we need an open public forum to conduct this important work,” he said.
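As a sketch of what one element of such a suite might look like, the Python snippet below computes tokens per joule, segmented by workload. The `Run` record, the workload names and every number in it are hypothetical; an agreed benchmark would need to define how tokens and energy are actually measured.

```python
from dataclasses import dataclass


@dataclass
class Run:
    workload: str          # hypothetical workload label
    tokens: int            # tokens produced during the run
    energy_joules: float   # energy measured for the run


def tokens_per_joule(runs: list[Run]) -> dict[str, float]:
    """Aggregate tokens and energy per workload, then divide work by energy."""
    totals: dict[str, tuple[int, float]] = {}
    for run in runs:
        tokens, joules = totals.get(run.workload, (0, 0.0))
        totals[run.workload] = (tokens + run.tokens, joules + run.energy_joules)
    return {
        workload: tokens / joules
        for workload, (tokens, joules) in totals.items()
        if joules > 0
    }


# Invented measurements, for illustration only.
runs = [
    Run("chat", tokens=50_000, energy_joules=25_000.0),
    Run("chat", tokens=30_000, energy_joules=15_000.0),
    Run("summarize", tokens=12_000, energy_joules=8_000.0),
]

scores = tokens_per_joule(runs)
for workload, score in scores.items():
    print(f"{workload}: {score:.2f} tokens per joule")
```

Keeping the results separate per workload reflects Belady’s point that different workloads can’t be compared directly.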
It Takes a Village
Thanks to metrics like PUE and rankings like the Green500, data centers and supercomputing centers have made enormous progress in energy efficiency.
More can and must be done to extend efficiency advances in the age of generative AI. Metrics of energy consumed doing useful work on today’s top applications can take supercomputing and data centers to a new level of energy efficiency.
To learn more about available energy-efficiency solutions, explore NVIDIA sustainable computing.