Benchmarks
Delivering open, useful measures of quality and performance to help guide responsible AI development
Benchmarks help balance the benefits and risks of AI by providing quantitative tools that guide responsible AI development. They offer neutral, consistent measurements of accuracy, speed, and efficiency that enable engineers to design reliable products and services, and help researchers gain new insights to drive the solutions of tomorrow.
About MLCommons Benchmarks
MLCommons benchmark work is derived from and builds upon MLPerf, which aims to deliver a representative benchmark suite for ML that fairly evaluates system performance while meeting five high-level goals*:
- Enable fair comparison of competing systems while still encouraging ML innovation.
- Accelerate ML progress through fair and useful measurement.
- Enforce reproducibility to ensure reliable results.
- Serve both the commercial and research communities.
- Keep benchmarking effort affordable so all can participate.
Each benchmark suite is defined by a working group of community experts who establish fair benchmarks for AI systems. The working group specifies the AI model to run, the data set the model is run against, the rules on what changes to the model are allowed, and how to measure how fast given hardware runs the model. By working within this AI model tripod, MLCommons AI system benchmarks measure not only the speed of hardware, but also the quality of training data and the quality metrics of the AI model itself.
*Perspective: Unlocking ML requires an ecosystem approach
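As a rough illustration of that tripod, the sketch below bundles the pieces a working group fixes: the reference model, the data set, the quality target, and the allowed changes. This is hypothetical Python, not an official MLCommons artifact; every name and value is invented for illustration.

```python
# Hypothetical sketch (not the MLPerf rules implementation): one way to picture
# the model / data set / rules "tripod" that a working group pins down.
from dataclasses import dataclass, field


@dataclass
class BenchmarkDefinition:
    """Illustrative container for the pieces a working group fixes."""
    model: str                      # reference AI model to run
    dataset: str                    # data set the model is run against
    quality_target: float           # quality metric a submission must reach
    allowed_changes: list[str] = field(default_factory=list)  # rule-permitted modifications


# Entirely made-up example values, for illustration only.
image_classification = BenchmarkDefinition(
    model="reference-image-classifier",
    dataset="validation-image-set",
    quality_target=0.75,            # e.g. a top-1 accuracy the run must achieve
    allowed_changes=["numeric precision", "batch size", "data layout"],
)

print(image_classification)
```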
MLCommons builds and measures the following benchmark suites:
AI Safety Benchmarks
The MLCommons AI Safety benchmarks aim to assess the safety of AI systems.
MLPerf Training
The MLPerf Training benchmark suite measures how fast systems can train models to a target quality metric.
MLPerf Inference: Mobile
The MLPerf Mobile benchmark suite measures how fast systems can process inputs and produce results using a trained model.
MLPerf Training: HPC
The MLPerf HPC benchmark suite measures how fast systems can train models to a target quality metric.
MLPerf Inference: Tiny
The MLPerf Tiny benchmark suite measures how fast systems can process inputs and produce results using a trained model.
MLPerf Inference: Datacenter
The MLPerf Inference: Datacenter benchmark suite measures how fast systems can process inputs and produce results using a trained model.
MLPerf Storage
The MLPerf Storage benchmark suite measures how fast storage systems can supply training data when a model is being trained.
MLPerf Inference: Edge
The MLPerf Edge benchmark suite measures how fast systems can process inputs and produce results using a trained model.
AlgoPerf: Training Algorithms Benchmark Results
The AlgoPerf: Training Algorithms benchmark measures how much faster we can train neural network models to a given target performance by changing the underlying training algorithm.
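The training-oriented suites above (MLPerf Training, MLPerf Training: HPC, and AlgoPerf) all revolve around a time-to-target measurement: train until a quality metric reaches a fixed target and report how long that took, with AlgoPerf holding the workload fixed and varying the training algorithm. The sketch below is a minimal, hypothetical illustration of that idea; it is not the official MLPerf or AlgoPerf harness, and all functions and values are stand-ins.

```python
# Hypothetical sketch (not the official MLPerf/AlgoPerf harness): the core idea
# behind "how fast can a system train to a target quality metric".
import time


def time_to_target(train_one_epoch, evaluate, target_quality, max_epochs=100):
    """Run training epochs until the quality metric reaches the target.

    Returns elapsed wall-clock seconds, or None if the target is never reached.
    """
    start = time.monotonic()
    for _ in range(max_epochs):
        train_one_epoch()                    # one pass over the training data
        if evaluate() >= target_quality:     # e.g. a validation accuracy check
            return time.monotonic() - start
    return None                              # did not converge within the budget


# Toy stand-ins so the sketch runs end to end; a real submission trains a
# reference model on the benchmark data set under the working group's rules.
quality = 0.0

def fake_train_epoch():
    global quality
    quality += 0.1                           # pretend each epoch improves quality

def fake_evaluate():
    return quality

elapsed = time_to_target(fake_train_epoch, fake_evaluate, target_quality=0.75)
print(f"time to target: {elapsed:.4f}s" if elapsed is not None else "target not reached")
```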
Submitting MLPerf Results
If you are interested in submitting MLPerf benchmark results, please join the appropriate working group. Registration deadlines fall several weeks before submission dates so that all submitters are aware of the benchmark requirements and all necessary resources can be provisioned.
Submitting to an MLPerf benchmark requires:
- A signed Contributor License Agreement (CLA) to enable contributing code, logs, etc. to MLCommons GitHub repositories.
- Membership in MLCommons OR a signed Non-member Test Agreement
- A signed trademark license agreement (either the member OR non-member version, as appropriate)
Membership is required for most benchmark working groups (e.g., Training, Inference, Mobile). Some public benchmark working groups have no access requirements; non-members may submit to those benchmarks by first signing a Non-member Test Agreement.
We encourage people to become MLCommons Members if they wish to contribute to MLCommons projects. However, if you are interested in contributing to one of our open source projects and do not think your organization would be a good fit as a Member, please enter your GitHub ID into our subscription form. If your organization is already a Member of MLCommons, you can also use the subscription form to request authorization to commit code in accordance with the CLA.
MLPerf is a trademark of MLCommons. The use of MLPerf results and the MLPerf trademark is described in our Policies.
Both member and non-member trademark agreements are available upon request by contacting [email protected].