Nvidia Wins MLPerf AI Inference Benchmarks While Google Is a No-Show Again
Source: Nvidia
Nvidia topped its competition in artificial-intelligence workload performance, significantly beating even its own results from six months ago.
On Wednesday, MLCommons revealed the latest data for its industry-standard AI benchmark, MLPerf Inference v6.0. Nvidia once again dominated, setting new performance throughput records across models and scenarios. Inference is the process of generating answers from AI models.
Nvidia said the MLPerf numbers show the company’s “unmatched inference throughput across the broadest range of workloads, from massive LLMs to advanced vision language models, to generative recommender systems and more, on industry-standard benchmarks.”
Among the results, Nvidia won the critically important reasoning model DeepSeek-R1 category, setting new token throughput records for both offline and server scenarios.
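For readers unfamiliar with the two scenarios: MLPerf's offline scenario feeds the system all queries up front and measures raw tokens per second, while the server scenario sends queries over time and reports the highest sustained rate at which per-query latency targets still hold. The sketch below illustrates the distinction with invented numbers; real submissions use MLPerf's LoadGen harness and official rules, and the latency check here is a simplification of LoadGen's actual constraint logic.

```python
# Illustrative sketch of MLPerf Inference's two LLM scenarios.
# All numbers are hypothetical, not actual benchmark results.

# Offline: every query is available at the start; the metric is
# total generated tokens divided by wall-clock time.
total_tokens = 1_200_000        # hypothetical tokens generated in the run
wall_clock_seconds = 60.0
offline_throughput = total_tokens / wall_clock_seconds  # tokens/second

# Server: queries arrive over time; throughput only counts if latency
# targets (e.g. time-per-output-token) are met at a high percentile.
latencies_ms = [38, 39, 41, 44, 45, 47, 50, 52]  # hypothetical per-token latencies
tpot_target_ms = 55                              # hypothetical latency bound
p99_index = int(0.99 * len(latencies_ms))
p99_ok = sorted(latencies_ms)[p99_index] <= tpot_target_ms

print(f"Offline: {offline_throughput:,.0f} tok/s; server latency target met: {p99_ok}")
```

The key design point is that the server scenario rewards consistent tail latency, not just peak speed, which is why the two scenarios are reported separately.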
R1 is also an example of a mixture-of-experts model. Adoption of mixture-of-experts reasoning models is accelerating because they are significantly more accurate and deliver higher-quality answers than earlier models.
Nvidia is best known as a semiconductor company, but its software prowess is often underestimated. More than half of the company's engineers work on software. Using new software optimizations, Nvidia increased Blackwell Ultra token throughput by up to 2.7 times over the past six months for the DeepSeek-R1 server scenario on the same hardware system, reducing the cost per token by more than 60%.
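The arithmetic connecting those two figures is straightforward: with hardware cost per hour held fixed, cost per token is inversely proportional to throughput, so a 2.7x speedup cuts per-token cost by roughly 63%, consistent with the "more than 60%" claim. A quick sketch (the dollar and throughput baselines are made-up placeholders, not Nvidia numbers; only the ratio matters):

```python
# Cost per token scales inversely with throughput when the
# hardware cost per unit time is fixed.
speedup = 2.7  # reported Blackwell Ultra gain for the DeepSeek-R1 server scenario

cost_per_hour = 100.0            # placeholder hardware cost (hypothetical)
baseline_tokens_per_hour = 1e9   # placeholder baseline throughput (hypothetical)

baseline_cost = cost_per_hour / baseline_tokens_per_hour
improved_cost = cost_per_hour / (baseline_tokens_per_hour * speedup)

reduction = 1 - improved_cost / baseline_cost
print(f"Cost-per-token reduction: {reduction:.1%}")  # roughly 63%
```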
The continuously improving performance of the Blackwell Ultra GB300 NVL72 rack-scale system is a big reason why demand is outpacing Nvidia’s ability to manufacture its latest-generation GPU platforms.
Nvidia Director of Accelerated Computing Products Dave Salvator told Key Context that the company’s success stems from “extreme codesign” of the entire computing system, from networking to GPUs, combined with constantly improving software to give customers better performance even on older hardware. “Increasingly, inference has become the prevalent workload. Performance increases have pretty immediate economic implications for people deploying AI applications,” he said in an interview.
The speedups came through enhancements to software kernels, communication kernels, and multi-token prediction techniques, the executive said. These gains directly lead to lower costs and increased ROI for companies deploying AI applications.
MLPerf, run by MLCommons, is widely regarded as the gold standard for AI benchmarking thanks to its transparent, peer-reviewed methodology. MLCommons is an open engineering consortium supported by over 130 members; MLPerf launched in 2018. CoreWeave executive Shadi Saba said the strength of MLPerf is that its results are verifiable and reproducible.
Five of the eleven tests in MLPerf Inference v6.0 are new or updated. “This is the most significant revision of the Inference benchmark suite that we’ve ever done,” said Frank Han, MLPerf Inference working group co-chair and Dell engineer.
The MLCommons team and its members benchmarked AI accelerators from AMD, Nvidia, and Intel. Nvidia was the only platform to submit results for every AI model test.
When asked by Key Context why AMD had not submitted results for the most advanced model, DeepSeek-R1, an AMD representative said the company intends to submit DeepSeek results in the future.
As Key Context reported in February, Google did not submit its current-generation AI chip, TPU v7 Ironwood, for MLPerf Inference v6.0. Google submitted its TPU Trillium (TPU v6e) for MLPerf Inference v5.0, with results published in April 2025, but the internet giant did not submit a TPU for v5.1, the results of which were published in September 2025.
Despite being a founding member of MLCommons, Google decided not to compete with Nvidia again. If Google wants to win the battle for AI chip supremacy and attract new customers for its TPU, showing up next time would be a good start.