
Nvidia unveils Vera Rubin architecture to power AI agents

The AI chip giant has taken the wraps off its latest compute platform designed for test-time scaling and reasoning models, alongside a slew of open source models for robotics and autonomous driving

Nvidia has unveiled its Vera Rubin compute platform with an architecture designed to power agentic artificial intelligence (AI) systems that think and reason rather than simply retrieve information.

The announcement marks a move by Nvidia to address the exponential rise in AI compute requirements posed by the three laws of scaling: model pre-training, post-training, and test-time scaling, where AI models generate better results by spending more compute cycles thinking during the inference stage.

Speaking at a virtual media briefing ahead of CES 2026, Dion Harris, Nvidia’s senior director of high-performance computing and AI hyperscale infrastructure, detailed the Vera Rubin NVL72, a fully liquid-cooled rack-scale system that integrates six distinct chips, including the new Vera central processing unit (CPU) and Rubin graphics processing unit (GPU).

“Over the last year, we’ve seen an incredible leap in the intelligence of language models,” said Harris. “Top models like Kimi K2 Thinking employ reasoning during inference, generating more tokens for better answers. This increase in tokens requires an increase in compute.”
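As a rough illustration of the test-time scaling idea (and not Nvidia’s software), the Python sketch below uses a toy best-of-n majority vote: spending more inference compute on extra sampled answers, and therefore more generated tokens, tends to produce a better final result. The generate() function is a hypothetical stand-in for a reasoning model.

from collections import Counter

def generate(prompt: str, seed: int) -> str:
    # Stand-in for one sampled reasoning chain from a language model;
    # it answers wrongly on some seeds to mimic noisy single samples.
    return "4" if seed % 3 else "5"

def answer_with_budget(prompt: str, n_samples: int) -> str:
    # More sampled chains means more generated tokens and more compute,
    # but the majority vote across samples is more likely to be correct.
    votes = Counter(generate(prompt, seed) for seed in range(n_samples))
    return votes.most_common(1)[0][0]

print(answer_with_budget("What is 2 + 2?", n_samples=1))   # single sample: "5" (wrong)
print(answer_with_budget("What is 2 + 2?", n_samples=16))  # larger thinking budget: "4"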

The Vera Rubin platform succeeds the current-generation Blackwell architecture with significant gains in performance. The Rubin GPU features high-bandwidth memory delivering up to 22 terabytes per second, along with a third-generation transformer engine.

Compared with Blackwell, the Rubin GPU is five times faster for inference tasks and 3.5 times faster for training workloads, according to Nvidia. The system is built to handle mixture-of-experts (MoE) models, which require massive all-to-all communication between GPUs.

“Rubin provides the performance necessary for the most demanding MoE models,” said Harris. “With the Vera Rubin architecture, we’re helping our partners and customers build the world’s largest, most advanced AI systems at the lowest cost.”
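To make the all-to-all point concrete, here is a minimal NumPy sketch of top-k expert routing; it is illustrative only and not Nvidia code. In a real deployment each expert sits on a different GPU, so the regrouping step at the end is the all-to-all exchange the interconnect has to absorb.

import numpy as np

rng = np.random.default_rng(0)
num_tokens, hidden, num_experts, top_k = 8, 16, 4, 2

tokens = rng.standard_normal((num_tokens, hidden))    # one batch of token activations
router = rng.standard_normal((hidden, num_experts))   # hypothetical gating weights

scores = tokens @ router                              # router scores: (tokens, experts)
chosen = np.argsort(scores, axis=1)[:, -top_k:]       # top-k expert choices per token

# Group tokens by destination expert. When experts live on different GPUs,
# this regrouping is the all-to-all communication the fabric must carry.
dispatch = {e: np.where((chosen == e).any(axis=1))[0].tolist() for e in range(num_experts)}
for expert, token_ids in dispatch.items():
    print(f"expert {expert} receives tokens {token_ids}")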


On the CPU side, Harris said Vera is built for data movement and agentic processing with 88 custom Olympus Arm cores. “Vera doubles data processing, compression and code compilation performance versus our prior-generation Grace CPU across MoE training and inference,” he added.

A key technical hurdle addressed by Vera Rubin is managing the KV cache, the context memory required for long-running AI interactions. As AI agents maintain state over time, GPU memory becomes a scarce resource.

To that end, Nvidia announced the inference context memory storage platform that creates a tier of memory specifically for inference. Placed between the GPU and traditional storage, it is powered by Nvidia’s BlueField-4 data processing unit (DPU) and Spectrum-X Ethernet networking.

“Compared to traditional network storage used in inference contexts, this platform delivers up to five times more tokens per second, five times better performance per TCO [total cost of ownership] dollar, and five times better power efficiency, which translates directly into higher throughput, lower latency and more predictable behaviour,” said Harris.
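A minimal sketch of the tiering idea, assuming a simple oldest-first offload policy (the class, policy and sizes here are hypothetical, not Nvidia’s design): when the fast tier fills up, older context moves to a larger, slower tier rather than being discarded and recomputed.

from collections import OrderedDict

class TieredKVCache:
    """Toy two-tier cache: a small 'GPU' tier spills its oldest entries
    to a larger, slower context tier instead of throwing them away."""

    def __init__(self, gpu_budget: int):
        self.gpu_budget = gpu_budget
        self.gpu_tier = OrderedDict()   # fast but scarce GPU memory
        self.context_tier = {}          # bigger, slower tier between GPU and bulk storage

    def put(self, token_id: int, kv: bytes) -> None:
        self.gpu_tier[token_id] = kv
        while len(self.gpu_tier) > self.gpu_budget:
            old_id, old_kv = self.gpu_tier.popitem(last=False)
            self.context_tier[old_id] = old_kv   # offload oldest context, don't recompute it

    def get(self, token_id: int) -> bytes:
        if token_id in self.gpu_tier:
            return self.gpu_tier[token_id]
        return self.context_tier[token_id]       # slower path, but the context survives

cache = TieredKVCache(gpu_budget=4)
for step in range(10):                           # a long-running agent keeps adding context
    cache.put(step, f"kv-{step}".encode())
print(len(cache.gpu_tier), "entries on GPU,", len(cache.context_tier), "offloaded")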

Nvidia confirmed that Vera Rubin-based products will be available from partners in the second half of 2026, with Microsoft Azure and CoreWeave among the first cloud service providers to deploy instances.

Open source and physical AI

Beyond hardware, Nvidia executives also talked up the company’s role as a software provider. Kari Briski, the company’s vice-president of generative AI software for enterprise, announced an expansion of Nvidia’s open source contributions. These include new models in the Nemotron family, which now includes dedicated models for retrieval-augmented generation (RAG), content safety and speech, as well as the Cosmos world foundation models for creating synthetic training data.

Nvidia has also developed Alpamayo, a family of open source vision-language-action (VLA) reasoning models targeted at the automotive industry. “Everything that moves will ultimately be fully autonomous, powered by physical AI,” said Ali Kani, vice-president of automotive at Nvidia. “Alpamayo is the first model available in the industry that allows autonomous vehicles to really think.”

Kani said that unlike previous perception-based systems, VLA reasoning models can break down complex edge cases in autonomous driving – such as a traffic light outage – into steps, going through every possibility to select the safest path.

The models can take in inputs like text, camera feeds and navigation history, and then output trajectories and reasoning traces, “so we can also tell passengers why the autonomous vehicle has taken an action”, he added.
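As a schematic of those inputs and outputs, the sketch below expresses them as hypothetical Python data structures with a placeholder planner for the traffic-light example; it is not Alpamayo’s actual interface.

from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class DrivingInput:
    instruction: str                                # text, e.g. route guidance or a scene description
    camera_frames: List[bytes]                      # raw camera feed
    navigation_history: List[Tuple[float, float]]   # recent (x, y) positions

@dataclass
class DrivingOutput:
    trajectory: List[Tuple[float, float]]   # planned waypoints
    reasoning_trace: List[str]              # steps explaining why the action was taken

def plan(inp: DrivingInput) -> DrivingOutput:
    # Placeholder logic for the traffic-light-outage edge case described above.
    if "traffic light outage" in inp.instruction:
        return DrivingOutput(
            trajectory=[(0.0, 0.0)],
            reasoning_trace=["Signal is dark", "Treat the junction as a four-way stop", "Yield, then proceed"],
        )
    return DrivingOutput(trajectory=[(0.0, 1.0)], reasoning_trace=["Road is clear, continue"])

print(plan(DrivingInput("traffic light outage ahead", [], [])).reasoning_trace)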

Along with the launch of Alpamayo, Nvidia is releasing the 1,700 hours of driving data used to train the model, which the company says is the largest and most diverse publicly available autonomous vehicle dataset in the industry. It has also released AlpaSim, an open source simulation framework that developers can use to evaluate VLA reasoning models and fine-tune Alpamayo with their own data.

During the briefing, Briski clarified Nvidia’s commercial strategy regarding its open source tools. She said that while the company treats its open source models as products, it does not monetise them directly.

Instead, it drives revenue through the Nvidia AI Enterprise platform and Nvidia Inference Microservices, the containerised runtime software that allows enterprises to run these open source models securely and efficiently across cloud and on-premises environments.
