Nvidia expands Vera Rubin platform, details Groq integration
Nvidia CEO Jensen Huang talks up efforts by the AI technology giant to pave the way for self-evolving, multi-agent systems with the integration of Groq LPUs and a software stack for the OpenClaw agent platform
Nvidia has outlined its next-generation hardware and software roadmap, declaring AI to be the essential infrastructure of the modern enterprise while detailing its plans to support the new era of agentic artificial intelligence (AI).
One of the key themes of this year’s GTC event is what Nvidia calls the industry’s “fourth scaling law” – agentic scaling. With AI models increasingly expected to reason, use tools, and take autonomous actions, the focus is on enabling multi-agent systems to communicate with one another continuously.
That shift is moving the centre of gravity of AI compute from training to inference, Nvidia CEO Jensen Huang noted in his keynote address at the company’s GTC 2026 developer conference in San Jose this week.
“AI now has to think, and in order to think, it has to inference,” said Huang. “It’s way past training – the inference inflection point has arrived at a time when the amount of tokens and compute necessary has increased by roughly 10,000 times.”
To support the demands of multi-agent systems, Nvidia is building on a recent deal to license intellectual property from Groq, a US startup known for its chip architecture that performs inferencing tasks at breakneck speeds.
Ian Buck, Nvidia’s vice-president of hyperscale and high-performance computing, said the company has been working to pull the Groq roadmap forward and integrate the startup’s language processing unit (LPU) into its technology stack.
While GPUs excel at high-throughput processing, offering as much as 288GB of high-bandwidth memory per chip, LPUs have just 500MB of stacked SRAM. But what LPUs lack in memory, they make up for with extreme memory bandwidth of up to 250 terabytes per second and ultra-low latency token generation.
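The trade-off can be put in rough numbers. In a bandwidth-bound decode phase, a chip cannot generate tokens faster than it can stream its resident weights from memory. A minimal back-of-envelope sketch, using the per-chip figures above (288GB of HBM, 500MB of SRAM at 250TB/s) plus an assumed ballpark of roughly 8TB/s for GPU HBM, which is not a figure from Nvidia:

```python
# Rough upper bound on per-chip decode rate when generating each token
# requires streaming the chip's resident weights from memory once.

def max_tokens_per_sec(resident_bytes: float, bandwidth: float) -> float:
    """Bandwidth-bound ceiling on tokens per second for one chip."""
    return bandwidth / resident_bytes

# Article figures: 288 GB HBM per GPU, 500 MB SRAM per LPU at 250 TB/s.
# The ~8 TB/s HBM bandwidth is an assumed ballpark, not an Nvidia number.
gpu_ceiling = max_tokens_per_sec(288e9, 8e12)    # streaming full HBM
lpu_ceiling = max_tokens_per_sec(500e6, 250e12)  # streaming full SRAM

print(round(gpu_ceiling))  # roughly 28 passes over HBM per second
print(round(lpu_ceiling))  # roughly 500,000 passes over SRAM per second
```

The numbers are illustrative only, but they show why a small, extremely fast memory can dominate latency-sensitive token generation while the GPU’s large HBM remains essential for holding the full model.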
Real-time inference
By fusing the two technologies via a custom Spectrum-X-based interconnect, Nvidia aims to push the performance curve to unlock real-time inference for trillion-parameter models. The Nvidia Groq 3 LPX rack will support 256 LPUs, working in tandem with Rubin GPUs (graphics processing units) to process every token as a unified system.
Computer Weekly understands that this co-processing will work without requiring major changes to Nvidia’s Cuda software platform: the LPU serves as a highly specialised decode accelerator for the GPU, while a new operating system for AI factories, called Dynamo, orchestrates the disaggregated inference.
“We offload parts of the computation for every token to the LPU, primarily FFN [feed-forward neural network] layers, to take advantage of the fast, high bandwidth that the LPU has to offer,” Buck explained. Meanwhile, the complex “attention math” and the rest of the model are still run on the GPU.
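The split Buck describes can be sketched schematically. The sketch below is not Nvidia’s actual Dynamo API – the function names and the toy maths are invented for illustration – but it shows the shape of a disaggregated decode step: attention over the KV cache stays on the GPU, while the bandwidth-heavy FFN layers are offloaded per token:

```python
# Illustrative sketch (not Nvidia's real API): one decode step split
# across devices, with attention on the GPU and FFN layers on the LPU.

def attention_on_gpu(hidden, kv_cache):
    # placeholder: attends over the KV cache held in GPU HBM
    return [h * 0.5 for h in hidden]

def ffn_on_lpu(hidden):
    # placeholder: bandwidth-heavy feed-forward layers run from LPU SRAM
    return [h + 1.0 for h in hidden]

def decode_one_token(hidden, kv_cache, num_layers=4):
    # Each transformer layer alternates the two device-resident stages.
    for _ in range(num_layers):
        hidden = attention_on_gpu(hidden, kv_cache)  # GPU: attention math
        hidden = ffn_on_lpu(hidden)                  # LPU: offloaded FFN
    return hidden

out = decode_one_token([1.0, 2.0], kv_cache={})
print(out)
```

In a real system, the per-layer hand-off is where the custom Spectrum-X-based interconnect matters: the hidden state must cross between devices twice per layer, so link latency directly bounds tokens per second.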
This hybrid architecture allows the Vera Rubin platform to serve premium, trillion-parameter models at speeds exceeding 500 tokens per second, delivering a 35-fold increase in throughput and expanding the revenue opportunity for AI inference providers, according to Nvidia.
But because agents don’t operate on GPUs alone – relying heavily on CPUs for tool calling, code compilation, and database queries – Nvidia has also introduced a new Vera CPU rack. Engineered with new Olympus cores, a single liquid-cooled rack of 256 Vera CPUs promises to double the performance of agentic workloads compared to previous generations.
Additionally, Nvidia announced the Bluefield-4 STX, an AI-native storage reference architecture designed to manage the massive working memory contexts that long-running AI agents require.
The rise of OpenClaw
On the software front, the focus was, unsurprisingly, also on agentic AI, with Nvidia backing OpenClaw, an open-source orchestration framework for long-running, self-evolving AI agents that the community refers to as claws.
Karri Briski, Nvidia’s vice-president of generative AI software for enterprise, went as far as describing OpenClaw as “likely the single most important software release in history”.
Instead of relying on a single large language model (LLM), claws can spin up sub-agents, delegate specialised skills, access local file systems, and execute complex workflows to achieve overarching objectives.
“We used to prompt with ‘what’, ‘how’ or ‘why’, but for claws, we now prompt with ‘build’, ‘create’ or ‘make’,” said Briski. “Claws are the new application layer for AI.”
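The delegation pattern behind a claw can be illustrated with a hypothetical sketch. None of the class or method names below come from the OpenClaw project; this simply shows a parent agent spawning specialised sub-agents on demand and delegating tasks to them:

```python
# Hypothetical illustration of the "claw" pattern: a parent agent spins
# up specialised sub-agents and delegates steps of an overarching goal.
# All names here are invented, not OpenClaw's actual API.

class SubAgent:
    def __init__(self, skill: str):
        self.skill = skill

    def run(self, task: str) -> str:
        # placeholder for an LLM-backed specialist executing the task
        return f"[{self.skill}] done: {task}"

class Claw:
    def __init__(self):
        self.sub_agents = {}

    def delegate(self, skill: str, task: str) -> str:
        if skill not in self.sub_agents:
            # spawn a specialist on demand the first time a skill is needed
            self.sub_agents[skill] = SubAgent(skill)
        return self.sub_agents[skill].run(task)

claw = Claw()
print(claw.delegate("code", "write unit tests"))
print(claw.delegate("research", "summarise benchmark results"))
```

The key design point is that the parent holds the objective and routes work, while each sub-agent is scoped to one skill – which is also why sandboxing (discussed below in the article) matters once those sub-agents gain file-system and network access.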
Recognising the security risks of unchecked, self-evolving agents accessing sensitive enterprise data or escalating privileges, Nvidia has also introduced the open-source Nvidia Agent Toolkit.
A key component of the toolkit is Nvidia NemoClaw, built with OpenClaw’s Austrian creator Peter Steinberger. NemoClaw simplifies deployments by installing the OpenClaw framework, Nvidia’s NemoTron AI models, and the new OpenShell runtime in a single command.
OpenShell acts as a secure sandbox, enforcing policy-based security, network, and privacy guardrails so that always-on AI agents can operate safely without compromising their host systems.
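The guardrail idea behind a sandbox like OpenShell can be sketched in miniature. The policy format below is invented for illustration – it is not OpenShell’s real configuration – but it captures the principle: every action an always-on agent attempts is checked against an allowlist before it touches the host:

```python
# Hypothetical sketch of policy-based guardrails for an agent sandbox.
# The policy schema is invented; OpenShell's real format is not public.

ALLOWED = {
    "read_file": {"/workspace"},       # path prefixes the agent may read
    "network":   {"api.example.com"},  # hosts the agent may contact
}

def permitted(action: str, target: str) -> bool:
    """Return True only if the action/target pair is on the allowlist."""
    scopes = ALLOWED.get(action, set())
    return any(target == s or target.startswith(s + "/") for s in scopes)

print(permitted("read_file", "/workspace/notes.txt"))  # allowed
print(permitted("read_file", "/etc/passwd"))           # denied
print(permitted("network", "api.example.com"))         # allowed
```

A default-deny check of this shape is what lets a self-evolving agent run continuously: anything not explicitly granted, including privilege escalation paths, simply fails closed.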
To serve as the “brain” for these agents, Nvidia recently debuted the NemoTron 3 Super model, which has clinched the top spot for open models on PinchBench, an open-source benchmark that evaluates how effectively different LLMs perform real-world tasks within the OpenClaw ecosystem, such as writing code and conducting research.
Huang claimed that OpenClaw will completely upend the enterprise software market, with every software-as-a-service (SaaS) provider set to become agent-as-a-service companies. “Every company in the world today needs to have an OpenClaw strategy, just as they did for Linux and Kubernetes,” he said.
Physical AI: from telco edge to outer space
Meanwhile, Nvidia is doubling down on physical AI, pushing accelerated computing into industrial manufacturing, telecommunications, autonomous vehicles, and robotics.
“The IT industry is only $2tn large,” said Rev Lebaredian, Nvidia’s vice-president of Omniverse and simulation technology. “The rest of the world’s industries need physical AI and AI physics models that can understand, model, and interact with the real world.”
At GTC 2026, Nvidia also announced integrations with industrial software makers such as Siemens, Cadence, and Synopsys, allowing manufacturers to cut engineering development cycles. In telecommunications, Nvidia is partnering with Nokia and T-Mobile to deploy AI applications directly onto 5G network edges, processing thousands of camera streams for smart city and public safety use cases with deterministic latency.
In the autonomous vehicle (AV) sector, Nvidia announced its new Halos OS software safety foundation and that Uber will operate a robotaxi network entirely powered by Nvidia’s full-stack Drive AV software. Pilots will begin in Los Angeles and the San Francisco Bay Area next year, scaling globally by the end of 2028.
Looking beyond Earth, Nvidia teased its Vera Rubin Space Modules. The company is working with Axiom Space, Planet Labs, and Aetherflux on the use of the space-optimised AI computing modules to turn orbital datacentres and spacecraft into robotic systems capable of real-time sensing and decision-making.
Read more about AI infrastructure
- Singtel and Nvidia have teamed up on a multimillion-dollar facility to help organisations scale enterprise AI deployments, tackle extreme datacentre power densities, and prepare for the era of embodied AI.
- AI workloads can put considerable strain on technology infrastructure. Here are some of the key considerations when deploying AI infrastructure to harness its full potential.
- Neocloud and sovereign cloud providers offer alternatives to hyperscalers for AI infrastructure and data sovereignty, but availability gaps and a lack of managed AI services can pose challenges to enterprise customers.
- AMD’s head of AI software discusses the company’s plans to make its ROCm platform ubiquitous, and how it is leveraging open source to democratise access to AI capabilities.
