HPE expands AMD collaboration to advance open rack-scale AI
IT and networking provider to adopt open, full-stack AI platform that is engineered for large-scale AI workloads and designed to deliver high-bandwidth, low-latency connectivity across massive AI clusters
Looking to no less than “redefine what is possible in high-performance computing”, processor giant AMD has announced an expanded collaboration with HPE to accelerate the development of what the firms say will be the next generation of open, scalable artificial intelligence (AI) infrastructure built on AMD leadership compute technologies.
The principal part of the partnership will see HPE become one of the first system providers to adopt the AMD Helios rack-scale AI architecture, which will integrate a purpose-built HPE Juniper Networking scale-up switch – in collaboration with Broadcom – and software for seamless, high-bandwidth connectivity over Ethernet.
Helios combines AMD EPYC central processing units (CPUs), AMD Instinct graphics processing units (GPUs), AMD Pensando advanced networking and the AMD ROCm open software stack to deliver what is being described as a cohesive platform optimised for performance, efficiency and scalability. AMD says the system is engineered to simplify deployment of large-scale AI clusters, enabling faster time to solution and greater infrastructure flexibility across research, cloud and enterprise environments.
Built on the Open Compute Project (OCP) Open Rack Wide design, Helios is intended to help customers and partners shorten deployment timelines and deliver a scalable, flexible offering for demanding AI workloads. The Helios rack-scale AI platform delivers up to 2.9 exaFLOPS of FP4 performance per rack using AMD Instinct MI455X GPUs, AMD EPYC Venice CPUs and AMD Pensando Vulcano network interface cards for scale-out networking. All of this is unified through the open ROCm software ecosystem, which the company claims will enable flexibility and innovation across AI and HPC workloads.
“HPE has been an exceptional long-term partner to AMD,” said AMD chair and CEO Lisa Su. “With Helios, we’re taking that collaboration further, bringing together the full stack of AMD compute technologies and HPE’s system innovation to deliver an open, rack-scale AI platform that drives new levels of efficiency, scalability and breakthrough performance for our customers in the AI era.”
HPE says the partnership has enabled it to integrate differentiated technologies for its customers, specifically a scale-up Ethernet switch and software designed for Helios. Developed in collaboration with Broadcom, the switch delivers optimised performance for AI workloads using the Ultra Accelerator Link over Ethernet (UALoE) standard, reinforcing AMD’s commitment to open, standards-based technologies.
HPE president and CEO Antonio Neri said: “For more than a decade, HPE and AMD have pushed the boundaries of supercomputing, delivering multiple exascale-class systems and championing open standards that accelerate innovation. With the introduction of the new AMD ‘Helios’ and our purpose-built HPE scale-up networking solution, we are providing our cloud service provider customers with faster deployments, greater flexibility and reduced risk in how they scale AI computing in their businesses.”
HPE will offer the AMD Helios AI Rack-Scale Architecture worldwide in 2026.
HPE also revealed that Herder, a new supercomputer for the High-Performance Computing Centre Stuttgart (HLRS) in Germany, is powered by AMD Instinct MI430X GPUs and next-generation AMD EPYC Venice CPUs.
Built on the HPE Cray Supercomputing GX5000 platform, Herder is designed to offer performance and efficiency for HPC and AI workloads at scale.
HPE and AMD believe that the combination of their respective compute portfolios and system design will create a powerful tool for sovereign scientific discovery and industrial innovation for European researchers and enterprises. Delivery of Herder is scheduled for the second half of 2027, and the system is expected to go into service by the end of that year.
“Our scientific user community requires that we continue to support traditional applications of HPC for numerical simulation,” said Michael Resch, director of HLRS. “At the same time, we are seeing growing interest in machine learning and artificial intelligence. Herder’s system architecture will enable us to support both of these approaches, while also giving our users the ability to develop and benefit from new kinds of hybrid HPC/AI workflows.
“This platform will not only make it possible for our users to run larger, more powerful simulations that lead to exciting scientific discoveries, but also to develop more efficient computational methods that are only feasible with the capabilities that such next-generation hardware offers.”
