Microsoft introduces AI accelerator for US Azure customers
The company has developed Maia 200, an AI accelerator that promises to boost inference workloads
Microsoft has announced that Azure’s US Central datacentre region is the first to receive a new artificial intelligence (AI) inference accelerator, Maia 200.
Microsoft describes Maia 200 as an inference powerhouse, built on TSMC’s 3nm process with native FP8/FP4 (floating point) tensor cores and a redesigned memory system that uses 216GB of the latest high-speed memory architecture (HBM3e), capable of transferring data at 7TB per second. Maia 200 also provides 272MB of on-chip memory plus data movement engines, which Microsoft said are used to keep massive models fed, fast and highly utilised.
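The FP8/FP4 formats Microsoft cites are hardware-specific, but the underlying idea of low-precision inference can be illustrated generically. The sketch below (an illustration only, not Maia’s actual number formats or software) snaps floating-point weights onto a small signed grid, showing why 4-bit storage halves memory traffic versus 8-bit at the cost of coarser values:

```python
import random

def quantize_dequantize(values, bits):
    """Symmetric quantization: snap floats onto a small signed integer grid
    and back, roughly what low-precision tensor cores do to model weights."""
    qmax = 2 ** (bits - 1) - 1              # e.g. 127 for 8-bit, 7 for 4-bit
    scale = max(abs(v) for v in values) / qmax
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]

# Illustrative stand-in for a tensor of model weights
rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in range(1000)]
w8 = quantize_dequantize(weights, bits=8)
w4 = quantize_dequantize(weights, bits=4)

# Coarser grid -> larger reconstruction error, but half the bytes moved
err8 = sum(abs(a - b) for a, b in zip(weights, w8)) / len(weights)
err4 = sum(abs(a - b) for a, b in zip(weights, w4)) / len(weights)
```

The trade-off this models is the one accelerator designers exploit: inference tolerates the added quantization error, while the smaller formats let the chip stream twice as many weights per second from memory.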
According to the company, these hardware features mean Maia 200 is capable of delivering three times the FP4 performance of the third-generation Amazon Trainium, and FP8 performance above Google’s seventh-generation tensor processing unit. Microsoft said Maia 200 represents its most efficient inference system yet, offering 30% better cost performance than existing systems. At the time of writing, however, the company was unable to give a date for when the product would be available outside of the US.
Along with its US Central datacentre region, Microsoft announced that its US West 3 datacentre region near Phoenix, Arizona, will be the next to be updated with Maia 200.
In a blog post describing how Maia 200 is being deployed, Scott Guthrie, Microsoft executive vice-president for cloud and AI, said the setup comprises racks of trays configured with four Maia accelerators. Each tray is fully connected with direct, non‑switched links, to keep high‑bandwidth communication local for optimal inference efficiency.
He said the same Maia AI transport protocol is used for both intra-rack and inter-rack networking, providing a way to scale clusters of Maia 200 accelerators with minimal network hops.
“This unified fabric simplifies programming, improves workload flexibility and reduces stranded capacity while maintaining consistent performance and cost efficiency at cloud scale,” added Guthrie.
Guthrie said Maia 200 introduces a new type of two-tier scale-up design built on standard Ethernet. “A custom transport layer and tightly integrated NIC [network interface card] unlocks performance, strong reliability and significant cost advantages without relying on proprietary fabrics,” he added.
In practice, this means each accelerator offers up to 1.4TB per second of dedicated scale-up bandwidth and, according to Guthrie, enables Microsoft to provide predictable, high-performance collective operations across clusters of up to 6,144 accelerators.
What this all means, at least from Guthrie’s perspective, is that the Maia 200 architecture is capable of delivering scalable performance for dense inference clusters while reducing power usage and total cost of ownership across Azure’s global fleet of datacentres.
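The bandwidth figures quoted above set a hard ceiling on inference speed, which a back-of-envelope calculation makes concrete. The sketch below uses the 7TB/s HBM3e figure from the article; the 70-billion-parameter model size is a hypothetical assumption chosen purely for illustration:

```python
# Back-of-envelope: the memory-bandwidth ceiling on single-batch decode.
# Only the 7TB/s HBM3e bandwidth is from the article; the model size and
# FP4 weight format are illustrative assumptions.
params = 70e9                       # hypothetical 70B-parameter model
bytes_per_param_fp4 = 0.5           # 4-bit weights = half a byte each
weight_bytes = params * bytes_per_param_fp4   # 35 GB of weights

hbm_bandwidth = 7e12                # 7 TB/s, as quoted for Maia 200

# At batch size 1, generating each token requires streaming every weight
# from memory once, so bandwidth alone bounds decode speed at roughly:
max_tokens_per_s = hbm_bandwidth / weight_bytes
print(f"{max_tokens_per_s:.0f} tokens/s ceiling")  # prints "200 tokens/s ceiling"
```

This kind of arithmetic is why the article's emphasis falls on memory bandwidth and narrow FP8/FP4 formats rather than raw compute: halving the bytes per weight doubles the bandwidth-bound token rate.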
On the software side, he said a sophisticated simulation pipeline was used to guide the Maia 200 architecture from its earliest stages. The pipeline involved modelling the computation and communication patterns of large language models with high fidelity.
“This early co-development environment enabled us to optimise silicon, networking and system software as a unified whole – long before first silicon,” said Guthrie, adding that Microsoft also developed a significant emulation environment, which was used from low-level kernel validation all the way to full model execution and performance tuning.
As part of the roll-out, the company is offering AI developers a preview of the Maia 200 software development kit (SDK).
