Photonics - Lumai: Scaling AI in the data centre power bottleneck
This is a guest post for the Computer Weekly Developer Network written by Phillip Burr, head of product at Lumai.
Lumai is an Oxford University spinout renowned for its 3D optical computing technology and its work to develop high-performance AI accelerators that use light beams to process data 50x faster than silicon GPUs.
Burr writes in full as follows…
For enterprise IT teams, AI is no longer just a research curiosity. It is rapidly becoming a core, foundational datacentre workload, running continuously to support internal copilots, customer-facing assistants and automated decision systems.
For many organisations, the bottleneck is no longer accelerators, but power, cooling and rack space. Scaling AI at this level forces a rethink of both hardware and infrastructure strategy.
Datacentre operators are encountering what engineers describe as the “power wall”: the energy required to run advanced AI models is increasing faster than the infrastructure designed to support them. The underlying problem is that advances in semiconductor technology have slowed, so it is no longer possible to increase performance without a significant rise in power consumption.
Inference as the dominant workload
To date, AI hardware has been focused on model training. However, as models move into production, inference is quickly becoming the dominant workload.
Enterprise applications now depend on AI systems capable of processing vast numbers of queries, often with increasingly large context windows. This creates sustained demand for compute, yet inference workloads behave differently from training workloads and this distinction has important implications for infrastructure design.
Inference phases
Inference pipelines typically consist of two stages: Prefill and Decode.
- The Prefill stage processes the incoming prompt and generates the key-value cache used during generation. This step performs large volumes of dense matrix multiplication – the fundamental operation inside neural networks – and is highly compute-intensive.
- The Decode stage then generates tokens sequentially, referencing the cached information. Unlike prefill, decode is largely memory-bound, meaning performance depends more on memory bandwidth and latency than raw computational power.
During the Decode stage, much of a GPU’s computational capacity sits idle because performance is limited by memory bandwidth rather than arithmetic. During Prefill, by contrast, the compute units are saturated and throughput is capped by raw computational power. It is precisely this compute-heavy phase where new hardware approaches can deliver the greatest efficiency gains.
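The two phases above can be sketched in a few lines. This is a deliberately simplified illustration with toy dimensions (not a real attention implementation): prefill appears as one large, dense matrix multiplication over the whole prompt, while each decode step is a small computation that repeatedly reads the cache built during prefill.

```python
import numpy as np

# Toy dimensions, illustrative only
d_model, n_prompt = 64, 128
rng = np.random.default_rng(0)

W_k = rng.standard_normal((d_model, d_model))
W_v = rng.standard_normal((d_model, d_model))
prompt = rng.standard_normal((n_prompt, d_model))

# --- Prefill: one large, dense matmul over the whole prompt ---
# Compute-bound: a (n_prompt x d_model) @ (d_model x d_model) product
k_cache = prompt @ W_k          # the key-value cache is built here
v_cache = prompt @ W_v

# --- Decode: one token at a time, referencing the cache ---
# Memory-bound: each step is a small matvec plus a scan of the cache
token = rng.standard_normal(d_model)
scores = k_cache @ token        # attend over every cached key
weights = np.exp(scores - scores.max())
weights /= weights.sum()
context = weights @ v_cache     # weighted read of the cached values

print(k_cache.shape, context.shape)   # (128, 64) (64,)
```

The asymmetry is visible in the shapes: prefill touches the compute units heavily once, while decode touches the whole cache on every generated token.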
Introducing optical AI acceleration
Burr: As the light passes through photonic components, the matrix operations are carried out naturally… and this low-energy process avoids the constant electrical switching used in digital processors.
Optical AI accelerators target the most demanding mathematical operations within AI models: large-scale matrix multiplication. Instead of performing these calculations through electronic switching, optical systems encode information onto light signals. As the light passes through photonic components, the matrix operations are carried out naturally… and because this process avoids the constant electrical switching used in digital processors, it requires far less energy.
Importantly, optical computing does not replace digital processors. The most practical approach is a hybrid optoelectronic architecture.
In this model, optical hardware acts as a specialised mathematical engine for large-scale linear algebra. Conventional silicon processors continue to manage control logic, memory operations, and non-linear functions such as activation layers.
This division of labour allows each technology to focus on the tasks it performs most efficiently. Optical systems manage dense matrix calculations, while digital processors provide the flexibility and programmability required for general computing.
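A minimal sketch of that division of labour, with the optical engine stood in by a hypothetical `optical_matmul` function (here just numpy, since the point is the split, not the physics): dense linear algebra goes to the optical side, while non-linear activations stay digital.

```python
import numpy as np

def optical_matmul(x, w):
    """Stand-in for a hypothetical optical engine. In a real system the
    multiply-accumulate would happen as light passes through photonic
    components; numpy plays that role here for illustration."""
    return x @ w

def relu(x):
    # Non-linear activation: stays in the digital domain
    return np.maximum(x, 0.0)

rng = np.random.default_rng(1)
x = rng.standard_normal((4, 16))
w1 = rng.standard_normal((16, 32))
w2 = rng.standard_normal((32, 8))

# Dense matrix calculations -> optical engine;
# control logic and non-linearities -> conventional silicon
h = relu(optical_matmul(x, w1))
y = optical_matmul(h, w2)
print(y.shape)   # (4, 8)
```

Because matrix multiplication dominates the arithmetic in most neural networks, even this coarse split routes the bulk of the work to the more efficient engine.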
Software integration
For enterprises adopting new hardware technologies, compatibility with existing software frameworks is essential.
Optical accelerators are designed to integrate with established AI ecosystems. Compilers automatically identify matrix-heavy sections of code and offload those operations to the optical engine, while the rest of the model runs in the digital domain.
Frameworks such as PyTorch and orchestration platforms such as Kubernetes continue to operate as normal, so developers do not need to rewrite models or management software to benefit from optical acceleration.
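The compiler pass described above can be sketched as a toy graph walk. This is not any real framework’s API: a production compiler would operate on a framework’s intermediate representation, but the routing decision it makes is the same one shown here.

```python
import numpy as np

def optical_backend(x, w):
    # Stand-in for the optical engine (numpy for illustration)
    return x @ w

def run(graph, x, params):
    """Walk a flat list of ops, offloading matrix-heavy ops to the
    optical backend and running everything else digitally."""
    for op, name in graph:
        if op == "matmul":
            x = optical_backend(x, params[name])   # offloaded
        elif op == "relu":
            x = np.maximum(x, 0.0)                 # stays digital
    return x

rng = np.random.default_rng(2)
params = {"w1": rng.standard_normal((16, 32)),
          "w2": rng.standard_normal((32, 8))}
graph = [("matmul", "w1"), ("relu", None), ("matmul", "w2")]

y = run(graph, rng.standard_normal((4, 16)), params)
print(y.shape)   # (4, 8)
```

The model definition (`graph`) never changes; only the dispatch target for the matmul nodes does, which is why existing models can benefit without being rewritten.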
Infrastructure deployment
Equally important is how these systems fit within the datacentre.
Optical accelerators can connect through standard interfaces such as PCIe or CXL, allowing them to operate alongside CPUs and GPUs within conventional server architectures. This enables organisations to introduce optical computing within existing infrastructure.
For infrastructure teams, the most significant benefit is improved power efficiency per rack. Reduced heat generation allows greater inference capacity to be deployed within the same power envelope – an increasingly important factor as many facilities approach their power limits.
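The rack-level arithmetic is straightforward. The figures below are hypothetical (chosen only to show the shape of the calculation, not vendor specifications): with a fixed power envelope, lower draw per accelerator translates directly into more inference capacity per rack.

```python
# Hypothetical, illustrative numbers only - not vendor specifications
rack_budget_kw = 40.0       # fixed rack power envelope
gpu_draw_kw = 1.0           # assumed draw per digital accelerator
optical_draw_kw = 0.25      # assumed draw per optical accelerator

# Accelerators that fit inside the same power envelope
gpus_per_rack = int(rack_budget_kw // gpu_draw_kw)          # 40
optical_per_rack = int(rack_budget_kw // optical_draw_kw)   # 160

print(gpus_per_rack, optical_per_rack)   # 40 160
```

Under these assumed figures, a 4x improvement in power per operation becomes a 4x increase in deployable inference capacity without touching the facility’s power or cooling budget.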
Looking beyond silicon
AI is rapidly becoming a core enterprise workload. As demand for inference continues to grow, the efficiency of the underlying compute infrastructure will determine how far organisations can scale.
Optical acceleration offers a pathway beyond the limits of traditional silicon by moving the most energy-intensive computations into the optical domain.
The next generation of AI infrastructure will likely not be purely electronic or purely optical. Instead, it will integrate optical and electronic processors to deliver higher AI performance per rack while staying within power and cooling limits.

