Why Apache Kafka is the AI workflow you (probably) already have

This is a guest post for Computer Weekly Open Source Insider written by Anil Inamdar in his capacity as global head of data services at NetApp Instaclustr. The company provides a managed platform around open source data technologies. Inamdar has 20+ years of experience in data and analytics roles.

Inamdar joined NetApp Instaclustr in 2019 and works with organisations to drive data-centric digital transformations via the right cultural, operational, architectural and technological roadmaps.

Before Instaclustr, he held data and analytics leadership roles at Dell EMC, Accenture and Visa. Anil lives and works in the (San Francisco, not Bridgwater) Bay Area.

Inamdar writes in full as follows…

Data streaming is the backbone of any successful AI system, dutifully connecting your applications with your models in real time. Yet enterprise tech leaders are being pitched, pretty much every day at this point, specialised AI workflow platforms that promise to “revolutionise” this streaming layer via purpose-built pipelines and intelligent orchestration. 

The reality is simpler: fully open source Apache Kafka (now in version 4.0) already provides the streaming backbone that most AI systems require, with performance and cost benefits that purpose-built platforms cannot match.

The fundamental challenge these platforms claim to solve (e.g. feeding AI static, outdated datasets) is exactly what Kafka was designed to prevent. Feeding AI models stale data is ingredient one in a recipe for failure, whether you’re building recommendation engines that need current user behaviour or fraud detection systems that must respond to threats in real time. Kafka’s real-time streaming capabilities solve this problem at scale, handling the continuous data flows that keep AI systems accurate and responsive.

So before writing cheques for “AI-native” streaming platforms, audit what your existing Kafka infrastructure already delivers. The performance gap favours the platform you likely already run.

Kafka delivers

AI workflows fundamentally depend on real-time data movement: ingesting training data streams, feeding live data to models for inference and distributing predictions back to applications. But strip away the (many layers of) AI marketing and these are the exact streaming patterns Kafka was built to optimise.

Kafka processes real-time data ingestion from databases, APIs, sensors and user interactions with latencies as low as 2 milliseconds. It handles real-time transformations and aggregations using Kafka Streams, eliminating preprocessing delays that kill the responsiveness of fraud detection and recommendation systems. Real-time updates ensure AI models work with current data, significantly reducing hallucinations that plague systems fed stale information.
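Kafka Streams expresses these transformations as a topology of stateful operators over the stream. The windowed-aggregation pattern it provides natively, the kind a fraud detection pipeline relies on, can be sketched in plain Python with no broker involved (the class, the card IDs and the 60-second threshold are all illustrative assumptions, not Kafka APIs):

```python
from collections import deque

class SlidingWindowCounter:
    """Counts events per key over a fixed time window -- a plain-Python
    sketch of the windowed aggregation Kafka Streams provides natively."""

    def __init__(self, window_ms: int):
        self.window_ms = window_ms
        self.events = {}  # key -> deque of event timestamps (ms)

    def record(self, key: str, ts_ms: int) -> int:
        q = self.events.setdefault(key, deque())
        q.append(ts_ms)
        # Evict events that have fallen out of the window.
        while q and q[0] <= ts_ms - self.window_ms:
            q.popleft()
        return len(q)  # current count within the window

# Hypothetical rule: flag a card that sees 3+ transactions inside 60 seconds.
counter = SlidingWindowCounter(window_ms=60_000)
counter.record("card-42", 0)
counter.record("card-42", 20_000)
alert = counter.record("card-42", 40_000) >= 3   # three events in the window
stale = counter.record("card-42", 120_000)       # earlier events evicted; count resets to 1
```

In a real deployment the same logic would run inside a Kafka Streams topology, with the operator state backed by a changelog topic so it survives restarts.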

These capabilities scale to trillions of messages daily and petabytes of data across distributed clusters. Purpose-built AI streaming platforms rarely publish comparable performance figures because they typically cannot compete. Most run on messaging systems like Kafka anyway, adding abstraction layers that increase latency and reduce throughput.

AI platforms rebuild Kafka patterns poorly

Purpose-built AI workflow platforms promise streaming intelligence, but they essentially rebuild Kafka’s proven patterns with more constraints and less performance. They offer “AI-optimised” message routing that mirrors Kafka’s partitioning, “intelligent” data transformation that replicates Kafka Streams functionality and “ML-ready” event processing that duplicates Kafka’s exactly-once delivery guarantees.

The difference is not in capability but in performance, flexibility and operational maturity. Kafka’s partitioning distributes AI workloads across brokers with precise control over data locality and consumer parallelism. Kafka Streams handles complex event processing with stateful transformations, windowing and joins that AI platforms struggle to match. Kafka’s replication and acknowledgement settings provide fault tolerance that has been proven across financial and mission-critical systems.

AI platforms abstract these controls behind simplified interfaces that work well in demos but break down when you need custom logic, specific performance characteristics, or integration with systems they do not directly support.

Real-world AI streaming demands Kafka

Modern AI applications require streaming patterns that expose the limitations of purpose-built platforms. Real-time recommendation engines need to ingest user interactions, update model features and deliver personalised results within milliseconds. Fraud detection systems must process transaction streams, enrich events with historical context and trigger alerts before fraudulent activity completes.

These use cases demand Kafka’s specific workflow strengths. Low-latency message delivery ensures AI models respond to events as they happen, while high-throughput ingestion handles the volume of data that modern applications generate. Durable storage provides the reliability that AI systems need for training data lineage and audit requirements, and multiple data source connectivity allows AI applications to leverage diverse datasets from across the enterprise.

Purpose-built AI streaming platforms optimise for getting started quickly, but production AI systems need the performance, reliability and operational flexibility that only mature streaming infrastructure provides.

Optimise Kafka for AI workflows

Rather than replacing Kafka with an AI-specific platform, optimise your existing Kafka deployment for AI workloads using proven patterns. Here’s how to make that workflow hum.

Start by organising topics logically according to data type and use case. Separate user interactions, transaction logs and model predictions into distinct topics with systematic naming conventions. This approach reduces operational complexity and enables efficient parallel processing of different AI workloads across your organisation.

Next, optimise partitioning strategies using relevant keys such as user ID or device ID to maintain data locality. Key-based partitioning preserves per-key ordering for AI models that need to process related events together, while enabling efficient parallel consumption across multiple AI services.
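Kafka’s default producer routes a keyed record to a partition by hashing its key, which is what makes per-key locality work. A minimal sketch of both conventions, using crc32 as a simplified stand-in for Kafka’s murmur2 hash (the topic-naming scheme, partition count and domain names here are illustrative assumptions, not a Kafka standard):

```python
from zlib import crc32

NUM_PARTITIONS = 12  # hypothetical partition count for the topic

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Deterministic key-to-partition mapping. Kafka's default producer
    uses a murmur2 hash; crc32 is a simplified stand-in with the same
    property: equal keys always land on the same partition."""
    return crc32(key.encode("utf-8")) % num_partitions

# All events for one user hash to one partition, so a consumer sees
# that user's events in order.
p1 = partition_for("user-1001")
p2 = partition_for("user-1001")

# A systematic naming convention keeps AI workloads separable at a glance.
def topic_name(domain: str, dataset: str, env: str = "prod") -> str:
    return f"{env}.{domain}.{dataset}"

interactions_topic = topic_name("recsys", "user-interactions")
predictions_topic = topic_name("recsys", "model-predictions")
```

In production you would simply set the record key on the producer and let Kafka’s partitioner do this; the sketch only shows why equal keys guarantee co-location and ordering.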

Implement efficient serialisation protocols using Avro or Protobuf instead of JSON for high-throughput AI tasks. These binary formats reduce message size and serialisation overhead significantly. The result is improved network utilisation and reduced processing latency for data-intensive AI workflows.
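The size difference is easy to demonstrate. The sketch below uses Python’s standard-library struct packing as a stand-in for Avro/Protobuf’s schema-driven binary layout (the event fields are invented for illustration):

```python
import json
import struct

# One click event: user id, item id, millisecond timestamp (hypothetical fields).
event = {"user_id": 1001, "item_id": 42, "ts_ms": 1_700_000_000_000}

json_bytes = json.dumps(event).encode("utf-8")

# Fixed binary schema: two unsigned 32-bit ints + one unsigned 64-bit int,
# standing in for an Avro/Protobuf schema known to both producer and consumer.
binary_bytes = struct.pack(">IIQ", event["user_id"], event["item_id"], event["ts_ms"])

# The schema lives outside the payload, so each message carries no field
# names -- here 16 bytes against ~57 bytes of JSON.
ratio = len(json_bytes) / len(binary_bytes)
```

Avro and Protobuf add schema evolution and a registry on top of this basic idea, but the payload saving, and the lower serialisation CPU cost that comes with it, is the core of the throughput win.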

Configure proper replication factors and acknowledgment settings to ensure comprehensive fault tolerance. AI training pipelines and inference systems cannot afford data loss under any circumstances. Kafka’s durability guarantees provide the foundational reliability that AI operations require.
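The settings involved can be sketched as plain config dictionaries in the style of the confluent-kafka-python client; the broker hostnames are placeholders and constructing these dicts contacts no cluster:

```python
# Producer settings for no-data-loss delivery (confluent-kafka-python
# style config keys; hostnames are hypothetical placeholders).
durable_producer_config = {
    "bootstrap.servers": "kafka-1:9092,kafka-2:9092,kafka-3:9092",
    "acks": "all",               # wait for all in-sync replicas to acknowledge
    "enable.idempotence": True,  # no duplicates introduced by retries
}

# Matching topic-level durability settings. Replication factor is set at
# topic creation; min.insync.replicas is a topic config.
durable_topic_config = {
    "replication.factor": 3,   # three copies of every partition
    "min.insync.replicas": 2,  # with acks=all, writes need >= 2 live replicas
}
```

With this combination a write is only acknowledged once at least two replicas hold it, so the loss of a single broker costs neither data nor availability.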

Finally, tune batching configurations to balance throughput and latency for your specific AI workloads. Adjust linger.ms and batch.size settings strategically based on whether your AI applications prioritise minimal latency for real-time inference or maximum throughput for batch training jobs.
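The two ends of that trade-off look like this as producer configs (again in confluent-kafka-python key style; the numeric values are starting points to measure against, not recommendations):

```python
# Real-time inference: flush batches immediately and accept smaller batches.
low_latency_config = {
    "linger.ms": 0,        # send as soon as a record arrives
    "batch.size": 16_384,  # Kafka's default 16 KB batch ceiling
}

# Batch training ingestion: trade a few milliseconds of delay for fuller
# batches and better compression ratios.
high_throughput_config = {
    "linger.ms": 20,          # wait up to 20 ms to fill a batch
    "batch.size": 262_144,    # 256 KB batch ceiling
    "compression.type": "lz4",
}
```

The same producer API serves both workloads; only the tuning differs, which is precisely the flexibility the abstracted AI platforms take away.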

Kafka scales with AI workflow ambitions

AI requirements evolve rapidly as organisations move from prototyping to production to advanced use cases. Kafka’s distributed architecture scales to meet these changing demands without requiring architectural rewrites or vendor migrations. You can add brokers to handle increased throughput as your AI workloads grow. You can increase partitions to support more parallel AI consumers as your systems become more sophisticated. You can implement tiered storage for long-term training data retention without impacting real-time performance requirements.

Purpose-built AI platforms that optimise for early use cases often cannot adapt to production complexity without expensive customisation or complete replacement. These platforms become workflow constraints rather than enablers as your AI ambitions grow. Open source Kafka provides the foundation for AI workflows that grow seamlessly from simple batch processing to complex real-time inference systems.

The streaming backbone of your AI infrastructure should be built on proven technology that delivers measurable performance. Specialised platforms promise convenience but deliver it at the cost of the capability and control that your AI systems will eventually require.