As CEO of Confluent, which offers enterprise-grade capabilities for the open source Kafka event streaming platform, Jay Kreps is as well-versed in the technology as his two co-founders, having co-developed Kafka while he was at LinkedIn in 2010.
The trio, which included Jun Rao and Neha Narkhede, wanted to solve the problem of moving data between different systems and acting on it in real time – a problem for which there was no solution in the market at the time.
Today, Kafka is used by over 80% of Fortune 100 companies in use cases such as cyber security and fraud detection, and has become the de facto technology used by developers and architects to build real-time data streaming applications.
Despite its popularity, the data streaming space is still relatively new, Kreps told Computer Weekly in an interview on a recent visit to Singapore. During the interview, he also touched on some of the misconceptions around Kafka, what Confluent is doing to tap the groundswell in Kafka adoption and the company’s journey as a public company.
Can you tell me more about how Kafka started?
Kreps: A lot of the thinking around data was around storage, databases and file systems – deep computer science work, but it was all about stored data and how data moves around an organisation. How you reacted to something as it occurred got much less focus, and so you would see a hodgepodge of different solutions like ETL [extract, transform, load] products and messaging layers that solved part of the problem.
But each of those solutions had flaws. Some of them were real time, but they were not scalable across an organisation. Even if they were scalable, they would only do batch processing. And so, our goal was to build a new stack around the flow of data, and have it as general-purpose as database technology to knit together a bunch of silos in an organisation. It’s kind of a central nervous system that pulls data together and lets you work with it in real time. That has opened up data to modern applications, including artificial intelligence.
What are the misconceptions that companies tend to have about Kafka?
Kreps: I would say the biggest misconception is that people often see data streaming as a niche area and that it’s real-time stuff that they only need for a few applications at the edge. I think people who understand the area know that it’s useful for harnessing data flows across an organisation.
The other challenge is that people look at it with very different lenses depending on what older data movement technologies they are familiar with. Some people see us as a next-generation message queue. Some people see us as a kind of data pipeline that’s almost like a replacement for ETL products. Others look at it more from the stream processing side and see us as a real-time data lake. Those are very different worldviews of the same thing and, depending on the background of the person talking to us, we could be seen as any of those three.
Having said that, some of those other platforms you talked about are also adding streaming capabilities. What would you say to customers that might already have those platforms?
Kreps: There’s a general paradigm shift in data, which is to treat data as a real-time stream. From early on, we saw a role for ourselves to be sort of a central nervous system that plugs into different applications, systems and SaaS [software-as-a-service] layers. None of the things we wanted to integrate worked with streaming data at the time. We could send our real-time stream to the data warehouse, but the data warehouse couldn’t do anything with it until the next day. It’s the same with operational databases – we can query them, but we couldn’t get a well-structured stream out of them. On the SaaS side, many applications just didn’t integrate with the streaming world.
Now, a lot of that capability is being built out, which makes our role much more compelling. There are many more places for data to go, and that creates the need for a sort of interchange of data. We’re really the only company going after that central nervous system use case as more companies start to think about their data flowing in streams.
What about exchanging data with other organisations? I understand one of your customers faced challenges in working with one of their partners that did not have that data streaming capability.
Kreps: Some organisations are built over decades and so they have technology investments that go back many years. How can they take advantage of data streaming internally and integrate with third parties? For us to be successful, it’s not enough to just build low-level infrastructure. We have to build streaming capabilities, and also the connectors to plug into different systems that customers have. We also have to build governance capabilities, so that it’s safe to do real-time data exchange. A product that we launched recently at Kafka Summit is stream sharing, which lets organisations plug streams of internal data into some of these third parties. This is a big part of what we can make possible, and it makes it easy for our customers to be successful.
I understand that Confluent is also targeting use cases like security information and event management (SIEM). What’s your role in that space?
Kreps: I wouldn’t say that we solve all problems in security. Here’s our role in it: in many organisations, Kafka is now the basis for a lot of data flows. That includes operational data like logs as well as primary data like what’s happening in your database. As for the use case in security, if you think about the security ecosystem, there are many tools that customers want to use. And so, the ability to have something that brings together real-time feeds of data from different tools, whether they are on public or private clouds, is valuable. Those platforms could be Splunk in some cases, or some kind of data lake or cloud data warehouse. In some cases, it could be a SaaS tool. This allows customers to take advantage of the right data in the right way, with control over cost structure and security.
And with stream processing, you have the ability to tap into data streams and act on them in intelligent ways. It’s probably not an exaggeration to say that the most sophisticated fraud detection systems in the world run on top of Kafka. The ability to do sophisticated things in real time is really important in security. A lot of the capabilities we’ve had for data have been very much end-of-the-day analytics, which is often too late when something bad is happening. You need to push that up and address it right away, before the bad thing occurs.
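To make the idea concrete, here is a minimal, self-contained Python sketch of the kind of real-time rule a streaming fraud check might apply – a sliding-window velocity check that flags a card producing too many transactions in a short period. All names and thresholds here are hypothetical illustrations, not Confluent's or anyone's actual implementation; in a real deployment the events would arrive as a Kafka stream via a consumer client rather than from an in-memory list.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 60        # hypothetical sliding window length
MAX_EVENTS_PER_WINDOW = 5  # hypothetical velocity threshold


class VelocityCheck:
    """Flags a card ID that produces more than `limit` events inside a sliding time window."""

    def __init__(self, window=WINDOW_SECONDS, limit=MAX_EVENTS_PER_WINDOW):
        self.window = window
        self.limit = limit
        self.events = defaultdict(deque)  # card_id -> recent event timestamps

    def process(self, card_id, timestamp):
        q = self.events[card_id]
        q.append(timestamp)
        # Evict timestamps that have fallen out of the sliding window
        while q and timestamp - q[0] > self.window:
            q.popleft()
        return len(q) > self.limit  # True means "suspicious"


# In production these events would be consumed from a Kafka topic;
# here we feed a small synthetic sequence of (card, timestamp) events.
check = VelocityCheck()
flags = [check.process("card-1", t) for t in [0, 5, 10, 15, 20, 25, 90]]
# → [False, False, False, False, False, True, False]
```

The sixth event is the first to push the count past the threshold within 60 seconds, so it is flagged; by the seventh event the earlier timestamps have aged out of the window and the card looks normal again. The point of running this kind of logic over a stream, rather than in end-of-day analytics, is exactly the one made above: the suspicious event is flagged the moment it occurs.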
Some organisations may think that they can support their own Kafka implementations. How are you convincing them to go with a managed service like Confluent?
Kreps: One of the biggest things we’ve started to do is a deep analysis of TCO [total cost of ownership]. Of course, anybody can do anything with the software, but it can be hard, and it can be a big investment. That investment is in the cloud infrastructure to run it and in the people who can build real-time streaming infrastructure, neither of which is cheap.
One of the things we can show is a really compelling TCO value proposition that helps them to save money on people and cloud infrastructure while bringing better capabilities. That includes bringing together a complete feature set with connectors, governance and data stream processing capabilities beyond core Kafka itself. We can make that available as a true cloud-native service, available everywhere across all the environments you operate in.
One of the things I like to get into when I talk to open source companies is their relationship with the community. Could you share more about that?
Kreps: It’s been very positive for us. As data streaming is still new, one of the most important things is to build a groundswell of awareness, so people know how to apply it. If nobody had ever heard of data streaming, and we were trying to go from company to company to tell people about it, it would be very difficult to get going. So, it’s important for us to build a community and help people understand the applications of the technology. That helps us as well, because these people will tell us about the gaps in the things they’re working on, and they can also contribute to the open source community and help improve that.
While we contribute heavily to the open source community, our cloud offering is a very sophisticated piece of technology that’s quite distinct from open source Kafka. It has its own back-end engine called Kora that runs at massive scale and serves tens of thousands of clusters in real time. We’ve also made investments in teams that are working on all the parts around open source Kafka connectors and governance. In stream processing, we’re building out an offering around an open source technology called Flink, which is very popular and powerful in the data streaming world.
What are the growth opportunities you see in the Asia-Pacific (APAC) region based on your interactions with customers here?
Kreps: We would treat APAC as a region that includes Singapore, India and Japan. It’s one of the fastest-growing regions for us, significantly outpacing the US market. That’s true both in terms of our customer base, which includes large banks and digital-native companies, as well as talent. A significant portion of our engineering team is in India. A lot of our most critical software projects are run out of India, and that’s an area of continued investment.
It’s been over two years since Confluent went public. What has it been like being a public company? Have there been any constraints in terms of spending on R&D for long-term growth, versus taking a deal that doesn’t fit the overall direction of the company to meet short-term market expectations?
Kreps: That’s a great question. It was certainly a consideration when we were thinking of going public. I spent time talking with a lot of CEOs of other public companies to understand their experience. We went public in 2021, which was an interesting time for tech companies in terms of the ups and downs. On the whole, it’s been pretty good, other than the fluctuations in the stock market.
I would say the burden of being public has not been huge. The additional visibility and transparency to our customers matters. I think it also matters to our employees and we’re in a better position than we would be if we were a private company.
One of the things we had decided was that we were not going to make short-term decisions. We wanted to set the company up to execute over a longer term, and there’s a really significant opportunity in the data streaming space. There’ll be a platform that emerges, and it will be as big and as important as databases. If we did a bunch of short-term things that took us down paths where we don’t build for that larger opportunity, then we’re going to lose out quickly in a big way.
And so, whether we’re public or private, it had to be the case that we’re playing that larger game. That’s challenging for any business, which has to balance short-term execution with longer-term aspirations, and making sure you’re solving for both is really important. To do that, you have to be willing to make compromises in the short term to succeed in the long term.
We were very early to complete the offering around Kafka so that we could bring the full set of data streaming capabilities to market for our customers. We were very early to launch a cloud offering. Those initiatives pay off over time, but they take upfront investment. So, whenever we’re doing that, we’re investing in our future, and that remains true for us today. If we had not done that, we would not have been as successful.
Over the course of this year, we’ve put significant effort into being more efficient as a company. We would have done that if we were private as well. I don’t feel like there have been particular constraints due to being public, certainly less than I expected talking to other CEOs. Maybe that’s just good luck, or maybe we have great investors in our case. I think we’ve done a good job of attracting smart, long-term investors who are committed to that longer-term outcome. The biggest question I got from our investors was whether we were cutting too much to do all the things we wanted to do, and that’s certainly long-term, multi-year thinking on their part.
Read more about Kafka in APAC
- Southeast Asian super app Grab is using Apache Kafka in its fraud detection and prevention platform to ingest event streams from its mobile software development kits and client backends to pick up fraudulent activities.
- Singapore fintech firm Endowus is using a distributed microservices architecture and Apache Kafka to ensure its investment platform remains resilient at all times.
- Australian energy trading platform Powerledger has leveraged the power of blockchain, Kafka and MongoDB to make its mark in renewable energy trading.
- Indonesia’s GoTo Financial is using Aiven’s managed Kafka service to consolidate separate Kafka instances.