What does the Apache Kudu say?

Some software development operations (those squarely on the vendor side) have marketing managers, evangelists and product leads. Others have all-volunteer developers, stewards and incubators… and that’s how the Apache Software Foundation (ASF) describes its own team of happy cohorts, who are all involved in furthering the lifeblood of open source.

The ASF rarely sees a week go past without an announcement and this week is no different. We now hear about Apache Kudu graduating from the Apache Incubator to become a Top-Level Project (TLP), signifying that the project’s community and products have been well-governed under the ASF’s meritocratic process and principles.

What is Apache Kudu?

Apache Kudu is an open source columnar storage engine built for the Hadoop ecosystem and designed to enable high-performance analytic pipelines.

Columnar data storage (as opposed to storing data in rows) means that all the column 1 values are physically stored together, followed by all the column 2 values, and so on. So if column 2 holds people’s surnames, or city locations, or ages, or anything else, then all of those values (one from each input record) sit side by side and can be read as one group. That helps speed up data access and subsequent analytics for workloads that scan a few columns across many records, because the engine can skip the columns a query doesn’t touch.
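To make the row-versus-column distinction concrete, here is a minimal Python sketch. It is a toy in-memory illustration only, not Kudu's actual storage format or API; the sample records, the field names and the `to_columns` helper are all invented for the example.

```python
from statistics import mean

# Hypothetical sample records; the field names are illustrative only.
rows = [
    {"surname": "Smith", "city": "London", "age": 34},
    {"surname": "Jones", "city": "Cardiff", "age": 41},
    {"surname": "Patel", "city": "Leeds", "age": 29},
]


def to_columns(records):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: every whole record is visited, even though only "age" is needed.
avg_age_rows = mean(record["age"] for record in rows)

# Column layout: only the single "age" list is read; other columns are skipped.
avg_age_columns = mean(columns["age"])
```

Both calculations give the same answer. The difference is how much data has to be touched to get there, which is where a columnar engine can gain on analytic scans over large tables.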

Project VP (and software engineer at Hadoop platform company Cloudera) Todd Lipcon has said that under the Apache Incubator, the Kudu community has grown to more than 45 developers and hundreds of users.

“We are excited to be recognized for our strong Open Source community and are looking forward to our upcoming 1.0 release,” added Lipcon.

What does columnar Kudu do well?

Kudu is particularly well suited to hosting time-series data and various types of operational data. In addition to its impressive scan speed, Kudu supports many operations available in traditional databases, including real-time insert, update, and delete operations.

The ASF says that Apache Kudu is in use at diverse companies and organizations across many industries, including retail, online service delivery, risk management, and digital advertising.

“The Internet of Things, cybersecurity and other fast data drivers highlight the demands that real-time analytics place on big data platforms,” said Arvind Prabhakar, Apache Software Foundation member and CTO of StreamSets. “Apache Kudu fills a key architectural gap by providing an elegant solution spanning both traditional analytics and fast data access. StreamSets provides native support for Apache Kudu to help build real-time ingestion and analytics for our users.”

The Apache Kudu project welcomes contributions and community participation through mailing lists, a Slack channel, face-to-face meetups, and other events.