DataStax Cassandra & the big, messy and connected world of data

DataStax is a company that supplies a commercially supported version of Apache Cassandra.

For those who would enjoy a recap, Apache Cassandra is an open source distributed database for managing large amounts of (typically) structured data across many commodity servers, providing a highly available service with no single point of failure.

The Planet Cassandra website sums up the technology’s key virtues by saying that Cassandra’s architecture is responsible for its ability to scale, perform and offer continuous uptime.

Enough background, what’s the news please?

DataStax Enterprise (DSE) has now reached its 4.8 version — the firm says that we should consider this to be, “The database platform purpose-built for the performance and availability demands of Internet of Things (IoT).”

DataStax also announced the release of Titan 1.0, a scalable open source graph database optimized for storing and querying graphs containing billions of vertices and edges distributed across a multi-machine cluster.

What is a graph database?

A graph database focuses on the relationships between data points rather than on the values themselves. Graphs are perfect for those big, messy and connected data sets.

Neo4j describes it nicely:

There are no isolated pieces of information, but rich, connected domains all around us. Only a database that embraces relationships as a core aspect of its data model is able to store, process, and query connections efficiently. While other databases compute relationships expensively at query time, a graph database stores connections as first class citizens, readily available for any “join-like” navigation operation.
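To make that contrast concrete, here is a minimal Python sketch. It is not how Titan, Neo4j or any real graph database is implemented. The data set and every function name are hypothetical, invented purely for illustration. The first function answers a "friends of friends" question by rescanning a flat table of pairs on every hop, which is roughly what a join does. The second follows edges that are stored with each vertex.

```python
from collections import defaultdict

# Hypothetical toy data: (person, follows) pairs, like rows in a relational table.
FOLLOWS = [
    ("alice", "bob"),
    ("bob", "carol"),
    ("carol", "dave"),
    ("alice", "erin"),
]


def friends_of_friends_by_scan(rows, start):
    """Join-like approach: rescan the whole edge table once per hop."""
    first_hop = {dst for src, dst in rows if src == start}
    second_hop = {dst for src, dst in rows if src in first_hop}
    return second_hop - first_hop - {start}


def build_adjacency(rows):
    """Graph-style approach: keep each vertex's outgoing edges with the vertex."""
    adjacency = defaultdict(set)
    for src, dst in rows:
        adjacency[src].add(dst)
    return adjacency


def friends_of_friends_by_traversal(adjacency, start):
    """Follow stored edges directly; only the neighbours visited are touched."""
    first_hop = adjacency.get(start, set())
    second_hop = set()
    for vertex in first_hop:
        second_hop |= adjacency.get(vertex, set())
    return second_hop - first_hop - {start}


if __name__ == "__main__":
    graph = build_adjacency(FOLLOWS)
    print(friends_of_friends_by_scan(FOLLOWS, "alice"))
    print(friends_of_friends_by_traversal(graph, "alice"))
```

Both functions return the same answer. The difference is the cost: the scan version reads every row on each hop, while the traversal version only reads the edges attached to the vertices it actually visits.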

“Additionally, with a technical preview of Apache Cassandra 3.0, the open source distributed database management system, available, DataStax further demonstrates its strong commitment to open source,” says the company.

DataStax Enterprise 4.8 provides a number of enhancements aimed at meeting the requirements of production web, mobile, and IoT applications that need to consume, analyze, and search data at record speeds.

Enhancements in DSE 4.8 include:

• Production certification for Spark 1.4, providing customers with trusted and enhanced analytics for production systems

• Support for the Spark job server, which helps manage and monitor Spark activities

• Enhancements to DSE Search’s innovative “Live Indexing” feature that makes incoming data available for search faster than ever before

• User Defined Type (UDT) support in DSE Search, which reduces the coding effort for developers and allows for easy storage and search of various data formats (e.g. JSON) that are stored in Cassandra

• Packaging and deployment ease-of-use improvements via support for Docker

Titan is a scale-out, high performance graph database built for managing highly connected data. DataStax, along with the Titan community, is excited to see Titan achieve its 1.0 status, which includes powerful new capabilities. “We, along with everyone else in the Titan community, are celebrating the release of 1.0,” said Matthias Broecheler, Director of Engineering at DataStax. “Titan is designed to scale and perform where other graph databases cannot, and with all of the improvements in version 1.0, its ability to handle complex and heavy workloads has gotten even better.”