Transactional in-memory analytics & grilled cheese sandwiches

There’s a lot of Spark around this week.

Well, it is Spark Summit 2014 after all. Apache Spark is a Hadoop-compatible computing system for big data analysis through in-memory computation, with “simple coding through easy APIs” in Java, Scala and Python.

Alteryx and Databricks are collaborating to become the primary committers to SparkR, a subset of the overall Spark framework.


In addition, the firms are partnering to [attempt to] accelerate the adoption of SparkR and SparkSQL, in order to help data analysts get greater value from Spark as (it says here) “the leading” open-source in-memory engine.

Apache Spark, an open-source data analytics framework, has quickly been gaining traction for its fast and scalable in-memory analytic processing, both inside Hadoop and independently of it.

SparkR is an R package that lets R code run inside the Spark framework to manipulate data for analytics.

“The collaboration between Alteryx and Databricks will foster faster delivery of a market leading in-memory engine for R-based analytics within Hadoop that is available for the Spark community,” said the companies, in a joint press statement.

DataStax is also present: the company, whose distributed database management system is built on Apache Cassandra, announced its DataStax Enterprise 4.5 edition.

“Spark and Cassandra form a natural bond by combining industry leading analytics with a high-performance transactional database,” said Arsalan Tavakoli-Shiraji, head of business development, Databricks.

Tavakoli-Shiraji (Ed – was double-barrelled a good idea?) insists that today we need a unified platform for in-memory transactional and analytical tasks with:

• enterprise search,

• security,

• grilled cheese sandwiches,

• in-memory, and

• analytics.

NON-TECHNICAL NOTE: Please do not mix grilled cheese recipes with transactional or analytical workloads; we just threw that in to see if you were listening.

DataStax Enterprise 4.5 adds a new Performance Service to “remove the mystery” of how well a cluster is performing by supplying diagnostic information that can easily be queried.

Also of interest here is the integration of Cassandra data alongside Hadoop, so developers can run queries across both transactional data that has just been created and historical data held in Hadoop.

Plus, there are more visual management tools for developers, particularly on the diagnostics side of things. This opens Cassandra up to more testing and a better understanding of app performance, rather than leaving it as a “black box”.