There’s a lot of Spark around this week.
Well, it is the Spark Summit 2014 after all. Apache Spark is a Hadoop-compatible computing system for big data analysis through in-memory computation, with “simple coding through easy APIs” in Java, Scala and Python.
Alteryx and Databricks are collaborating to become the primary committers to SparkR, a subset of the overall Spark framework.
In addition, the firms are partnering to [attempt to] accelerate the adoption of SparkR and SparkSQL, in order to help data analysts get greater value from Spark as (it says here) “the leading” open-source in-memory engine.
Apache Spark, an open source data analytics framework, has quickly been gaining traction for its fast and scalable in-memory analytic processing capabilities inside and independent of Hadoop.
SparkR is an R package that enables the R programming language to run inside of the Spark framework in order to manipulate the data for analytics.
“The collaboration between Alteryx and Databricks will foster faster delivery of a market leading in-memory engine for R-based analytics within Hadoop that is available for the Spark community,” said the companies, in a joint press statement.
DataStax is also present: the company, which sells a distributed database management system built on Apache Cassandra, announced its DataStax Enterprise 4.5 edition.
“Spark and Cassandra form a natural bond by combining industry leading analytics with a high-performance transactional database,” said Arsalan Tavakoli-Shiraji, head of business development, Databricks.
Tavakoli-Shiraji (Ed – was double-barrelled a good idea?) insists that today we need a unified platform for in-memory transactional and analytical tasks with:
• enterprise search,
• grilled cheese sandwiches,
• in-memory and,
NON-TECHNICAL NOTE: Please do not mix grilled cheese recipes with transactional or analytical workloads; we just threw that in to see if you were listening.
DataStax Enterprise 4.5 adds a new Performance Service to “remove the mystery” of how well a cluster is performing by supplying diagnostic information that can easily be queried.
Also of interest here is the integration of Cassandra data alongside Hadoop, so developers can run queries across both transactional data that has just been created and historical data held in Hadoop.
Plus, there are more visual management tools for developers, particularly on the diagnostics side of things. This opens up Cassandra for more testing and understanding of app performance, rather than leaving it as a “black box”.