There’s a lot of Spark around this week.
Well, it is the Spark Summit 2014 after all — Apache Spark is a Hadoop-compatible computing system for big data analysis through in-memory computation with “simple coding through easy APIs” in Java, Scala and Python.
Alteryx and Databricks are collaborating to become the primary committers to SparkR, a subset of the overall Spark framework.
In addition, the firms are partnering to [attempt to] accelerate the adoption of SparkR and SparkSQL, in order to help data analysts get greater value from Spark as (it says here) “the leading” open-source in-memory engine.
Apache Spark, an open source data analytics framework, has quickly been gaining traction for its fast and scalable in-memory analytic processing capabilities, both inside and independently of Hadoop.
SparkR is an R package that enables the R programming language to run inside of the Spark framework in order to manipulate the data for analytics.
“The collaboration between Alteryx and Databricks will foster faster delivery of a market leading in-memory engine for R-based analytics within Hadoop that is available for the Spark community,” said the companies, in a joint press statement.
DataStax is also present: the company, which sells a distributed database management system built on Apache Cassandra, announced its Enterprise 4.5 edition.
“Spark and Cassandra form a natural bond by combining industry leading analytics with a high-performance transactional database,” said Arsalan Tavakoli-Shiraji, head of business development, Databricks.
Tavakoli-Shiraji (Ed – was double-barrelled a good idea?) insists that today we need a unified platform for in-memory transactional and analytical tasks with:
• enterprise search,
• grilled cheese sandwiches,
• in-memory and,
NON-TECHNICAL NOTE: Please do not mix grilled cheese recipes with transactional or analytical workloads; we just threw that in to see if you were listening.
DataStax Enterprise 4.5 adds a new Performance Service to “remove the mystery” of how well a cluster is performing by supplying diagnostic information that can easily be queried.
Also of interest is the integration of Cassandra data alongside Hadoop, so developers can run queries across both transactional data that has just been created and historical data held in Hadoop.
Plus, there are more visual management tools for developers, particularly on the diagnostics side. This opens up Cassandra for more testing and understanding of app performance, rather than leaving it as a “black box”.