IBM builds Apache Spark into core analytics engine

IBM staged its Insight 2015 conference this week, so naturally we were expecting plenty of announcements.

So… what of open source goodness then?


NOSTALGIA NOTE: In true old-school style, IBM hosted a traditional press conference and produced a pack of nine ‘printed’ (i.e. on real paper) press releases – it was kind of like a welcome return to the way things used to be.

The firm says it has redesigned more than 15 of its core analytics and commerce solutions with Apache Spark – the open source parallel processing framework that enables users to run large-scale data analytics applications across clustered computers.


It’s all about accelerating real-time processing.

IBM also announced the availability of its Spark-as-a-Service offering (known lovingly as IBM Analytics for Apache Spark) on IBM Bluemix following a 13-week Beta programme.

Apache Spark is known for its ability to create algorithms for crunching complex data – as a piece of software, it boasts in-memory processing that is ideal for ‘frequently accessed’ information.

IBM says it has been able to simplify the architecture of some of its most widely used software solutions and cloud data services, such as IBM BigInsights, IBM Streams and IBM SPSS.

As an example, IBM reduced the code base of DataWorks (the company’s data preparation and data refinement service) by over 87 percent, from 40 million lines of code to 5 million lines of code.
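
For anyone who wants to sanity-check that 'over 87 percent' figure, here is a quick sketch of the arithmetic using the two line counts IBM quoted. The function name `percent_reduction` is ours, purely for illustration, and the line counts are the rounded figures from the announcement rather than exact measurements.

```python
def percent_reduction(before: float, after: float) -> float:
    """Return the percentage drop from `before` to `after`."""
    return (before - after) / before * 100


# Rounded figures quoted by IBM for the DataWorks code base.
lines_before = 40_000_000
lines_after = 5_000_000

reduction = percent_reduction(lines_before, lines_after)
print(f"{reduction:.1f}%")  # prints 87.5%
```

So the rounded numbers work out to a reduction of 87.5 percent, which squares with IBM's claim.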

DataWorks will now benefit directly from Spark’s scalability, distributed programming model and data source connectivity as well as the frequent enhancements delivered to Spark by the project’s contributors.

Offered as a service for developers within the broader ecosystem of IBM’s managed cloud data services, IBM Analytics for Apache Spark integrates with open source, proprietary and third party tools on the IBM Bluemix cloud platform.

The big promise from IBM is…

… developers will now be able to infuse analytics into their apps in real time.

“For data scientists and engineers who want to do more with their data, the power and appeal of open source innovation for technologies like Spark is undeniable,” said Rob Thomas, VP of product development, IBM Analytics. “IBM is committed to using Spark as the foundation for its industry-leading analytics platform, and by offering a fully managed Spark service on IBM Bluemix, data professionals can access and analyze their data faster than ever before, with significantly reduced complexity.”

Since announcing its commitment to the Apache Spark community in June 2015, IBM has made over 60 contributions to the Spark project, including in the areas of machine learning and SQL.
