
Spark user survey suggests growth beyond Hadoop

Spark seems to be growing beyond Hadoop, as standalone instances outnumber Spark on Yarn on HDFS

Spark, a parallel processing framework that allows big data analytics, seems to be growing beyond Hadoop, according to research from Databricks.

Databricks was set up to commercialise open-source Apache Spark. In a survey of 1,417 users from 842 organisations, it found that 48% were using Spark in standalone mode, independently of Hadoop. Some 40% of respondents were using Spark on Yarn on Hadoop, while 11% were running Spark on Apache Mesos.

This could be significant because the Hadoop stack, based on the Hadoop Distributed File System, has been seen as a byword for big data technology since its emergence from Yahoo’s elaboration of Google’s MapReduce framework in 2005, led by Doug Cutting, now chief architect at Hadoop distributor Cloudera.

Matei Zaharia, the creator of Apache Spark and chief technology officer of Databricks, said his company’s survey showed strong results on the uptake of Spark by enterprises.

“The continued growth of Spark has been highly encouraging, as companies are going into production to obtain real business value, and they are doing so in a wide range of environments beyond Hadoop clusters,” he said.

The Spark User Survey found users are turning to Spark mainly for reasons of performance (91%), ease of programming (77%) and ease of deployment (71%). Just over half (52%) cited real-time streaming capabilities as a reason for adoption. It discovered 51% of respondents run Spark on a public cloud.

Spark is being used for machine learning, streaming and graph analysis. The survey found there are 56% more Spark Streaming users than there were in 2014.

Spark users

Some 41% of Spark users identified themselves as “data engineers”, while 22% said they were “data scientists”. The computer languages they use with Spark break down as Scala (71%), Python (58%), SQL (36%), Java (31%) and R (18%).

The survey found 52% use the framework for data warehousing, 68% use it for business intelligence, 40% for processing application and system logs, 48% to build recommendation engines, 36% for user-facing services and 29% for fraud detection and security.

Nik Rouda, senior analyst at Enterprise Strategy Group, said: “Many organisations are shifting to a 'Spark-first' strategy. The market will no doubt continue to evolve, but Spark has established considerable momentum today.”
