Splicing data machinery: what is scale-out data processing, really?

Splice Machine is an open source RDBMS (relational database management system) powered by Hadoop and Spark. The firm’s latest product updates paint an interesting picture of the kinds of intelligence actually involved in scale-out data processing.

With Hadoop’s cluster-centric distributed data storage and distributed processing prowess, and Spark’s fast engine for big data processing with built-in modules for streaming, machine learning and graph processing, Splice Machine (arguably) boasts a powerful combination.

The firm has now announced a cloud-based Amazon Web Services (AWS) sandbox for developers to put its open source 2.0 Community Edition to the test. The sandbox comes at the same time as an open source standalone and cluster download, the general availability of its V2.0 product and the launch of its developer community site.

Five ‘real’ DevOps features inside

Splice Machine is now available in a free full-featured Community Edition and a licensed Enterprise edition. The Enterprise edition licence includes 24/7 support, along with DevOps features such as LDAP and Kerberos support.

NOTE: Interesting, although others argue that LDAP and Kerberos are tech issues and not DevOps capabilities as such.

“I am very excited about Splice Machine opening its software and developing a community,” said Monte Zweben, co-founder and CEO, Splice Machine. “Our community edition is a fully functional RDBMS that enables teams to completely evaluate Splice Machine, while our Enterprise edition contains additional DevOps features needed to securely operate Splice Machine, 24×7.”

To support the growing Splice Machine community, the company has launched a community website that includes: tutorials, videos, a developer forum, a GitHub repository, a StackOverflow tag and a Slack channel.

What scale-out technology really means

What the company is presenting here are technologies that explain the mechanics of some of the terms we hear all too often in IT circles. When Splice Machine promises a so-called ‘scale-out architecture’, what it has up its sleeve is software management intelligence to grow capacity outwards across commodity hardware, rather than upwards into bigger boxes. How does it do this? By using proven auto-sharding on HBase and Spark, that’s how.
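To make ‘auto-sharding’ concrete, here is a minimal sketch of range-based sharding with automatic splits, the broad scheme HBase uses for its regions. This is illustrative only: the class name, the tiny size threshold and the in-memory dicts are assumptions for the example, not Splice Machine or HBase internals (real regions split on byte-size thresholds, not row counts).

```python
import bisect

class ShardedTable:
    """Illustrative range-sharded key/value table with auto-split."""

    def __init__(self, max_shard_size=2):
        # split_keys are the boundaries between shards; a key belongs to
        # the shard whose boundary range contains it.
        self.split_keys = []
        self.shards = [dict()]
        self.max_shard_size = max_shard_size

    def _shard_index(self, key):
        # Keys equal to a boundary go to the right-hand shard.
        return bisect.bisect_right(self.split_keys, key)

    def put(self, key, value):
        i = self._shard_index(key)
        self.shards[i][key] = value
        if len(self.shards[i]) > self.max_shard_size:
            self._split(i)  # auto-shard: split when the shard is overfull

    def _split(self, i):
        # Divide the overfull shard at its median key, analogous to an
        # HBase region split when it exceeds its size threshold.
        keys = sorted(self.shards[i])
        mid = keys[len(keys) // 2]
        left = {k: v for k, v in self.shards[i].items() if k < mid}
        right = {k: v for k, v in self.shards[i].items() if k >= mid}
        self.shards[i:i + 1] = [left, right]
        self.split_keys.insert(i, mid)

    def get(self, key):
        return self.shards[self._shard_index(key)].get(key)
```

Because the shards are ordered key ranges, they can be spread across commodity nodes and rebalanced as they split, which is the essence of scaling out rather than up.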

There’s more intelligence here in the form of resource isolation. This (again largely sitting in the form of management software intelligence) means that we can control the allocation of CPU and RAM resources to operational and analytical workloads. This in turn enables queries to be prioritised for workload scheduling.
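One way to picture that prioritisation is a scheduler that lets short operational (OLTP) queries jump the queue ahead of long analytical (OLAP) ones. The sketch below assumes a simple two-class priority queue; the names and priorities are illustrative, not Splice Machine’s actual API.

```python
import heapq
import itertools

# Hypothetical workload classes: lower number is scheduled first.
PRIORITY = {"oltp": 0, "olap": 1}

class QueryScheduler:
    """Toy scheduler: OLTP queries pre-empt queued OLAP queries,
    with FIFO ordering inside each workload class."""

    def __init__(self):
        self._queue = []
        self._counter = itertools.count()  # tie-break preserves arrival order

    def submit(self, workload, sql):
        heapq.heappush(self._queue, (PRIORITY[workload], next(self._counter), sql))

    def next_query(self):
        if not self._queue:
            return None
        _, _, sql = heapq.heappop(self._queue)
        return sql
```

Here, an `UPDATE` submitted after a long analytical `SELECT` would still be handed to the engine first, which is the behaviour resource isolation and workload scheduling are meant to guarantee.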

There’s also a management console in the form of a web UI that lets users see the queries currently running, then drill down into each job to track its progress and identify any potential bottlenecks.

“We are excited about v2.0 of Splice Machine,” said Tom Beale, Chief Technology Officer, Corax. “The new version has a closer relationship between Spark and relational data storage. This enables us to continue to utilise big data computation and storage technology at the same time interfacing directly with our cyber risk quantification SaaS platform.”

The truth is that we all need to process mixed OLTP and OLAP workloads, and we prefer technologies with a vibrant community. Splice Machine is showing those cards openly, in a (mostly) positive kind of way.