For a relatively new market, there is a lot happening in the world of big data. If we were to take a “Top 20” look at the technologies, it would probably read something like this: this week’s biggest climber is Hadoop, the biggest faller is relational databases, and holding their place are the schema-less databases.
Why? Well, Actian announced the availability of its SQL-in-Hadoop offering. Not just a small subset of SQL, but a very complete implementation. Your existing staff of SQL devotees, and all the tools they use, can therefore now work against data stored in HDFS as well as against Oracle, Microsoft SQL Server, IBM DB2 et al.
Why is this important? Well, Hadoop has been one of those fascinating tools that promise a lot – but only deliver on that promise if you have a bunch of talented technophiles who know what they are doing. Unfortunately, these people tend to be as rare as hens’ teeth – and are picked up and paid accordingly by vendors and large companies. Now, a lot of the power of Hadoop can be put in the hands of the average (still nicely paid) database administrator (DBA).
The second major change that this could start to usher in is the use of Hadoop as a persistent store. Sure, many have been doing this for some time, but at Quocirca we have long advised that Hadoop be used only for its MapReduce capabilities. The outputs would then be pushed to a SQL or NoSQL database, depending on the format of the resulting data, with business analytics layered over the top of the SQL/NoSQL pair.
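To make that pattern concrete, here is a minimal sketch in plain Python. It is not Hadoop code and not Actian's product: the map, shuffle and reduce steps are hand-rolled functions, the sample log lines are invented, and an in-memory SQLite database stands in for whatever SQL database would sit downstream. All function and table names are hypothetical.

```python
import sqlite3
from collections import defaultdict

# Hypothetical raw log lines standing in for files held in HDFS.
LOG_LINES = [
    "2014-06-01 store=london sale=120",
    "2014-06-01 store=paris sale=80",
    "2014-06-02 store=london sale=45",
]

def map_phase(line):
    """Emit one (store, sale) pair from a raw line."""
    fields = dict(part.split("=") for part in line.split()[1:])
    yield fields["store"], int(fields["sale"])

def shuffle(pairs):
    """Group values by key, as a MapReduce framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Collapse all values for one key into a single total."""
    return key, sum(values)

def run_job(lines):
    """Run map, shuffle and reduce in sequence and return (store, total) rows."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return [reduce_phase(k, v) for k, v in sorted(shuffle(pairs).items())]

def load_into_sql(rows):
    """Push the job output into a SQL table (SQLite is only a stand-in here)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales_by_store (store TEXT PRIMARY KEY, total INTEGER)"
    )
    conn.executemany("INSERT INTO sales_by_store VALUES (?, ?)", rows)
    conn.commit()
    return conn

if __name__ == "__main__":
    conn = load_into_sql(run_job(LOG_LINES))
    # Analytics tools would then query the SQL store, not the raw data.
    query = "SELECT store, total FROM sales_by_store ORDER BY total DESC"
    for row in conn.execute(query):
        print(row)
```

The point of the sketch is the hand-off: the raw data is only ever touched by the batch job, and the analytics layer sees nothing but the summarised table.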
With SQL access available directly against Hadoop, new applications could use Hadoop as their data store, and mixed data types could be held as SQL-style or JSON-style constructs, with analytics being deployed against a single data store.
Does this mark the end for relational databases? Of course not. It is highly unlikely that those using Oracle E-Business Suite will jump ship and go over to a Hadoop-only back end, nor will the vast majority of those running mission-critical applications that currently use relational systems. However, new applications that need large datasets held on a linearly scalable, cost-effective data store could well find that Actian provides them with a back end that works for them.
Another vendor that made a big data announcement a little while back was Syncsort, which made its Ironcluster ETL engine available in AWS essentially for free – or at worst at a price you would hardly notice, with charges applying only to the workload being undertaken.
Extract, transform and load (ETL) activities have long been a major issue in data analytics, and solutions have grown up around the issue – but at a pretty high price. In the majority of cases, ETL tools have also been capable of dealing only with relational data, making them pretty useless when it comes to true big data needs.
By making Ironcluster available in AWS, Syncsort is playing the elasticity card. Those needing to analyse large volumes of data have a couple of choices: buy a few acres’ worth of expensive in-house storage, or go to the cloud. AWS EC2 (Elastic Compute Cloud) is a well-proven, easily accessed and predictably priced environment for running an analytics engine – provided that the right data can be made available rapidly.
Syncsort also makes Ironcluster available through AWS’ Elastic MapReduce (EMR) platform, allowing data to be transformed and loaded directly onto a Hadoop platform.
With a visual front end and utilising an extensive library of data connectors from Syncsort’s other products, Ironcluster offers users a rapid and relatively easy means of bringing together multiple different data sources across a variety of data types and creating a single data repository that can then be analysed.
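For readers who have not worked with ETL, the general idea of pulling differently shaped sources into one repository can be sketched in a few lines of Python. This is an illustration of the concept only and says nothing about how Ironcluster itself works: the two sample sources, the field names and the use of an in-memory SQLite database as the target repository are all assumptions made for the example.

```python
import csv
import io
import json
import sqlite3

# Hypothetical relational-style export (CSV) and JSON-style records.
CSV_SOURCE = "customer_id,name,country\n1,Alice,UK\n2,Bob,FR\n"
JSON_SOURCE = "\n".join([
    '{"id": 3, "fullName": "Chen", "location": {"country": "CN"}}',
    '{"id": 4, "fullName": "Dana"}',
])

def extract_csv(text):
    """Extract: read CSV rows as dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))

def extract_json_lines(text):
    """Extract: read one JSON document per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

def transform_csv(record):
    """Transform: map a CSV row onto the common (id, name, country) shape."""
    return (int(record["customer_id"]), record["name"], record["country"])

def transform_json(record):
    """Transform: flatten a nested JSON record; a missing country becomes None."""
    country = record.get("location", {}).get("country")
    return (int(record["id"]), record["fullName"], country)

def load(rows):
    """Load: write all rows into a single table in the target repository."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn

def run_etl():
    """Run extract, transform and load across both sources."""
    rows = [transform_csv(r) for r in extract_csv(CSV_SOURCE)]
    rows += [transform_json(r) for r in extract_json_lines(JSON_SOURCE)]
    return load(rows)

if __name__ == "__main__":
    conn = run_etl()
    for row in conn.execute("SELECT id, name, country FROM customers ORDER BY id"):
        print(row)
```

Even in this toy form, most of the effort goes into the transform step, where each source needs its own mapping onto the shared shape. That is the work a library of ready-made connectors is meant to take off the user's hands.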
Syncsort is aiming to be highly disruptive with this release – even at its most expensive, the costs are well below those of equivalent licence-and-maintenance ETL tools, and make other subscription-based services look rather expensive.
Big data is a real and active market, but the tools available to deal with the data needs that underpin the analytics are still relatively immature. Actian and Syncsort are in the vanguard of providing new tools that should be on the shopping list of anyone serious about coming to terms with their big data needs.