Hadoop is 10 years old. Its ecosystem is maturing and the region of its birth – Silicon Valley – is replete with companies developing on or around it, as well as others taking different approaches to big data in the elephant’s shade.
Computer Weekly was represented on a recent press visit to San Francisco and the Valley. Some of what emerged there will be of interest to UK CIOs.
Hortonworks, whose entrance hall elephant is pictured below, is one of the three main Hadoop distributors, the others being its rival Cloudera, and MapR.
Hortonworks president Herb Cunitz drew attention to the continued development of the data governance initiative Apache Atlas, announced in March 2015, in which Hortonworks and SAS are participating alongside user organisations JPMorgan Chase, Schlumberger, Aetna, Merck and Target. The project aims to provide data classification and centralised auditing and search, together with a security and policy engine, and Cunitz flagged it as an example of the maturing Hadoop ecosystem of projects.
He also spoke about the company’s recent acquisition of Onyara, an NSA spin-out and self-styled “internet of anything” (IoAT) company. Its technology was originally open sourced under the NSA’s technology transfer programme as an Apache project called NiFi, designed to process and distribute data; Onyara was the company that commercialised it.
It is now termed Hortonworks DataFlow and is said to “collect any and all IoAT data from dynamically changing, disparate and physically distributed sensors, machines, geolocation devices, clickstreams, files and social feeds via a highly secure lightweight agent”. Essentially, it gets data into Hadoop in the first place.
Other prospective acquisitions will lie in the areas of security and governance, or in Hadoop deployment enablement, as with cluster management technology Cloudbreak, said Cunitz.
When asked about the market perception evident from research by Databricks that the parallel processing framework Spark is transcending Hadoop, Cunitz said: “Spark is a fantastic processing engine. We’ve committed to it [as an open-source project]. We are finding that customers want to process the data in Spark, but store it in Hadoop.”
Spark is supplanting MapReduce, he said: “I don’t care if MapReduce goes away. If customers feel Spark is the right tool for the job, that’s great.”
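For readers unfamiliar with the programming model Spark is displacing, the pattern MapReduce popularised can be sketched in a few lines of plain Python (a toy illustration only, not Hadoop’s API — real jobs run these stages across a cluster, writing intermediate results to disk between stages, which is one reason Spark’s in-memory pipelines tend to be faster):

```python
from collections import defaultdict

def map_phase(docs):
    # Map: emit a (word, 1) pair for every word in every document
    for doc in docs:
        for word in doc.split():
            yield word, 1

def shuffle_phase(pairs):
    # Shuffle: group values by key (on a real cluster this is a
    # network-and-disk step between the map and reduce stages)
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts for each word
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data", "big elephants"]
counts = reduce_phase(shuffle_phase(map_phase(docs)))
print(counts)  # {'big': 2, 'data': 1, 'elephants': 1}
```

Spark generalises this rigid two-stage pipeline into arbitrary chains of transformations held in memory, which is why it suits the iterative workloads Cunitz mentions, such as predictive analytics.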
When asked if he saw a trend beyond Hadoop clusters being deployed to take out hardware costs (since HDFS can be run across cheap commodity machines) and for specific tactical – rather than enterprise – purposes, Cunitz said: “Offloading or archiving a data warehouse is one of the three use cases we see, and that is about cost take-out.
“But another is more about business outcomes such as predictive analytics – predictive maintenance [on factory machines or cars], for example. Another is about serving mobile ads to a customer walking through a store based on their purchase history. That’s not about cost take-out, but serving the customer better.”
The third use case, he said, is a single view of the customer, such as a telco wanting to reduce customer churn.
Making big data consumable
AtScale is a California-based business intelligence (BI) company founded by a cadre of BI veterans, including CEO Dave Mariani, an alumnus of Yahoo and Klout.
AtScale does online analytical processing (Olap) on Hadoop, and its basic idea is to join big data to the tools business people use to do their jobs: Tableau, Qlik and older BI suites such as Business Objects and MicroStrategy.
AtScale interposes what it calls a “virtualised semantic layer” between Hadoop data stores and data visualisation tools such as Tableau – of which, said Mariani, “we see a lot, but it is generally used on small datasets”.
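To make the idea concrete, a semantic layer translates business-level names into operations on the physical data, so the BI tool never needs to know how the Hadoop tables are laid out. The sketch below is hypothetical — the model and column names are invented for illustration and are not AtScale’s API:

```python
from collections import defaultdict

# Hypothetical semantic model: business terms mapped to physical columns
SEMANTIC_MODEL = {
    "dimensions": {"region": "store_region_cd"},
    "measures": {"revenue": "sale_amt_usd"},
}

def query(rows, dimension, measure):
    """Resolve business names to physical columns and aggregate --
    the kind of translation a semantic layer performs before pushing
    the query down to the Hadoop cluster."""
    dim_col = SEMANTIC_MODEL["dimensions"][dimension]
    msr_col = SEMANTIC_MODEL["measures"][measure]
    totals = defaultdict(float)
    for row in rows:
        totals[row[dim_col]] += row[msr_col]
    return dict(totals)

rows = [
    {"store_region_cd": "UK", "sale_amt_usd": 120.0},
    {"store_region_cd": "UK", "sale_amt_usd": 80.0},
    {"store_region_cd": "DE", "sale_amt_usd": 50.0},
]
print(query(rows, "region", "revenue"))  # {'UK': 200.0, 'DE': 50.0}
```

A business user asking Tableau for “revenue by region” never sees the physical column names — which is what lets IT change the underlying Hadoop layout without breaking every dashboard.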
The company takes the view that Hadoop will only gain in significance. Its chief marketing officer Bruno Aziza cited a recent survey of 2,200 IT professionals worldwide – sponsored by AtScale, Cloudera, Hortonworks and Tableau – that indicates 60% of respondents think of Hadoop as “strategic and game changing”.
“Chief data officers and CIOs can now consolidate their data marts and stores on Hadoop and it will not matter how many BI tools they have. For instance, JPMorgan Chase has more than 45 and the Bank of America has more than 100. They want to reduce to a more manageable set, and we can help them baby step that process,” said Mariani.
He added that IT could now have a re-imagined role imposing control and consistency over data marts in the context of business users serving themselves with business intelligence.
“We’re giving IT what they want, as well as what the business users want. You can have your cake and eat it too,” he said.
NuoDB re-invents the relational model for big data
NuoDB is another relatively new database company, based in Cambridge, Massachusetts and backed by Valley funding, that is – like AtScale – spearheaded by database industry veterans. It is thinking afresh about data becoming bigger in the sense of more volume, more variety, more velocity and less structure, and in the context of the rise of cloud computing, following on from the mainframe and client-server eras.
Barry Morris is the CEO and co-founder of the company, formerly of Dublin-based IONA Technologies and an alumnus of the once-mighty Digital Equipment Corporation. NuoDB co-founder Jim Starkey invented the “scale out”, cloud-based technology behind the company in 2008, and has a history of database invention, including the multi-version concurrency control (MVCC) method used by many database management systems.
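The core idea behind multi-version concurrency control – writers create new versions of a record rather than overwriting it, and each reader sees a consistent snapshot as of its start time – can be sketched in a few lines (an illustration of the general technique only, not NuoDB’s implementation):

```python
class MVCCStore:
    """Toy multi-version store: every write appends a (timestamp, value)
    version; a reader sees the latest version at or before its snapshot."""

    def __init__(self):
        self.versions = {}  # key -> list of (timestamp, value), oldest first
        self.clock = 0      # logical clock; incremented on every write

    def write(self, key, value):
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))

    def read(self, key, snapshot_ts):
        # Return the newest version written at or before snapshot_ts
        result = None
        for ts, value in self.versions.get(key, []):
            if ts <= snapshot_ts:
                result = value
        return result

store = MVCCStore()
store.write("balance", 100)   # version at ts=1
snapshot = store.clock        # a reader takes its snapshot here
store.write("balance", 40)    # a concurrent writer adds a version at ts=2
print(store.read("balance", snapshot))     # 100 -- the snapshot is unaffected
print(store.read("balance", store.clock))  # 40  -- the latest version
```

Because readers never block writers (and vice versa), MVCC suits exactly the distributed, multi-master setting NuoDB targets.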
NuoDB styles itself NewSQL rather than NoSQL. Its forte is said to lie in geo-distribution, defined as “[deploying] a single, logical database across multiple geographies with multiple master copies and true transactional consistency”.
Morris said the size of the database market, estimated at $40bn, offers scope for his company, even if its market share proves to be small.
“We can be a wonderfully successful software company, with $1bn in revenue, without knocking Oracle off its perch,” he said.
Gary Morgenthaler, a venture capital investor in the company, said: “As computing became graphical in the 1990s, new data types emerged – images, video, voice – and it was clear that a new type of database was required. That was the idea behind Postgres – post-Ingres, that is to say.” Morgenthaler was co-founder, CEO and chairman of Ingres Corporation from 1980 to 1989.
“I’ve seen no point in ruining a past reputation, though I have looked at funding other database companies. We looked [as Morgenthaler Ventures] at NoSQL companies, defined as though SQL is a bad thing. SQL is a good thing. NoSQL equals ‘no value’ because it cannot guarantee that your data is correct. Data that has value to you is data that you cannot afford to put in a NoSQL database.
“The value with NuoDB is that it can scale out, but it also offers Acid transactions and SQL. Today the NoSQL market is a zero dollar market, and the relational database market is a $40bn market,” he said.
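Morgenthaler’s point about Acid guarantees is easy to demonstrate: in a transactional SQL database, a multi-statement update either commits in full or rolls back in full, so the data can never be left half-changed. A minimal sketch using Python’s built-in SQLite, standing in for any Acid-compliant SQL database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money atomically: both updates commit, or neither does."""
    try:
        with conn:  # opens a transaction; rolls back on any exception
            conn.execute("UPDATE accounts SET balance = balance - ?"
                         " WHERE name = ?", (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ?"
                         " WHERE name = ?", (amount, dst))
    except sqlite3.IntegrityError:
        return False  # overdraft tripped the CHECK; everything rolled back
    return True

transfer(conn, "alice", "bob", 30)   # succeeds: alice 70, bob 80
transfer(conn, "alice", "bob", 999)  # violates the CHECK, rolls back both updates
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'alice': 70, 'bob': 80}
```

The failed transfer leaves both balances untouched – the guarantee Morgenthaler argues eventually-consistent NoSQL stores cannot give for data “that has value to you”.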
Read more about recent developments in big data technologies
- The Apache Spark engine can be a powerful tool for encouraging big data adoption among front-line workers, thanks to its fast processing speeds.
- The younger, nimbler Spark technology looks set to replace MapReduce in big data architectures. What is the pace, scope and scale of the replacement?
- What should enterprise customers make of the Open Data Platform, an initiative by IT suppliers to establish common technologies for Hadoop?