San Francisco’s Bay Area is developing yet another new group of start-up and early-stage data analytics companies focused on solving problems that stem from modern enterprise IT – driven, in part, by the maturing of Hadoop and the rise of self-service business intelligence for business users.
Computer Weekly was represented on a recent press visit to San Francisco. Some of what came up will be of interest to UK business IT leaders, especially those focused on data analytics.
Gartner has this year taken the controversial step of making its magic quadrant on business intelligence and analytics platforms about what it calls “modern” BI. So it is out with the old ‘systems of record’ business intelligence systems, such as SAP Business Objects, IBM Cognos and Oracle Business Intelligence Enterprise Edition.
Instead, pride of place is now given to Qlik and Tableau, and the Qlik-and-Tableau-like software developed at the bigger suppliers, whether SAP Lumira or IBM Watson Analytics.
However, a “third-wave” cohort of Californian data analytics outfits is emerging that positions data discovery and visualisation old timers Qlik and Tableau as second-generation BI.
Datameer: sea of data
One such is Datameer, which builds data marts using Hadoop, and offers an end-to-end data analytics play – from data ingestion all the way to visualisation.
“Meer” is the German word for “sea”, and the origins of the company’s founder, Stefan Groschupf, are in east Germany. The company’s chief executive officer says his parents were kept under surveillance by the Stasi in the old DDR, so he is, he says, sensitive to data privacy and security.
He also wants his company to make the world a better place, and some of Datameer’s employees bring their dogs to work. All the engineers are in Germany.
The company’s technology, which, to the end-user, has an Excel-like appearance, is specialised for the Hadoop stack. Groschupf was one of the early contributors to the open-source web crawler Nutch, alongside Doug Cutting, also one of the fathers of Hadoop at Yahoo.
In a briefing to a group of European journalists, Groschupf said that “80% of the world’s data” is already in Hadoop. And he compared Excel to a steering wheel in a car – it is just what business and IT professionals are used to when dealing with data.
The company’s first customer was Apple, and it has a base in financial services, telecommunications and retail, said Groschupf. He asked that its UK customers in broadcasting and telecommunications not be mentioned by name, but cited several financial services customers, including Citibank and American Express. Banks use its software to identify rogue traders, he said.
By contrast with data discovery and visualisation suppliers such as Qlik and Tableau, Datameer performs data preparation “tightly integrated with [path, clustering, and other] analytics” in addition to presentation, said Groschupf. “We do integrate with Qlik and Tableau, but we also have customers who have replaced their technology with ours. It’s a mix.”
Trifacta: clowns to the left, jokers to the right
Trifacta is another company that could plausibly be seen as “third generation”. The forte of this company, which originated among Stanford research academics, is data wrangling – sorting data out, getting it into shape for analysis.
Its chief executive officer, Adam Wilson, said the company’s founding principle is that “the people who know the data best should do the wrangling”. In fact, it chose to use the word “wrangling” because it belongs to the vernacular of data analysts themselves.
“We are dogmatic about the middle layer between raw data and analysis,” said Wilson. “We are not trying to be a BI company.”
In contrast to Datameer, Trifacta is staying clear of data visualisation and focusing instead on semi-automating the data-wrangling process, suggesting, for example, data transformations that someone might want to perform when doing an analysis job.
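As a rough illustration of what suggesting transformations can mean in practice, here is a minimal Python sketch. It is not Trifacta’s method or API: the function names and the handful of rules are hypothetical, chosen only to show the idea of inspecting a raw column and proposing clean-up steps for an analyst to accept or reject.

```python
from datetime import datetime


def _is_number(text):
    """True if text parses as a number once thousands separators are removed."""
    try:
        float(text.replace(",", ""))
        return True
    except ValueError:
        return False


def _is_iso_date(text):
    """True if text is a date in YYYY-MM-DD form."""
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def suggest_transformations(values):
    """Return suggested clean-up steps for a column of raw strings.

    Hypothetical rules for illustration only; a real tool would use
    far richer profiling than these four checks.
    """
    suggestions = []
    non_empty = [v for v in values if v.strip()]

    # Leading or trailing spaces anywhere in the column.
    if any(v != v.strip() for v in values):
        suggestions.append("trim whitespace")

    # Blank cells that the analyst needs to decide about.
    if len(non_empty) < len(values):
        suggestions.append("fill or drop missing values")

    # Propose a type only if every non-empty cell fits it.
    if non_empty and all(_is_number(v.strip()) for v in non_empty):
        suggestions.append("convert to number")
    elif non_empty and all(_is_iso_date(v.strip()) for v in non_empty):
        suggestions.append("convert to date")

    return suggestions


if __name__ == "__main__":
    print(suggest_transformations([" 1,200", "350 ", "", "42"]))
```

The sketch only proposes steps and never applies them, which mirrors the semi-automated approach described above: the person who knows the data makes the final call.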
Customers include the Royal Bank of Scotland, Unicredit, Santander and the Luxembourg Stock Exchange in financial services; others include Pepsi and LinkedIn.
The latter has a huge Hadoop cluster, but Trifacta can also be used at the small-data end, on Excel files, said Wilson. The company’s roadmap includes making its technology work on relational databases as well as Hadoop, he added.
“The beachhead for Trifacta was the biggest, messiest data we could find, often in Hadoop environments,” said Wilson. But it does not want to be over-specialised in Hadoop, he noted.
“A lot of our customers who have spun up Hadoop clusters also have data in Oracle, [IBM] DB2, SQL Server, and so on.”
Wilson said this was a point of difference with Datameer, which specialises in Hadoop. “They are doing a lot of work at the analysis and consumption layer,” he added. “We are very disciplined at staying at the preparation stage.
“Otherwise you end up creating competitors [on the storage side or on the visualisation and analytics side]. I really don’t want to compete with Qlik and Tableau, or SAS and R [in analytics].”
Phala: outcome in Sanskrit
PhalaData is a pre-revenue start-up headed by chief executive June Manley, who has a background at NetApp and HP. Its development team is in Bangalore, and it takes its name from the Sanskrit word for effect, or outcome.
While Qlik and Tableau are well-known evangelists for data democratisation, PhalaData’s market focus is on the elite management teams at large business-to-business enterprises – those with marketing automation software, such as Eloqua, Marketo and Pardot, and big ERP and CRM systems from the likes of Oracle or Salesforce.
The idea is to ingest the data from these systems through Talend, process it for search in Kibana and Elasticsearch, and serve it up through a visualisation layer in Highcharts.
But the consumers of the information will be boards and other senior managers, not data analysts or data wranglers – and they will focus on how the sales and marketing functions are affecting a company’s business strategy, said Manley.
One proof-of-concept beta customer is Dell, for which PhalaData identified a gap in expected revenue, on historical numbers, by ingesting, searching and making visible information from 6.6m CSV files from four systems. It took 15 minutes to ingest the data and four hours to run the analytics, said Manley.
PhalaData was then able to see that the “negative revenue” was due to returns, at the end of each quarter, on indirect sales. It was also able to show the client that it had more of a spread of industries in its channel revenue than it had thought.
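The kind of analysis described here, tracing negative revenue to quarter-end returns on one sales channel, can be sketched in a few lines of Python. This is an illustration only, not PhalaData’s pipeline: the column names, the sample rows and the seven-day quarter-end window are all assumptions made for the example.

```python
import csv
import io
from collections import defaultdict
from datetime import date

# Hypothetical sample data; column names and values are invented.
SAMPLE_CSV = """order_date,channel,amount
2016-01-15,direct,1200.00
2016-02-10,indirect,900.00
2016-03-30,indirect,-400.00
2016-03-31,indirect,-250.00
2016-05-02,direct,700.00
2016-06-29,indirect,-300.00
"""


def quarter_of(d):
    """Return (year, quarter number) for a date."""
    return (d.year, (d.month - 1) // 3 + 1)


def is_quarter_end(d, window_days=7):
    """True if d falls within the last window_days of its calendar quarter."""
    last_month = ((d.month - 1) // 3 + 1) * 3
    # First day of the following quarter.
    if last_month == 12:
        next_quarter_start = date(d.year + 1, 1, 1)
    else:
        next_quarter_start = date(d.year, last_month + 1, 1)
    return (next_quarter_start - d).days <= window_days


def summarise_negative_revenue(csv_text):
    """Total negative amounts by (quarter, channel, at-quarter-end flag)."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        amount = float(row["amount"])
        if amount >= 0:
            continue  # only returns and other negative entries matter here
        d = date.fromisoformat(row["order_date"])
        totals[(quarter_of(d), row["channel"], is_quarter_end(d))] += amount
    return dict(totals)


if __name__ == "__main__":
    for key, total in sorted(summarise_negative_revenue(SAMPLE_CSV).items()):
        print(key, total)
```

In this invented sample, every negative entry lands on the indirect channel in the last days of a quarter, which is the pattern the article says PhalaData surfaced for its client.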
Chartio: dashboards in the cloud for everyone
Cloud business intelligence company Chartio is at the other end of the hierarchy to PhalaData, aiming to take dashboard creation and utilisation even more to the masses than Qlik and Tableau.
Dave Fowler, CEO and co-founder, said the company’s origins, in 2010, lay in an evaluation of traditional enterprise BI systems conducted at Facebook, of which he had some knowledge.
“Those systems are not built for the cloud – they are really expensive, and you need a year of schooling to use them,” he said. “Only data scientists can use them.”
Fowler said the company has about 500 customers, one of which is Sainsbury’s, which is using the technology to dashboard customer and employee surveys.
He also said, by way of differentiation, that the earlier entrants into cloud BI, such as Birst and GoodData, are not, in his view, really architected for the cloud as it is today, with the likes of Amazon Redshift as a deployment option.
“Chartio just connects to whatever data sources you have,” he said. “It used to be that you really needed to set up a data warehouse [in the cloud] in someone else’s proprietary system, so it locked up your data. That is old school.”
Whatever the cogency of dismissing even recent generations of data analytics software as outmoded, or becoming so, it will be interesting for IT leaders in UK user organisations to see what might be coming down the track from Silicon Valley.