Can anybody convince me that big data isn't mumbo jumbo?
What would happen if you performed a big data analysis on big data itself?
I suspect that once the processing egg timer icon finally disappeared off your screen, there would be some surprising results.
One of the ‘insights’ would be that the definition of big data is incredibly vague. No two people seem to have the same understanding. It’s one of those terms that means everything and, ultimately, nothing. Which is a bit worrying because the devil is supposed to be in the detail – especially in analysis.
One of the telltale signs that people are winging it in this industry is the use of jargon. That's another aspect of the big data industry that could do with some scientific scrutiny. Wouldn't it be great if there were some kind of analytical tool that could examine the relationship between jargon and people who don't really know what they're talking about?
There seems to be a strong correlation between any reading material containing the words Hortonworks, MapR, Hadoop or Apache and total confusion. Nobody in this particular sector seems capable of explaining what they do in simple terms. If you want to explore open source frameworks, you have to accept that the bland are leading the blind. If they made the effort to explain themselves better, they might make a few more sales. You never know.
In the hope of getting some clarity and common sense, I attended a briefing by one of the old players in the big data market, Alteryx Analytics. This is a company that started in 1997, which in IT years makes it prehistoric.
Alteryx offers predictive analytics, but its particular selling point is that it prepares data before it gets fed into an analytics machine. This (if I've understood it right) makes the whole system run more efficiently.
This month, the company celebrated its tenth major release (version 10.0), and the system seems to have come of age. Finally, it has had the idea of making the system easier for laymen to use, rather than just IT specialists. So, for example, pharmacists and gene splicers can run their own queries on their own data, rather than entrusting the task to some IT data specialist who will never know the right questions to ask anyway.
“Analysis was always done by techies, but now we’re letting the line of business people run their own searches,” says Stuart Wilson, Alteryx’s European VP.
In other words, they've demystified analytics. Meanwhile, under the bonnet, in parts of the machinery no sensible gene splicer cares about, Alteryx has carefully blended sources of information from Amazon Redshift, Impala, Spark and Teradata, so the system can work at any scale. You can even add data from spreadsheets to the pool. That sounds a lot more sensible. If only the rest of the big data industry were this mature.
But no, most are still talking their own language and pretending to see things (insights!) that nobody else can see.
It's about to get bad for the big-company time-wasters who cover their inadequacy with an impenetrable smokescreen of jargon, meetings and aggressive email forwarding.
Two new cloud-based start-ups, Culture Amp and VoloMetrix, have created analytical systems that could expose time-wasters. These cloud-based apps assess people's emails, calendars and Salesforce data, and identify those who do too much networking and spend far too much time not-working. VoloMetrix has just been bought by Microsoft and will be integrated into Office 365, so pretty soon every giant corporation will have it.
Start trembling, all you lovers of big data-related jargon. Your time could be up!