This is a guest blog post by Oleg Rogynskyy, VP of Marketing & Growth at H2O.ai.
Before the emergence of today’s massive data sets, organisations primarily stored their data in relational databases from vendors such as Oracle, Teradata and IBM. SQL, which was developed in the 1970s and standardised in 1986, became the de facto standard for working with those databases. While vendor flavours of SQL differ, the language itself follows the same general pattern, so business analysts without a developer background can pick it up quickly and draw insights from the data stored in their relational databases. Today, I think machine learning is democratising the big data era of Hadoop and Spark in much the same way that SQL did for relational databases.
The problem that SQL solved for relational databases was accessibility. Before SQL, business analysts without an engineering background could not work with their data directly. They depended on database administrators, much as developers and business analysts depend on data scientists today. The result is a data “traffic jam”: developers and business analysts cannot work with their data unless a data scientist is available to help. The promise of machine learning is that it lets business analysts and developers run analyses and discover insights on their own.
SQL allowed lay business analysts to quickly comb through large data sets for answers via queries. However, a query returns only the rows that match it exactly, so both the data and the query have to be organised precisely. Machine learning can comb through even larger data sets and reduce them to insights without requiring that the question be specified so precisely up front. The principle is the same: both SQL and machine learning reduce data sets to answers, but SQL is more of “I know what I’m looking for and here is how I find it,” while machine learning is more about “hey, show me what’s interesting in this data and I’ll decide what’s important.” In other words, SQL requires business analysts to know exactly what they’re looking for, while machine learning does not. With machine learning, an analyst can use all their data to diagnose the common themes in it, predict what will happen, and (eventually) prescribe the optimal course of action. I actually believe that, for business analysts, SQL will become as obsolete as the typewriter as machine learning takes its place.
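To make the contrast concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption made for illustration: the customer names and spend figures are invented, and the bare-bones two-group clustering routine merely stands in for whatever algorithm a real machine learning platform would apply. The first function asks a question the analyst already knows how to phrase; the second lets the data suggest its own grouping.

```python
import sqlite3

# Hypothetical toy data: (customer, monthly_spend). Invented for illustration only.
CUSTOMERS = [
    ("ann", 12.0), ("bob", 15.0), ("cat", 11.0),
    ("dan", 95.0), ("eve", 102.0), ("fay", 99.0),
]


def sql_answer(threshold):
    """SQL style: the analyst already knows the question (spend above a threshold)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (name TEXT, spend REAL)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", CUSTOMERS)
    rows = conn.execute(
        "SELECT name FROM customers WHERE spend > ? ORDER BY name", (threshold,)
    ).fetchall()
    conn.close()
    return [name for (name,) in rows]


def two_means(values, iterations=10):
    """ML style: a bare-bones 1-D k-means (k=2) that needs no threshold up front."""
    centres = [min(values), max(values)]  # deterministic starting points
    for _ in range(iterations):
        groups = ([], [])
        for v in values:
            # Assign each value to whichever centre is currently closer.
            nearest = 0 if abs(v - centres[0]) <= abs(v - centres[1]) else 1
            groups[nearest].append(v)
        # Move each centre to the mean of its group (keep it if the group is empty).
        centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
    return centres, groups


if __name__ == "__main__":
    print(sql_answer(90))
    centres, groups = two_means([spend for _, spend in CUSTOMERS])
    print(centres)
    print(groups)
```

The query only works because someone chose 90 as the cut-off in advance. The clustering routine is never told where the boundary lies, yet on this toy data it separates the low spenders from the high spenders on its own, leaving the analyst to decide whether that split matters.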
Today’s business analysts and developers are more than capable of building and using applications that sit on top of their data and are powered by machine learning. Importing machine learning algorithms into applications is a seamless process, but the organisational will has to be there. Too many organisations cling to the antiquated notion that they can’t do machine learning, either because it is too compute-intensive or because it requires in-house data science expertise that they can’t afford. No one expects the vast majority of organisations to develop an artificial intelligence programme on the scale of Facebook or Google, but they don’t need to. Many machine learning platforms are open source and free; all it takes is someone smart and curious enough to begin a pilot test!