Teradata CTO: beyond the data science hype cycle

Teradata CTO Stephen Brobst spoke at the firm’s day #2 keynote to add more colour to the data and services firm’s stance on how we should be working with analytics and all aspects of the data warehouse today.

A process for data

Brobst explains the process through which we must now work with data — measure, understand, optimise, execute, automate… with one step logically following the next.

Noting that what we now call big data analytics used to be called ‘data mining’, Brobst says that, “A lot of organisations are still overwhelmed by the hairball of data.”

Incomplete & dirty data

You’d think that big data analytics would be helping us at this point… but the problem is that the hairball is actually getting bigger. The answer, it appears, is being able to use machine learning to help automate our way out of the hairball and actually start to work with data that is incomplete and dirty.

Machine learning is an automation of the model building process (for data crunching)… but 95% of machine learning implementations still use linear regression, i.e. the same technology we were using in the 1990s to perform data mining.
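
To make the linear regression point concrete, here is a minimal sketch of what "building a model" means in that case: fitting a straight line to data by ordinary least squares. The function name fit_line and the sample readings are made up for illustration and do not come from the keynote.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical readings, invented purely for this example.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

Nobody hand-codes the slope or intercept here; both are derived from the data, which is the (modest) sense in which even 1990s-style regression counts as automated model building.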

Part of this is okay though, because we don’t need a human to program the learning elements of the data model that we need to bring to bear upon our current use of big data.

Riding the hype cycle

Brobst bemoans the fact that Artificial Intelligence today is off the chart: it sits far too high up on the hype cycle.

“But don’t get me wrong, the hype of AI is going to continue,” said Brobst.

This is of course because all those CEOs and CTOs out there now want to go on the record to claim that they are using some form of AI in their approach to data… in reality though, if we take Brobst’s words as gospel, they are failing to use the more refined and strategic elements of machine learning automation that can build contemporary data crunching models.

AI in the neural network

Brobst explains that now, AI is enjoying something of a resurgence because it is being propelled by deep learning.

Can machines think?


But let’s also consider Alan Turing’s variation on this question.

Can machines do what we (as thinking entities) can do?

Yes! (at least in some cases, anyway).

How is deep learning different?

Multiple layers in the neural network (with intermediate data representations) can facilitate dimensionality reduction in the data workload… we can then interpret both linear and non-linear relationships in our data… and then, ultimately, we are able to derive patterns from data with very high dimensionality.
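
As a rough illustration of "multiple layers with intermediate data representations", the sketch below pushes one 8-value record through two layers that shrink it to 4 values and then to 2. It is only a toy: the layer sizes, the make_layer and forward helpers and the random weights are all assumptions for the example, and the weights are untrained, whereas a real deep learning system would learn them from data.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Return an n_out x n_in weight matrix with random (untrained) values."""
    return [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]


def forward(layers, x):
    """Pass x through each layer, keeping every intermediate representation."""
    representations = [x]
    for weights in layers:
        # tanh makes each layer non-linear, so stacked layers can capture
        # relationships that a single linear model cannot.
        x = [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]
        representations.append(x)
    return representations


rng = random.Random(0)   # fixed seed so the run is repeatable
sizes = [8, 4, 2]        # hypothetical widths: 8 inputs reduced to 2 values
layers = [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

record = [rng.random() for _ in range(sizes[0])]
representations = forward(layers, record)
print([len(r) for r in representations])  # [8, 4, 2]
```

Each entry in representations is one of the intermediate views of the same record; the final, narrowest one is the reduced-dimension form that later processing would work with.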

Why is this important, asks Brobst? Because we now want to build highly scalable data systems for big data crunching with advanced algorithms and advanced real time data streaming (often using GPUs with highly parallel computation power), and so create what we eventually call operational business intelligence.

The data scientist emerges

“It also helps us work with sensor data in the Internet of Things. We now need to be able to handle more data and work at deeper layers of the data network in the data warehouse. All these things are converging along with the emergence of the role of the data scientist as a job function inside organisations,” said Brobst.

With a million process variables (as seen in many complex modern businesses)… it is very hard to use brute force data processing techniques. This, essentially, is a key rationale and validation for why machine learning is forming part of our new approach to data analytics, argues Teradata’s Brobst.
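
Some back-of-the-envelope arithmetic shows why brute force struggles at this scale. The sketch assumes that "brute force" means exhaustively checking every pair or triple of variables for an interaction, which is an interpretation for illustration rather than something Brobst spelled out.

```python
import math

n_variables = 1_000_000  # the "million process variables" from the text

# Number of distinct combinations an exhaustive search would have to examine.
pairs = math.comb(n_variables, 2)
triples = math.comb(n_variables, 3)

print(f"pairs:   {pairs:,}")
print(f"triples: {triples:,}")
```

Roughly half a trillion pairs is already heavy going; triples run to well over a hundred quadrillion, which is why a method that learns which combinations matter is more practical than testing them all.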

Compelling stuff? Yes… as a CTO who clearly takes data science very seriously and wants to clarify and explain how we will be using data engineering in the future, this is the kind of keynote we need more of (Brobst talked about data, programming and data models for an hour without mentioning “customers” once)… deep learning for sure.