Stephen Brobst (pictured), chief technology officer at data warehousing firm Teradata, and his colleague Martin Willcox, director of platform and solution marketing EMEA, are a double act on the Teradata customer event circuit.
At a recent CTO roadshow in London, they spoke to Computer Weekly about how they see the data technology industry in mid-2013. What follows is an edited version of a brace of interviews.
SAP is promoting its in-memory database appliance Hana as a transformative technology. What is your view on in-memory technology?
Willcox: There are two views in the industry: SAP’s view that all data will be maintained in memory; and the view that unit memory costs are not falling as fast as data volumes are growing, so it does not make economic sense to store all data in memory. The latter means you need to combine different storage mechanisms in a classic hierarchy.
The distinction between Teradata and the other suppliers in the ‘not all data in memory’ camp is that we automate how data moves around the hierarchy, according to a multi-temperature [hot data, cold data] model. That’s what we call Teradata Intelligent In-Memory.
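To make the multi-temperature idea concrete, here is a minimal Python sketch. It assumes that a block's “temperature” is simply its number of accesses per day and that the hierarchy has three tiers (memory, SSD, disk). The thresholds, the tier names and the `Block`, `assign_tier` and `place` names are all hypothetical illustrations; this is not how Teradata Intelligent In-Memory is actually implemented.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    accesses_per_day: float  # hypothetical "temperature" measure


# Hypothetical thresholds, in accesses per day (not Teradata's values)
HOT_THRESHOLD = 100.0
WARM_THRESHOLD = 10.0


def assign_tier(block: Block) -> str:
    """Map a block's access frequency to a tier in an assumed three-level hierarchy."""
    if block.accesses_per_day >= HOT_THRESHOLD:
        return "memory"
    if block.accesses_per_day >= WARM_THRESHOLD:
        return "ssd"
    return "disk"


def place(blocks):
    """Return {tier: [block names]}, hottest blocks listed first within each tier."""
    placement = {"memory": [], "ssd": [], "disk": []}
    for block in sorted(blocks, key=lambda b: b.accesses_per_day, reverse=True):
        placement[assign_tier(block)].append(block.name)
    return placement


blocks = [Block("sales_2013_q2", 500), Block("sales_2012", 20), Block("sales_2005", 0.1)]
print(place(blocks))
```

Rerunning `place` as access patterns change is what makes the movement automatic in this sketch; in practice the temperature measure and the tiers would come from the platform, not from fixed constants.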
How are you framing ‘big data’ intellectually?
Willcox: Suppliers that claim big data is one homogeneous problem space are wrong. We divide it up, on a two-by-two model, along two axes: on the x-axis, data structures – simple on the left, multi-structured on the right; on the y-axis, set-based analytics on the bottom and non-traditional analytics, like path or graph analytics, on the top.
The latter are iterative in nature. So, take affinity analysis in retail: ‘Which products sell most often with bananas?’ is a classic problem. But if I want to ask, ‘Which products sell most often with bananas and milk?’, and so on, it gets computationally expensive to do that in a traditional database management system.
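A naive Python sketch of the banana question shows why it grows expensive: each new condition means another full pass over the baskets, and the number of item combinations a brute-force search would have to consider rises combinatorially. The function names and sample baskets are hypothetical, and production market-basket analysis typically uses pruning algorithms such as Apriori or FP-growth rather than this brute-force count.

```python
from collections import Counter
from math import comb


def co_occurrence(baskets, condition):
    """Count how often each other product appears in baskets that contain
    every product in `condition` (one full pass over the data)."""
    condition = frozenset(condition)
    counts = Counter()
    for basket in baskets:
        items = set(basket)
        if condition <= items:
            counts.update(items - condition)
    return counts


def candidate_itemsets(n_products, max_size):
    """Number of itemsets of size 1..max_size a brute-force search must consider."""
    return sum(comb(n_products, k) for k in range(1, max_size + 1))


# Hypothetical sample baskets
baskets = [
    ["bananas", "milk", "bread"],
    ["bananas", "milk", "cereal"],
    ["bananas", "bread"],
    ["milk", "bread"],
]
print(co_occurrence(baskets, {"bananas"}).most_common())
print(co_occurrence(baskets, {"bananas", "milk"}).most_common())
print(candidate_itemsets(1000, 2), candidate_itemsets(1000, 3))
```

With 1,000 products, going from pairs to triples takes the brute-force candidate count from 500,500 to 166,667,500.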
In graph processing, looking at the connections between nodes also takes you beyond set-based analytics. Individuals in a social network are an example: there, you want to determine degrees of influence.
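One simple way to put a number on reach in a social graph is to count how many people sit within a given number of hops of an individual, found by breadth-first search. The Python sketch below assumes an undirected friendship graph stored as an adjacency dictionary; the names, the sample graph and the hop-count measure are illustrative assumptions, and real influence models (PageRank-style centrality, for instance) are considerably richer.

```python
from collections import deque


def hops_from(graph, source):
    """Breadth-first search: return {person: number of hops from source}."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        person = queue.popleft()
        for friend in graph.get(person, ()):
            if friend not in hops:
                hops[friend] = hops[person] + 1
                queue.append(friend)
    return hops


def reach(graph, source, max_hops):
    """Number of other people within max_hops of source (a crude influence proxy)."""
    return sum(1 for d in hops_from(graph, source).values() if 0 < d <= max_hops)


# Hypothetical friendship graph (each link listed in both directions)
graph = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann"],
    "dan": ["bob", "eve"],
    "eve": ["dan"],
}
for person in graph:
    print(person, reach(graph, person, 2))
```

In this toy graph, bob reaches all four other people within two hops, more than anyone else.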
So, new types of analytics on new types of data is what gives big data meaning. Otherwise, it is a debased term.
What interesting trends have you been seeing in your customer base in relation to big data over the past year?
Some of our telecoms customers are doing interesting things to understand network data in conjunction with customer data. Cellular data is another area that needs to be understood better, and Aster Data and Hadoop are being used for that, with SQL-H, which allows analysis of the Hadoop Distributed File System (HDFS) using industry-standard SQL.
It is always striking how little people in technology know outside of their own narrow field. The industry does not do a good job of educating people about problems that were solved in the past. You see a lot of people reinventing solutions on new technology. Some of the Hadoop crowd are guilty of that. Some of them – not all – understand very little about the management of structured data. There is a lot of wheel reinvention.
When I ask purveyors of some of the newer big data technologies about the role of the traditional data warehouse, they often say, “It still has a place”.
Willcox: Yes, damning with faint praise! It is still fundamental. Some of the new technologies are very exciting, but some of their proponents are proposing what looks a lot like file-based, application-specific processing of data, which takes us back to the late 1960s and early 1970s.
That is how we used to do it. It meant huge redundancy and inconsistency of data. It did not work for large organisations with complex data, and that is why we invented relational database management systems. We realised that the way to guarantee the quality and consistency of data was to abstract those services to the database management system level, and so get away from the problem of every developer being responsible for the integrity of their own data.
Organisational physics has not changed in the past 30 years. What about data quality, data consistency, metadata management, lineage? If you are just doing a science project, it may not matter, but if you are closing the books and need to report to regulators, it does matter that you have all the data and that it is of good quality.
It is unlikely that these newer companies will displace 30 years of engineering in the bottom left-hand corner of the two-by-two box I described. But they are interesting for multi-structured data and for non-traditional analytics.
There are no technologies that cover all four boxes – that’s why we have advocated a unified data architecture.
Last year you [Stephen Brobst] spoke about big data “crossing the chasm”, using Geoffrey Moore’s classic metaphor: still early adopters, but beyond the innovators. How is that going?
Brobst: In the second half of last year, we did have that chasm crossing. Prior to that, all the big data users were dotcoms. Now the growth is dominated by normal businesses – banks, telecoms and retail. And I’m talking about real use for real business, not some proof of concept, downloading Hadoop for free, and so on. I don’t count that.
Is the unified data architecture concept Teradata’s answer to Hadoop and the NoSQL players?
Brobst: It’s taking the best of both worlds. There is a lot of religious extremism out there. There are the NoSQL/Hadoop bigots and the relational database bigots, and at the extremes they are both stupid.
The unified data architecture allows you to integrate Hadoop and Teradata, with Aster as a bridge that allows data scientists to be productive.
Our more sophisticated customers have deployed some sort of variation of the unified data architecture. If you look at LinkedIn or Wells Fargo or eBay, you can see that. As usual, it jumps from the West Coast to Manhattan to London. Although, this time, with big data, Germany and Switzerland are being as aggressive as the UK, and that is unusual. I don’t know why that is – it might be an attraction to open source for fiscal reasons, especially in [German] retail.
You have to be careful with open source, though. I’ve seen unrealistic business cases that just look at capex and not opex. Open source software is like a free puppy – the acquisition cost is nothing, but the feeding and caring is anything but.
In Teradata’s case, the cost to value is good because we can leverage the open source technology in Hadoop, but you get higher productivity in extracting the value because of the Aster Data technology.
How do you see SAP Hana now?
Brobst: For enterprise-class systems, we never see Hana. It is not in the game. Where it is deployed, it is as an operational data store (ODS). It is not economically rational for a large enterprise to put all its data in memory. Memory is getting cheaper, but data is still growing faster.
I understand why SAP has taken the approach it has, because if you put all the data in memory you can use brute force software. Hana is relatively unsophisticated software.
Memory costs 50 times as much as electro-mechanical disc drives. For 80% of your data, paying 50 times more than necessary is not intelligent.
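Taking Brobst’s figures at face value, the arithmetic is easy to sketch. The Python below assumes that disc storage costs one unit per unit of data, that memory costs 50 units, and that only the hottest 20% of data needs to be in memory. It ignores compression, SSD tiers, power and performance, so treat it as a back-of-envelope illustration rather than a pricing model; the `storage_costs` function is hypothetical.

```python
def storage_costs(data_units, hot_fraction, disk_cost=1.0, memory_multiple=50.0):
    """Compare keeping all data in memory with a two-tier memory/disc split.

    Costs are relative: disk_cost per unit of data on disc, and memory
    costs memory_multiple times as much per unit.
    """
    memory_cost = disk_cost * memory_multiple
    all_in_memory = data_units * memory_cost
    # Hot fraction in memory, the rest on disc
    tiered = data_units * (hot_fraction * memory_cost + (1 - hot_fraction) * disk_cost)
    return all_in_memory, tiered


all_mem, tiered = storage_costs(100, hot_fraction=0.2)
print(f"all in memory: {all_mem:.0f}, tiered: {tiered:.0f}, ratio: {all_mem / tiered:.1f}x")
```

Under those assumptions, keeping everything in memory costs 5,000 units against 1,080 for the tiered split, roughly 4.6 times as much.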
You have said before that SAP has been baited by Oracle down this path?
Brobst: From a business strategy point of view, SAP does need to get Oracle out of its bed. Most SAP implementations at enterprise customers run on Oracle. They are sleeping with the enemy.
But SAP is being pushed back a bit, so it is using Sybase IQ as well [as the in-memory story]. However, that technology, though it has a good compression rate, has been around for 20 years and has never made inroads – a few data marts on Wall Street and in South Korea is about it. Moreover, moving data between Sybase IQ and Hana is not automatic; it requires human intervention.
What are you seeing in your customer base when it comes to building a data science capability? Are they building their own data science cadre or democratising data among managers more generally?
Brobst: Both. There is an important distinction between a business analyst, whose job is to answer a business question, and a data scientist, who cares about getting the next question.
And data scientists have no interest in traditional business intelligence (BI) tools, such as Cognos or Business Objects or Microsoft Analysis Services, and so on. They want to do much more pattern-based analysis, using data visualisation tools, such as Tableau, or data mining tools or analytics tools, such as SAS or SPSS.
The skill set to be a good data scientist is rare. Most managers do not understand the difference between causality and correlation.
It is hard to find them, but you don’t want to outsource your basis of competition. Erik Brynjolfsson's study demonstrated that an analytically sophisticated company will make 6% more profit. It’s a core competence.
You also want a data scientist, not a computer scientist. Experimental scientists are a good source, in applied physics and chemistry, and so are social scientists if they’ve done field work. Physicists are very good, if they have communication skills. And they are not necessarily expensive. Scientists don’t get paid a lot, and once they are paid enough, money is not what motivates them. Interesting data and cool tools excite them.