Stephen Brobst, Teradata’s chief technology officer, has a distinguished pedigree in the world of databases and data warehousing. Data warehousing, done well, is a heady mix of the business and the technical: the two have to work hand in hand. Brobst holds a master’s degree and a doctorate from the Massachusetts Institute of Technology (MIT) in the U.S. and a joint MBA from Harvard Business School and the MIT Sloan School of Management – a combined technical and business background at a level to which many aspire. Consultant Mark Whitehorn caught up with Brobst at the recent Teradata Universe conference in Barcelona, Spain, and asked him about “Big Data” and Teradata’s acquisition of analytical database developer Aster Data Systems.
Big Data is a term that has been appearing with increasing frequency; what is your take on this?
Big Data is not a particularly new thing; we have Teradata customers with many petabytes of data. But Big Data is almost a misleading term – it’s not actually the size of the data that matters as much as the diversity.
One important factor is the transition from transactions to interactions: this creates Big Data. For instance, in a telecommunications company, the billing system captures the call detail records (CDRs). The CDRs aren’t actually the detail at all; they’re the summary of a phone call that had lots of network interactions as you were driving around the M25 with your mobile phone – hands-free, that is!
In analytical terms, a telecom company wants to understand the customer experience – you don’t want just the CDR for that. You want the interactions with the network, known as OSS (operational support system) data, because that tells you the quality of the call, not just the value of the call. In a dot-com, it’s the clickstream data that is of analytical interest. Every click and every search that led up to the purchase – that is what describes the customer experience. I can assess the value of the customer by looking at the transaction, but I really can’t assess the experience or the customer’s behaviour unless I get the interactions.
The transition from transactions to interactions is driving very high-volume data, and you could call that Big Data. In fact, the OSS data is largely well-structured and the analytic techniques you’d use are not that different from those used on CDRs.
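The difference between the transaction and the interactions behind it can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the event kinds, field names and the two helper functions are hypothetical and do not reflect any real CDR or OSS format. It shows a call's network events being collapsed into a CDR-like summary, and the experience detail (handovers, drops) that only the interaction data retains.

```python
from dataclasses import dataclass


@dataclass
class NetworkEvent:
    """One hypothetical network interaction during a call (not a real OSS schema)."""
    call_id: str
    second: int  # seconds since the call was set up
    kind: str    # e.g. "setup", "handover", "drop", "reconnect", "teardown"


def summarise_call(events):
    """Collapse a call's interactions into a CDR-like record: one row per call."""
    ordered = sorted(events, key=lambda e: e.second)
    return {
        "call_id": ordered[0].call_id,
        "duration_s": ordered[-1].second - ordered[0].second,
    }


def experience_metrics(events):
    """Count the events that describe call quality; the summary above discards them."""
    counts = {}
    for event in events:
        counts[event.kind] = counts.get(event.kind, 0) + 1
    return {
        "handovers": counts.get("handover", 0),
        "drops": counts.get("drop", 0),
    }


# Made-up events for a single call, purely for illustration.
events = [
    NetworkEvent("c1", 0, "setup"),
    NetworkEvent("c1", 40, "handover"),
    NetworkEvent("c1", 95, "drop"),
    NetworkEvent("c1", 101, "reconnect"),
    NetworkEvent("c1", 180, "teardown"),
]

cdr = summarise_call(events)          # what a billing-style record would keep
quality = experience_metrics(events)  # what only the interaction data can show
```

Five interaction rows become one summary row here, which is also a small-scale picture of why interaction data runs to far higher volumes than transaction data.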
So, is the level of structure in data changing significantly?
In an old-style website, clickstream data was very structured and the interaction data was not that different from transaction data. Today, a Web page is not a static HTML thing – it’s very complex. So the Web log data is a representation of what was presented dynamically to the consumer on the Web page, and it is semi-structured at best. To me, that change in structure is more interesting than data volume in terms of what you need to do analytically.
Social media is another example – that can be even less structured than the Web log data. Also, time series data, even though it’s very structured, is analysed in a different way: something other than SQL. SQL is a set-based language, and sets by definition have no order. Time series by definition have order, so the paradigm for doing the analysis is different.
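The point about order can be illustrated with a minimal Python sketch. The readings and function names below are made up for illustration and are not drawn from any Teradata or Aster product. A set-style aggregate such as a total gives the same answer however the rows are arranged, whereas a time series calculation such as the change between consecutive readings is only meaningful once the rows are put in time order.

```python
# Hypothetical (timestamp, value) readings, invented for this example.
readings = [(1, 10.0), (2, 12.0), (3, 11.0), (4, 15.0)]

# The same rows in an arbitrary order, as an unordered set might return them.
shuffled = [readings[2], readings[0], readings[3], readings[1]]


def total(rows):
    """Set-style aggregate: the result does not depend on row order."""
    return sum(value for _, value in rows)


def consecutive_deltas(rows):
    """Order-dependent analysis: change between each row and the next one."""
    return [b[1] - a[1] for a, b in zip(rows, rows[1:])]


# The aggregate is identical for both arrangements.
same_total = total(readings) == total(shuffled)

# The deltas differ unless the rows are first sorted by timestamp.
deltas_in_time_order = consecutive_deltas(sorted(shuffled))
deltas_unordered = consecutive_deltas(shuffled)
```

The sort by timestamp is the step that has no counterpart in a purely set-based view of the data, which is why the analysis tends to be written in a more procedural style.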
So the interesting thing about Big Data isn’t the size but the analytic paradigm, which relies on variations of procedural languages. At Teradata, for example, we have already started extending SQL with procedural-like constructs to support in-database processing for SAS. Our acquisition of the Aster Data technology allows us to go even further with the SQLMR (SQL MapReduce) capability.
Could you explain your Aster Data acquisition further?
Aster worked with diverse data in an interesting way, so we were buying capability: the capability we believe the next generation of analytics will need in order to evolve. That doesn’t mean traditional structured analytics goes away; all kinds of good work is going on there as well.
At a technical level, Teradata has a file system and optimiser that are very well optimised for structured data. Aster has a file system and analytic programming model that are very well optimised for diverse data. People ask if we are going to integrate Aster functionality into the Teradata database, and the answer is no. We aren’t going to take the Aster code and shove it into the Teradata code. Saying that the environments will co-exist is probably a better description.
Presumably you want not only the technology but also the guys who developed it…
Oh absolutely. Cultural match is very important. Teradata is a very engineering-driven company; we invest a lot more in engineering than in marketing. Aster is a Stanford University engineering spin-off and a very good match.
Can you sum up Teradata’s general acquisition strategy?
We’re not going to acquire business intelligence tools. All our acquisitions in the last year are aligned with our core competency and expand it. Apart from Aster Data, we acquired Kickfire, which has some interesting compression technology, and Aprimo, which directly complemented our investment in analytic CRM.
Our strategy is very clear: We want to be best of breed. You can’t be best of breed at everything. So we figure out what we’re best at, do those things and partner for the rest. The kind of acquisitions you’ll see will enhance our best-of-breed capabilities.