Bob Harris, chief technology officer (CTO) at UK broadcaster Channel 4, is not sure how big his big data volumes will grow, but one thing he knows is: “Like most companies, when considering this we arrive at a number with lots of zeros.”
Harris says the television channel has a very competent business intelligence (BI) capability based on traditional, proven proprietary technologies. This provides the basis of analytical reporting, for example operational data about who visits Channel 4’s video on demand (VoD) platform, 4oD, and what they like watching. But the broadcaster wanted more insight, and the call came from the top.
“What drove us to think beyond this fairly typical situation for data use was our chief executive, David Abraham, wanted us to make more use of the information we have about viewers and investigate how we could optimise content delivered to them, and at the same time maintain and improve revenue,” says Harris.
This meant rethinking how data is stored and exploited.
“A lot of companies have more data than they are able to analyse or have any aspiration to analyse. Users up until two or three years ago were happy with the reporting they could achieve, but they had data they were not looking at,” he says.
Shift in approach
This fundamental shift in approach meant Harris had to think about how to cope with increased data volumes. “Simply scaling our current platform linearly was not practical because it could have meant a tenfold increase in expensive licences, so we looked at alternatives such as the NoSQL database movement in early 2012 and started R&D work,” he says.
Using technologies such as the open-source Hadoop framework, Harris learnt some key lessons.
“The technologies are immature and have sharp edges, so you have to be careful in how you use them. Also, there is a mixed blessing around the availability of skills; there is no long track record, but the outsourcing community makes up for that with their willingness to help solve problems,” he says.
Initially, Harris chose to unlock historical data, starting with the VoD platform. The team of analysts learned very quickly from analysing that data using Hadoop, MapReduce and cloud-based services.
“The cloud offers a mixture of low cost, flexibility and ease of use to analyse big data for core product lines. Use of data and leveraging big data is central to certain products,” he says.
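As a rough illustration of the MapReduce pattern mentioned above, the sketch below counts views per programme in plain Python. The log records and function names are invented for illustration; this is not Channel 4's pipeline and it does not use Hadoop's API, it only mimics the map, shuffle and reduce steps in a single process.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical VoD view log: (viewer_id, programme). Invented data.
VIEW_LOG = [
    ("v1", "Hollyoaks"),
    ("v2", "Homeland"),
    ("v1", "Homeland"),
    ("v3", "Hollyoaks"),
    ("v2", "Homeland"),
]


def map_phase(record):
    """Emit one (programme, 1) pair per view record."""
    _viewer, programme = record
    yield programme, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the counts collected for one programme."""
    return key, sum(values)


def count_views(log):
    """Run map, shuffle and reduce over the whole log."""
    pairs = chain.from_iterable(map_phase(record) for record in log)
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


VIEW_COUNTS = count_views(VIEW_LOG)
```

In a real cluster the map and reduce functions would run in parallel on many machines; the single-process version only shows how the work is divided.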
C4’s audience technology and insight department
Channel 4 has created a new department to focus on leveraging insight from its big data. Within the department are people with traditional analytics skills and aspects of the marketing function. “We want to build the best set of people for leveraging big data,” says CTO Bob Harris.
A degree of internal reorganisation, to meet internal customers’ new expectations of big data, is key to Channel 4’s future planning and to gaining advance warning of shifts in audience behaviour over the coming years.
Data scientists with highly specialised skills work in the department and focus on innovative ways of performing analysis and analytics, but their real job is to put any insight into useful metrics for the organisation. “Data scientists focus on giving a better understanding of audiences – when they watch and how often, etc – to our internal customers,” he says.
It is early days for the department, which was set up last year, but Harris anticipates that soon it will be able to do real-time analysis. “With a heads-up display, analysis will not be restricted to behind the event. Rather than collecting data and analysing it historically, say monthly, we will be able to do real-time analysis of data; that is something we are beginning to experiment with,” he says. This will allow the organisation to interact in a different way, in response to what people are actually doing.
“People can turn up and do something completely different; we will be able to spot that by still leveraging their history, but recognising they are doing something different in real time, so we can change our behaviour and respond,” says Harris.
The latency between capturing and analysing data and providing a useful outcome is shrinking from weeks to days to hours to seconds. “Offering more of the same works for a little while, but spotting how tastes change and offering something different that responds to that change is how we encourage people to spend more time with us,” he says.
Harris gives the example of moving someone who likes comedy to satirical comedy by spotting a change in their tastes. “Real-time prediction and analysis means you can make recommendations and keep audiences engaged by offering content they find interesting,” he says.
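One simple way to picture spotting a change in tastes is to compare the genre mix of a viewer's current session with their history. The sketch below is a minimal, hypothetical illustration: the genre labels, the data and the `min_gain` threshold are all assumptions, and nothing here describes Channel 4's actual method.

```python
from collections import Counter


def genre_shares(genres):
    """Return each genre's share of the given list of viewed genres."""
    counts = Counter(genres)
    total = sum(counts.values())
    return {genre: count / total for genre, count in counts.items()}


def emerging_genres(history, session, min_gain=0.3):
    """List genres whose share in the current session exceeds the
    viewer's historical share by at least min_gain (an assumed threshold)."""
    past = genre_shares(history)
    now = genre_shares(session)
    return sorted(
        genre
        for genre, share in now.items()
        if share - past.get(genre, 0.0) >= min_gain
    )


# Invented data: a long-standing comedy viewer starts watching satire.
HISTORY = ["comedy"] * 8 + ["drama"] * 2
SESSION = ["satire", "satire", "comedy"]

SHIFT = emerging_genres(HISTORY, SESSION)
```

Here the history is still used as the baseline, while the session shows what the viewer is doing now, which mirrors the distinction Harris draws between leveraging history and recognising new behaviour.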
The skills within the new department are developing, often in a hands-on way to support such initiatives. “Many of the team are happy to evolve and take on the challenge of new technologies. A good programmer, typically with Linux, Python and Java experience, who is open-minded about using the open-source framework, is a good bet. They also need to understand the difference between analytical reporting and speculative analysis,” says Harris.
The department is a good breeding ground for such skills. “You are not going to find an external organisation with huge amounts of this sort of experience. This is new territory,” he says.
The department was set up specifically to deal with the newly critical importance of big data, but Harris says that despite the focus on new big data technologies, traditional relational database technologies will continue to live alongside them.
“There are a lot of people who say either/or for relational databases versus Hadoop, but I prefer to see them as complementary. Our relational databases are great for structured data and we are running them flat-out,” he says.
Capturing more data
With big data’s potential coming to prominence, there is now an appetite to capture and store more data as well as leverage insights from the historical archive of unstructured data.
“The issue with the old reporting approach of, ‘Don’t collect data unless you know what to do with it because it’s expensive to store’, is that you don’t always know what you can do with big data, but it’s impossible to do anything with it if it’s all gone,” says Harris.
This Catch-22 situation has been somewhat addressed by the decreasing price of storage technologies and the cloud.
“We now have the ability to capture and store data at cost-effective prices in the cloud, which coincides with a penchant to capture more. We had up to two years’ worth of data to analyse, but now we are focusing on analytics we are collecting more. It’s about recognising usage patterns and when visitors return,” he says.
As more data is generated and the ability to capture it grows, Harris is expecting a massive increase in the amount of information Channel 4 holds.
“In 2010, our data warehouse had one terabyte of data, and we only analysed what we needed to run the business. We have 20 terabytes today and expect a 20-fold increase in two years. The projected growth is tens of terabytes per year, because we have not begun to capture all that is available,” he says.
Harris expects many more petabytes of data by 2020.
“With ten-times growth per year, it takes you to some horrible number, but the price of storage is decreasing and our ability to analyse is increasing, and we will get better at knowing what to collect and analyse,” he says.
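The growth figures quoted above compound quickly. The sketch below simply works through the arithmetic for the two rates mentioned (a 20-fold increase over two years, and ten-times growth per year), starting from the quoted 20 terabytes. It assumes decimal units (1 petabyte = 1,000 terabytes) and makes no claim about which rate, if either, Channel 4 will actually follow.

```python
TB_PER_PB = 1000  # assumption: decimal units


def project(start_tb, factor_per_year, years):
    """Return the projected size in TB at the end of each year, from year 0."""
    return [start_tb * factor_per_year ** year for year in range(years + 1)]


START_TB = 20  # "We have 20 terabytes today"

# A 20-fold increase over two years takes 20 TB to 400 TB,
# which implies an annual factor of about 4.47 (the square root of 20).
AFTER_TWENTYFOLD_TB = START_TB * 20
IMPLIED_ANNUAL_FACTOR = 20 ** 0.5

# Ten-times growth per year reaches petabyte scale within two years.
TENFOLD_TB = project(START_TB, 10, 3)
TENFOLD_PB = [size / TB_PER_PB for size in TENFOLD_TB]

for year, size_tb in enumerate(TENFOLD_TB):
    print(f"year {year}: {size_tb} TB")
```

At ten-times growth per year, 20 TB becomes 2 petabytes after two years and 20 petabytes after three, which is the "horrible number" effect Harris describes.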
Big data definition
Harris says that at the beginning of his IT career, collecting data was constrained by the cost of storage, compute power and bandwidth. Although this has changed, the definition of big data that resonates most strongly with him is the one from computer scientist Bill Inmon: big data is the set of data that is an order of magnitude, or ten times, larger than the data you can comfortably process today.
“You never get there. For a number of years we will get on the treadmill, but you won’t stop chasing it, because you can’t,” says Harris.
However, Channel 4 will become more discerning and able to learn a lot about its audience.
“We are sifting for pearls. The difficulty of saying you need a firm business case is that you are not allowed to analyse the data until you know the value. But you can’t know the value until you are allowed to analyse the data,” says Harris.
However, he stresses it is still important to justify the business case, and having board-level approval is key.
“It’s like sponsoring a pearl diving expedition – there has to be some risk as with anything speculative in R&D. If you take the view you can’t do anything unless you have a firm business case, it is a difficult place to be in. However, we haven’t got deep pockets, but being able to use cloud technologies and not sink capital cost upfront makes the difference. Also, the CEO has stated that our aspiration is to become a data-centric organisation, which helps a lot,” he says.
Insight into audiences
Harris is excited about the possibilities of using big data to gain insight into audience behaviour and he makes the distinction between more traditional reporting and gaining analytical insight.
“We will be able to look at unusual correlations, such as between people who watch Hollyoaks and Homeland, for example. It becomes possible to explore a gut feeling, go find the data and discover things you didn’t know. Traditional reporting is being able to look in the rear-view mirror, whereas big data analytics gives you a head-up display of what is coming down the road towards you,” he says.
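A correlation such as the one between Hollyoaks and Homeland viewers can be expressed as "lift": how much more often two programmes are watched by the same people than would be expected if viewing were independent. The sketch below is a minimal illustration with invented viewer IDs and an assumed audience size; it is not drawn from Channel 4's data or tooling.

```python
def lift(viewers_a, viewers_b, total_viewers):
    """Return the lift between two sets of viewers.

    Lift = P(A and B) / (P(A) * P(B)). A value above 1 means the two
    programmes share more viewers than independence would predict.
    """
    both = len(viewers_a & viewers_b)
    # Algebraically equal to the probability form, but avoids rounding error.
    return both * total_viewers / (len(viewers_a) * len(viewers_b))


# Invented data: viewer IDs out of a hypothetical audience of 10.
TOTAL_VIEWERS = 10
HOLLYOAKS = {1, 2, 3, 4}
HOMELAND = {2, 3, 4, 5}

CO_VIEWING_LIFT = lift(HOLLYOAKS, HOMELAND, TOTAL_VIEWERS)
```

With this made-up data the lift is well above 1, so the overlap would be worth a closer look; a value near 1 would suggest the two audiences are unrelated.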
Predictive modelling opens up a different way of doing business, with many new opportunities.
“You can investigate what type of programme should have what type of sponsorship. The move towards predictive analysis is really interesting,” says Harris.
To this end, Channel 4 has used Apache Mahout, an open-source machine learning library.
“If you can do so at a low cost, you can afford to do some interesting things,” he says.
Influence on content
Ultimately, these growing experiments with big data will affect the way the broadcaster operates – and are not limited to advertising.
“It will affect all areas of the business – it has the potential to start to influence content made or what content we acquire, and will influence the way we interact with people through the website,” says Harris.
“A lot of the analysis we’re doing correlates audience viewing figures with the activities we see on 4oD. Insights gained from big data will influence the site, types of apps built, content and platforms targeted,” he adds.
This is the bounty big data promises to deliver, but Harris says it is only possible if you keep reaching for increasingly unproven technologies.
“If there’s a book about it, like there is about Hadoop, then it is almost mature. You’ve got to track the technology as it evolves. We have looked at Hadoop, Hive, Storm, HBase and Pig. The challenge is the skills are not there and you have to be comfortable working in the open-source community,” he says.
However, Harris appreciates that many organisations are not comfortable with the open-source community, though he believes that eventually all organisations will be able to work effectively with big data as the technologies mature.
He concludes: “There are some organisations where the regulatory environment makes it difficult to work in the open-source community, or for other reasons they may decide to wait for the technologies to mature and become commoditised before they reach for it, but big data opens up exciting possibilities for all organisations.”