Gaming site unlocks big data with Hadoop

Online gaming company replaced its MySQL database with Cloudera’s Hadoop distribution to cope with big data

This article can also be found in the Premium Editorial Download: CW Europe: Technology in the dock.

The company, a free online gaming site based in Sweden, redrew its data architecture to cope with big data coming largely from Facebook.

Founded in 2003, the company claims to be the largest casual social gaming site in the world, with tournaments in categories such as puzzle, strategy, word, action, card and sports games. 

Its games, which include Bubble Witch Saga and Candy Crush, have attracted more than 60 million registered users playing more than five billion games per month. The company has more than 150 games in its portfolio, all of which are free to play. It generates revenues from in-game products such as boosters and extra lives, as well as through advertising.

Mats-Olov Eriksson, director of data warehousing at the company, says it had managed without big data technologies for some time, but the increased data volumes that came with games on Facebook were too much for the MySQL database it had relied on. 

MySQL was adequate for one million users per day, but the company had in the order of 10 times that by the end of 2012.

There was also a need for speed. “If you are in a production environment with MySQL, you need to wait too long. Even adding a column takes time,” he says.

Building a Hadoop data warehouse

Eriksson has a background in analytics and data architecture in online environments, from digital marketing to online games. He is responsible for data storage and processing for the company’s business units and for maintaining an environment to facilitate analytics.

His team of six developers, with five more to come in the next few months, is building a Hadoop data warehouse. 

Eriksson pronounces himself pro-open source and lean, and favours a metadata-driven data warehousing approach. By this, he means “recording data in a less structured way so that we can track more kinds of user interaction with the games”.
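
As a rough illustration of what recording data in a less structured way could look like, the Python sketch below keeps a few fixed fields per interaction and puts everything else in a free-form attribute map, so a new kind of interaction needs no schema change. The field names (`user_id`, `event_type`, `attrs`), the helper functions and the sample values are hypothetical and are not taken from the company's system.

```python
import json
from typing import Any, Dict


def make_event(user_id: int, event_type: str, **attrs: Any) -> Dict[str, Any]:
    """Build a schema-light event: fixed core fields plus free-form attributes."""
    return {"user_id": user_id, "event_type": event_type, "attrs": attrs}


def serialize(event: Dict[str, Any]) -> str:
    """Serialize an event as one JSON line, a common format for log-style storage."""
    return json.dumps(event, sort_keys=True)


# Two different kinds of interaction share the same record shape (made-up values).
level_failed = make_event(42, "level_failed", game="example_game", level=17, moves_left=0)
booster_bought = make_event(42, "booster_bought", game="example_game", booster="extra_life")

lines = [serialize(level_failed), serialize(booster_bought)]
```

Adding a new interaction type here means passing different keyword arguments, whereas a fixed relational table would need a new column first.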

After some experimentation, the team implemented Cloudera’s distribution of Hadoop in 2012. The company claims this provides insights on game usage patterns and preferences, along with gaming behaviours, such as when consumers advance or get stuck in certain game stages. 

“We look at such things as the percentage of failed attempts per level, what levels are difficult – but in a good way,” says Eriksson.
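
A metric such as the percentage of failed attempts per level can be sketched in a few lines of Python. This is only an illustration of the calculation, not the company's code: the function name `failure_rate_by_level`, the `(level, succeeded)` input shape and the sample attempts are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def failure_rate_by_level(attempts: Iterable[Tuple[int, bool]]) -> Dict[int, float]:
    """Return the percentage of failed attempts for each level.

    Each attempt is a (level, succeeded) pair.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for level, succeeded in attempts:
        totals[level] += 1
        if not succeeded:
            failures[level] += 1
    return {level: 100.0 * failures[level] / totals[level] for level in totals}


# Made-up sample: four attempts on level 1, four on level 2.
sample_attempts = [
    (1, True), (1, True), (1, False), (1, True),
    (2, False), (2, False), (2, True), (2, False),
]
rates = failure_rate_by_level(sample_attempts)
```

In this sample, level 2 has a much higher failure rate than level 1, which is the kind of signal that would prompt a closer look at whether a level is difficult "in a good way".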

Agile approach to analytics

The analytics team uses a wide variety of tools. QlikView is the reporting tool. The team queries the data with Apache Hive and analyses it with R, the open source statistical programming language. It also uses statistics software from SAS and IBM’s SPSS.

“There is an intrinsic value in allowing people to experiment a lot. In the short term it is probably not so efficient, but if you have this open and creative environment it is easier to attract talented people who value that. They can try out new tools and experiment, and that is a big part of our success,” says Eriksson.

“Cloudera is part of the machinery that gives us a competitive edge. We have a system that is very agile when it comes to tracking users. We attract players through paid marketing, so we need to know the return on investment of marketing. We need to know everything we can. Without that we would not dare to spend, and that would reduce our growth. We would be blind,” he says.
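
The return on marketing investment Eriksson refers to is, in its simplest form, revenue attributed to a campaign minus its cost, divided by that cost. The Python sketch below shows that arithmetic only. The function name `marketing_roi`, the campaign names and the figures are invented for illustration, and how the company attributes revenue to campaigns is not described in the article.

```python
from typing import Dict, Tuple


def marketing_roi(revenue: float, spend: float) -> float:
    """Return ROI as a fraction: (revenue - spend) / spend."""
    if spend <= 0:
        raise ValueError("spend must be positive")
    return (revenue - spend) / spend


# Hypothetical campaigns: (revenue attributed to the campaign, marketing spend).
campaigns: Dict[str, Tuple[float, float]] = {
    "campaign_a": (1500.0, 1000.0),
    "campaign_b": (800.0, 1000.0),
}
roi = {name: marketing_roi(rev, cost) for name, (rev, cost) in campaigns.items()}
```

A positive value means the campaign earned back more than it cost, and a negative value means it lost money.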

Eriksson expects to be using Cloudera’s real-time querying technology Impala by the end of this year, but is sceptical about the value of the term "real time". 

“I’m not a big fan of that term,” he says, “but we will get better at feeding our data warehouse at close to real time, and from that the users will get a better experience. So, if there is a problem with a feature in a game, we can adapt more quickly.”

Eriksson also resists the seduction of the current vogue for data science. 

“It’s a shame that everyone speaks about data science as if that were the only sexy part about working with data. The maintenance part might not seem as cool, but it is much more important – that is where everything happens,” he says.

“[In data-intensive industries] we need more architects who are interested in facilitating other people. Everyone wants to be a statistician. I would love for people to be more interested in facilitating. What is wrong with that?”
