Here’s a safe sporting bet: take any roomful of cricket or baseball fans, say, and you can guarantee that at least one person there will have an encyclopaedic knowledge of the sport’s history, including players’ highest scores, batting averages and strike rates. There’s something about sport that attracts the anoraks. But here’s another sure-fire bet: knowing about past performance is about to become old hat. The smart money now is on using data to predict future sporting outcomes. Sports analytics promises to be the bookmaker’s worst nightmare.
One high-profile sport to catch the data analysis bug is football. Leading European clubs such as Real Madrid and Arsenal have pioneered the use of player-tracking systems, such as the IP camera network and analytics software developed by Prozone, to understand how individual players move through every passage of play and to find ways to improve their performance. In addition, the Sunday sports pages are now chock-full of statistics and graphical analysis generated by data services provider Opta, detailing players’ every move, pass or misplaced tackle.
But such approaches are merely the tip of the sports analytics iceberg.
Over in the well-heeled corner of South West London, where the All England Club holds the Wimbledon tennis tournament, a data revolution is taking place.
The spectators crammed around Centre Court or watching on a large video screen out on “Murray Mound,” the adopted name for a small hill on the club’s grounds, have grown accustomed to ball-tracking technology that can detect the pace at which a serve was fired down or whether a shot ruled “out” grazed enough of the tramline to be called “in.” But at this year’s tournament, IBM showed off a new system that aims to tell you, well before a match’s denouement, which player is likely to prevail.
The system, called SlamTracker, has been featured at tennis tournaments before, but it has been enhanced to include IBM’s SPSS predictive analytics tools, said Jeremy Shaw, a business analytics consultant at IBM. SlamTracker uses around 39 million data points gleaned from seven years of Grand Slam tennis matches to determine players’ patterns of play -- their propensity for using their forehand, their first serve percentage, their willingness to volley. The historical data is then compared with footage from three-dimensional cameras dotted around the courts that show how players are performing in a live match, enabling SlamTracker to identify the three critical aspects of play that will determine the winner of the match, according to IBM.
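To make the idea concrete, here is a minimal sketch of how historical match data might be mined for three “critical aspects” and then checked against live figures. It is not IBM’s method, which is not described here in any detail; the stat names, thresholds, sample records and function names are all hypothetical, and the ranking rule (win rate in past matches where a target was met) is simply one plausible choice.

```python
from typing import Dict, List, Tuple

# Hypothetical per-match records for one player: a few stats plus the result.
Match = Dict[str, float]
# A candidate target: (label, stat name, minimum value to hit).
Candidate = Tuple[str, str, float]

HISTORY: List[Match] = [
    {"first_serve_pct": 68, "net_points_won_pct": 70, "aces": 9, "return_points_won_pct": 42, "won": 1},
    {"first_serve_pct": 55, "net_points_won_pct": 72, "aces": 3, "return_points_won_pct": 35, "won": 0},
    {"first_serve_pct": 71, "net_points_won_pct": 50, "aces": 12, "return_points_won_pct": 38, "won": 1},
    {"first_serve_pct": 66, "net_points_won_pct": 45, "aces": 4, "return_points_won_pct": 44, "won": 1},
    {"first_serve_pct": 58, "net_points_won_pct": 65, "aces": 10, "return_points_won_pct": 41, "won": 0},
    {"first_serve_pct": 60, "net_points_won_pct": 40, "aces": 2, "return_points_won_pct": 30, "won": 0},
]

CANDIDATES: List[Candidate] = [
    ("First serve in at least 65%", "first_serve_pct", 65),
    ("Win at least 60% of net points", "net_points_won_pct", 60),
    ("Hit at least 8 aces", "aces", 8),
    ("Win at least 38% of return points", "return_points_won_pct", 38),
]


def win_rate_when_met(history: List[Match], stat: str, minimum: float) -> float:
    """Share of past matches won among those where the target was met."""
    met = [m for m in history if m[stat] >= minimum]
    if not met:
        return 0.0
    return sum(m["won"] for m in met) / len(met)


def top_keys(history: List[Match], candidates: List[Candidate], n: int = 3) -> List[Candidate]:
    """Rank candidate targets by historical win rate and keep the best n."""
    ranked = sorted(
        candidates,
        key=lambda c: win_rate_when_met(history, c[1], c[2]),
        reverse=True,
    )
    return ranked[:n]


def keys_met(live: Dict[str, float], keys: List[Candidate]) -> Dict[str, bool]:
    """Check which of the chosen targets the player is hitting right now."""
    return {label: live.get(stat, 0) >= minimum for label, stat, minimum in keys}


if __name__ == "__main__":
    keys = top_keys(HISTORY, CANDIDATES)
    live_stats = {"first_serve_pct": 70, "return_points_won_pct": 30, "aces": 8}
    for label, met in keys_met(live_stats, keys).items():
        print(f"{label}: {'on track' if met else 'falling short'}")
```

A real system would draw on millions of data points rather than six matches and would presumably account for the opponent as well; the sketch only shows the shape of the comparison between past patterns and live play.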
Sports a proving ground for predictive analytics
In many ways, sports is the perfect testing ground for “big data” and predictive analytics technologies, said Tony Baer, an analyst with industry watcher Ovum. “It’s a closed system; sports have well-defined rules, and we know the outcomes we are looking for,” he said. “So if we can provide enough data points, then fundamentally, we can reduce sport to a classic computer science problem of analysing data for trends.”
Even so, there’s a world of difference between old-school data analysis problems that may have relied on a meticulously created data warehouse and today’s fast-moving sports analytics and predictive tools.
In part, the new domain of sports analysis has been built on technological advances such as multicore processors and commodity network-attached storage devices that have made it feasible to contemplate such data-intensive pastimes, Baer said. “Software has also helped,” he added, saying that the development of the Hadoop distributed computing framework, MapReduce programming model and high-performance databases such as the new breed of NoSQL products “has provided the tools to conduct these big data efforts.”
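For readers unfamiliar with the MapReduce model Baer mentions, the following single-process sketch shows its three stages (map, group by key, reduce) applied to a toy sports question: each player’s first serve percentage. It uses only the Python standard library and does not call Hadoop or any real MapReduce API; the sample records and function names are hypothetical.

```python
from collections import defaultdict
from functools import reduce
from typing import Dict, Iterable, List, Tuple

# Hypothetical point-by-point records: (player, whether the first serve landed in).
POINTS: List[Tuple[str, bool]] = [
    ("Player A", True), ("Player A", False), ("Player A", True),
    ("Player B", True), ("Player B", True),
    ("Player A", True), ("Player B", False), ("Player B", False),
]

Counts = Tuple[int, int]  # (first serves in, first serves attempted)


def map_point(record: Tuple[str, bool]) -> Tuple[str, Counts]:
    """Map step: turn one raw record into a (key, value) pair."""
    player, serve_in = record
    return player, (1 if serve_in else 0, 1)


def group_by_key(pairs: Iterable[Tuple[str, Counts]]) -> Dict[str, List[Counts]]:
    """Shuffle step: collect all values that share a key."""
    groups: Dict[str, List[Counts]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(a: Counts, b: Counts) -> Counts:
    """Reduce step: combine two partial counts into one."""
    return a[0] + b[0], a[1] + b[1]


def first_serve_pct(points: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    grouped = group_by_key(map(map_point, points))
    totals = {player: reduce(reduce_counts, values) for player, values in grouped.items()}
    return {player: 100.0 * made / attempted for player, (made, attempted) in totals.items()}


if __name__ == "__main__":
    print(first_serve_pct(POINTS))
```

Because the map step treats each record independently and the reduce step only needs the values for one key, a framework such as Hadoop can spread both stages across many machines, which is what makes the model suited to very large data sets.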
A prime example of how these technological leaps are affecting sports teams can be seen at Formula One racing team McLaren. The makers of race cars have long had a gimlet-eyed focus on squeezing every last drop of performance improvement possible out of car design. But increasingly, this is accompanied by a fanatical dedication to tracking what is going on out on the track in the white heat of a Grand Prix race.
McLaren’s cars send a torrent of data back to the pit teams; the information is analysed in real time using SAP’s HANA in-memory technology. HANA employs data compression technology, which enables McLaren to store the data in random-access memory, ensuring that it can be analysed in the blink of an eye -- and as a result, the team can act on the incoming data in time to make race-changing adjustments.
Tennis and Formula One racing represent the first generation of what might be possible with real-time data analysis, said IBM’s Shaw. In those sports, the power of the analytics is concentrated on individual entities -- cars or players. “The problem becomes exponentially more difficult,” he said, “when you look at team sports, such as football, where there are 11 players on each side, each of whom can have an impact on the outcome.”
The key to being able to generate insight from all the data being collected is having people with domain expertise who can provide the context for that data. “At the moment, everyone has different views on what works in football,” Shaw said.
So for the short term, at least, it looks like the bookies might still have some sports where people will be willing to take a punt on the outcome.