When Aussie Shane Warne dropped England batsman Kevin Pietersen on 15 in the deciding Test Match at the Oval, it was not just a moment of high drama in English cricket. In a back office in Hammersmith, a system was crunching 12,000 transactions a minute as frenzied punters traded positions on the likelihood of the Ashes coming...
Generating this volume of processing activity was an online betting exchange where punters bet against each other, rather than the bookies, and modify their positions during an event. In enabling punters to choose their own odds and to seek a match among the gambling community, Betfair, the exchange owner, has achieved new benchmarks for real-time computing. Just as significant is the bargaining power it has achieved with suppliers as it beds in and tests their latest technologies.
At full capacity, Betfair processes 12,000 bets a minute and the real time matching of trades keeps a community of 350,000 registered punters busy.
This performance has not been achieved as some kind of techie high-five. "Nothing we have ever done technically has been implemented to look good on someone's CV," says David Jack, product engineering director at Betfair. In fact, the first principle behind any technology model that has to satisfy exacting benchmarks of speed, volume and scale is simplicity, says Jack.
Another driver is Betfair's business proposition - impartiality. "Every punter is treated the same, whether they are placing a £10 bet or a £10,000 bet," says Jack. This principle dictates the way that transactions are processed and leads to a straightforward technology model. "It is a question of first come, first served," says Jack. If settlement of trades had to favour the larger deals or preferred clients it would make for greater complexity.
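The "first come, first served" rule can be illustrated with a minimal Python sketch. It is not Betfair's matching engine: the `Order` class, the `match_fifo` function and the punter names are hypothetical, and prices and odds are left out so that only the queueing rule is visible. Waiting offers are matched strictly in arrival order, whatever their size.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Order:
    punter: str   # hypothetical identifier for whoever placed the offer
    stake: float  # amount still unmatched, in pounds


def match_fifo(queue, incoming_stake):
    """Match an incoming stake against waiting offers in arrival order."""
    fills = []
    remaining = incoming_stake
    while remaining > 0 and queue:
        head = queue[0]                      # oldest waiting offer goes first
        taken = min(head.stake, remaining)
        fills.append((head.punter, taken))
        head.stake -= taken
        remaining -= taken
        if head.stake == 0:
            queue.popleft()                  # fully matched, leaves the queue
    return fills


# The £10 offer arrived before the £10,000 offer, so it is matched first.
waiting = deque([Order("small", 10.0), Order("large", 10_000.0)])
fills = match_fifo(waiting, 50.0)
print(fills)
```

Because the only ordering key is arrival time, there is no priority logic to maintain, which is the simplicity the article describes.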
A crucial technology decision was selecting a database that could handle the volume and scale with the business. Early on, Betfair opted for Oracle. "It was the most accessible platform and more scalable too," says Betfair IS director Rorie Devine. IBM's DB2 was a bigger system but called for more customisation and greater engineering cost, and Microsoft's SQL Server was not considered by Betfair to be enterprise-class.
Five years down the line, Betfair's database processes more than five million transactions a day. The company says the move to implement Oracle early on was a structured, well organised decision that has paid dividends. "Our relationship with Oracle has been very important," says Devine.
Betfair has ambitious requirements not only for throughput and scale but also for response times at the client application end. "We are one of the most time-sensitive applications on the internet," says Devine. "Where users want to modify their trading position as a horse goes over the last fence in the Grand National, for example, a delay of a split second cannot be tolerated because by that point the race will be over." Response time is therefore the key internal metric at Betfair.
The benchmark of 99.1% of users being able to place their bet within one second was recently increased to 99.9% - a shift in performance that has a multimillion-pound impact on the bottom line. Any incremental shift translates into more liquidity - the holy grail of any exchange. A half-second decrease in processing time on five million bets placed a day, for example, saves 694 hours, or almost 29 days, of elapsed time per day. It also frees up funds for punters to recycle on new bets.
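The elapsed-time figure can be checked with a few lines of Python. The two inputs are the numbers quoted in the article; the variable names are illustrative only.

```python
BETS_PER_DAY = 5_000_000      # figure quoted in the article
SECONDS_SAVED_PER_BET = 0.5   # half-second improvement per bet

total_seconds = BETS_PER_DAY * SECONDS_SAVED_PER_BET
hours_saved = total_seconds / 3600
days_saved = hours_saved / 24

print(f"{hours_saved:.0f} hours, about {days_saved:.1f} days")
# prints "694 hours, about 28.9 days"
```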
Gambling has another unique characteristic that calls for fresh technology - spike intensity. "People are betting on the same horse, on the same position and in the same second. Seventy five per cent of trades in a single race may be on one horse, for example. No other industry has to manage these traffic/performance peaks. Trades on stock exchanges, for example, are spread much more evenly," says Jack.
Betfair handles these usage peaks with surge protection, delivered through Citrix Netscaler network appliances in the middle tier. Netscaler handles application requests with additional compression and load balancing to make sure that no one is turned away during peak traffic periods. "Everything is queued up in the pipeline and, crucially, it means that a user is not served a 404 error message during an England-scores-against-Brazil-moment," says Devine.
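The idea of queueing requests rather than turning them away can be sketched as follows. This is a simplified illustration, not how the Netscaler appliance is implemented or configured: the `SurgeQueue` class, its `max_active` limit and the request names are assumptions made for the example.

```python
from collections import deque


class SurgeQueue:
    """Admit at most `max_active` requests; hold the rest in arrival order."""

    def __init__(self, max_active):
        self.max_active = max_active
        self.active = set()
        self.waiting = deque()

    def arrive(self, request_id):
        # Serve immediately if there is capacity, otherwise queue (never reject).
        if len(self.active) < self.max_active:
            self.active.add(request_id)
            return "served"
        self.waiting.append(request_id)
        return "queued"

    def finish(self, request_id):
        # Free a slot and promote the longest-waiting request, if any.
        self.active.discard(request_id)
        if self.waiting and len(self.active) < self.max_active:
            promoted = self.waiting.popleft()
            self.active.add(promoted)
            return promoted
        return None


q = SurgeQueue(max_active=2)
results = [q.arrive(r) for r in ["a", "b", "c", "d"]]
promoted = q.finish("a")
print(results, promoted)
```

During a spike the back end only ever sees `max_active` requests at once, while later arrivals wait their turn instead of receiving an error page.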
Similarly, the Betfair team has to be prepared to re-code to extract minute improvements in response time. Multiplied many times over, a nanosecond improvement can have a major financial impact.
For example, it recently had to delve into the source code of the device drivers of network cards to resolve a slowed response time issue. "We had to write some custom software to monitor the problem, and then once it was identified, write software to simulate and test the condition," says Devine.
Although Betfair employs performance enhancement technology and continuously improves its coding to ensure optimum results in a real-time environment, raw horsepower is needed in the engine room too. As soon as they were available, the exchange upgraded its Sun Ultrasparc servers to the latest version, IV, to host its database.
Ultrasparc chip architecture best matches Betfair's requirement for "grunt" and Devine says he harvested a doubling of performance after the upgrade. A failover configuration of two boxes, one in active and one in passive mode, provides 100% redundancy and comfortably meets the benchmark of processing 500 bets per second.
Like other online exchanges, Betfair prefers smaller units of horsepower to host its application logic because the scalability matches its business growth curve better. "When you look at the non-stop big boxes, there is a huge step change each time you upgrade. In our business, you cannot install and then upgrade another two years down the line because our growth is exponential," says Jack.
However, the IT team is never complacent about performance, and testing is a key activity to ensure that all aspects of the Betfair platform are ready for any eventuality. An in-house laboratory consisting of Avalanche and Reflector products from Spirent is used for testing network appliances, firewalls and switches. The IT team models traffic and loads before big events as no two events on which people bet work in quite the same way.
Once it has a business view and model of likely traffic, it examines specific areas, such as routers or application servers to see what needs beefing up. Prior to the Ashes Test series, the front-end router capacity was increased to accommodate anticipated traffic growth. "Cricket has a huge amount of volatility because so much can change in an over. This provides many opportunities to buy low and sell high," says Devine.
Given the effort Betfair has invested in in-house testing, its decision not to outsource strategic software development does not come as a surprise. In-house, quality assurance is taken seriously, with a software quality person appointed to each development team. Their job is to ensure the quality of every product, to follow the same process across all departments and to find new and harder ways to test software.
"We are still a young company but take a very mature approach to testing and deployment," says Jack. Various methods are used by development teams, including extreme programming and waterfall methods, and the company uses the Six Sigma method of constant improvement as a matter of corporate policy.
Six Sigma is a method of measuring continuous improvement and is a good cultural fit with the organisation. "We are results-oriented and take a lot of time to understand how we are performing," says Devine. It matches the personality of the IT department too. "Our people are very mathematically orientated and like anything that involves counting and measuring," he adds.
Despite the testing effort that goes on in the background, being at the bleeding edge is a high-risk position and means problems and anomalies occur. "It happens all the time," says Devine. "We push limits and things happen. It is not a reason to panic, but more a case of asking what can we do to make sure this does not happen this way again?"
Choosing the right supplier
"We have had to create custom support deals and build unique relationships with the world's two biggest IT suppliers," says Rorie Devine, IS director at Betfair. "We expect all our suppliers to address our very individual needs."
As well as the discounts they negotiate in return for "road testing" emerging technologies, Betfair expects fast-track support. "We have some of the best database people in Europe working for us, so by the time we have to escalate a problem to the supplier we need access to the top person," says Devine.
Betfair is in the process of negotiating customised service level and support agreements with all its major suppliers, which it expects to reflect their special relationships. "We like to have direct relationships with supplier development teams," says Devine.
There are lessons here in how to get the most out of your supplier for users in non-bleeding-edge environments, says Devine. "You have to pick your suppliers very carefully, and then treat them straight," he says.
Devine says there is merit in choosing a supplier that shares a similar culture and values. "We are very ambitious. When we talk to anyone, whether Oracle or Nasdaq, they want to engage with us because they can learn from what we have created," he says.
When evaluating a supplier, Devine says:
- Look at their product roadmap: you need to be interested in their plans for the future as well as what they are delivering now.
- Figure out how well they can deliver and support your particular geographic location. What are their local capabilities?
- Look at their financial performance. Will they be around for the long term?