William Hill has developed a new data platform based on open-source toolkits, functional programming and Docker containers to support recommendations and gamification. Its technology stack gives a glimpse of the benefits of developing an open-source website architecture.
Five years ago, analyst firm Gartner received about 12 inquiries a year about open-source databases. Last year it received 110, according to Open Source: the new database standard, a paper from EnterpriseDB.
The numbers may be small, but they illustrate growing acceptance of open source among enterprise CIOs.
Lower software licence costs used to be a big driver for selecting open-source software (OSS), especially for businesses with unpredictable web traffic peaks, because there are no per-server, per-core or per-user licence fees to pay as capacity grows.
But because some open-source tools were created by hugely successful websites, they have already proved they can work at scale and solve the problems inherent in webscale businesses.
For instance, LinkedIn’s engineering team developed Apache Kafka as the messaging backbone that lets the company’s applications work together in a loosely coupled manner.
In a blog post published earlier this year, LinkedIn’s head of engineering, Mammad Zadeh, wrote: "LinkedIn relies heavily on the scalability and reliability of Kafka and a surrounding ecosystem of both open source and internal components. We are continuing to invest in Kafka to ensure that our messaging backbone stays healthy as we ask more and more from it."
Computer Weekly recently spoke to betting and gaming firm William Hill about a new open-source-powered platform that will be used to drive new forms of customer engagement.
Developed out of an innovation exercise at William Hill's Shoreditch office, the project provides a platform called Omni for real-time data and business analytics.
In an interview with Computer Weekly, Patrick Di Loreto, research and development lead at William Hill, said the online betting site wanted to combine data from external sources such as Twitter with a profile of the punter.
"If you tweet about Liverpool and five seconds later you come to our site, we already know what you are interested in," said Di Loreto.
William Hill’s new recommendation engine is the first application built for Omni. "When someone places a bet, we have a real-time graph so we can see who has placed a bet, and combine this information to present which bets other people have placed," said Di Loreto.
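The "who else placed this bet" idea can be illustrated with a small graph linking punters to the bets they placed. This is a minimal sketch, assuming an in-memory bipartite graph; the class `BetGraph` and its methods `place_bet` and `also_backed` are hypothetical names for illustration, not William Hill's Omni API.

```python
from collections import Counter, defaultdict


class BetGraph:
    """Hypothetical in-memory bipartite graph: punters <-> bets."""

    def __init__(self):
        self.bets_by_user = defaultdict(set)
        self.users_by_bet = defaultdict(set)

    def place_bet(self, user, bet):
        # Record the edge in both directions so either side can be traversed.
        self.bets_by_user[user].add(bet)
        self.users_by_bet[bet].add(user)

    def also_backed(self, bet, limit=3):
        """Bets most often placed by other punters who also placed `bet`."""
        counts = Counter()
        for other in self.users_by_bet[bet]:
            for b in self.bets_by_user[other]:
                if b != bet:
                    counts[b] += 1
        # Highest count first; ties broken alphabetically for stable output.
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        return [b for b, _ in ranked[:limit]]


g = BetGraph()
for user, bet in [("ann", "Liverpool to win"), ("ann", "Over 2.5 goals"),
                  ("bob", "Liverpool to win"), ("bob", "Over 2.5 goals"),
                  ("bob", "Salah to score")]:
    g.place_bet(user, bet)
print(g.also_backed("Liverpool to win"))  # ['Over 2.5 goals', 'Salah to score']
```

Because each bet updates the graph as it arrives, recommendations reflect the latest activity without a separate rebuild step.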
Building on open source
The betting site uses Apache Kafka to track customers' navigation and to buffer that data. Kafka feeds a large number of subsystems within William Hill’s Omni architecture so that information can be consumed as quickly as possible, which the company needed for its recommendation engine.
"All the information we collect is stored in Kafka, where it is then distributed," said Di Loreto. "The data is sent to two layers – a speed layer and a batch layer. The speed layer is used to calculate 'best bets'. The data is also sent to the batch layer, which is used for long-term business intelligence to create a better profile of the user."
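This split between a speed layer and a batch layer resembles what is often called a lambda architecture. The sketch below assumes a plain Python list standing in for a Kafka topic; the event fields, `SpeedLayer` and `batch_layer` are hypothetical. The speed layer keeps its own read offset into the shared log, loosely as a Kafka consumer does, while the batch layer recomputes profiles from the full history.

```python
from collections import Counter, defaultdict

# Stand-in for a Kafka topic: an append-only list of events (hypothetical schema).
log = [
    {"user": "ann", "type": "bet", "market": "Liverpool to win"},
    {"user": "bob", "type": "view", "market": "Arsenal to win"},
    {"user": "bob", "type": "bet", "market": "Liverpool to win"},
    {"user": "ann", "type": "bet", "market": "Over 2.5 goals"},
]


class SpeedLayer:
    """Keeps running bet counts per market so 'best bets' update immediately."""

    def __init__(self):
        self.offset = 0  # position of the next unread event in the log
        self.bet_counts = Counter()

    def poll(self, log):
        # Consume only events not yet seen, then advance this consumer's offset.
        for event in log[self.offset:]:
            if event["type"] == "bet":
                self.bet_counts[event["market"]] += 1
        self.offset = len(log)

    def best_bets(self, n=1):
        return [market for market, _ in self.bet_counts.most_common(n)]


def batch_layer(log):
    """Recomputes long-term user profiles from the whole history in one pass."""
    profiles = defaultdict(Counter)
    for event in log:
        profiles[event["user"]][event["market"]] += 1
    return profiles


speed = SpeedLayer()
speed.poll(log)
profiles = batch_layer(log)
print(speed.best_bets())  # ['Liverpool to win']
```

The speed layer trades completeness for latency; the batch layer is slower but can be rerun over the full log whenever the profiling logic changes.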
The Omni platform was built using Scala, a programming language that runs on the Java virtual machine. It was selected because it offers a functional programming model, which Di Loreto said better suited the highly distributed nature of William Hill's platform. "We felt it would let us handle all the data wrangling and logic required to manage the data streams on our users," he added.
Di Loreto said Scala offers a different programming paradigm, enabling developers to use object-oriented programming for web services and SOAP, interpretative programming and functional programming, each where appropriate.
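In a functional style, data wrangling is written as a chain of pure functions over immutable records instead of code that mutates shared state, and functions without shared mutable state are easier to distribute across machines. Omni itself was written in Scala; this sketch uses Python purely for illustration, and the `Event` record and function names are hypothetical.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)  # frozen: records cannot be modified after creation
class Event:
    user: str
    team: str
    stake: float


def valid(e: Event) -> bool:
    # Pure predicate: depends only on its input.
    return e.stake > 0


def merge(totals: Dict[Tuple[str, str], float], e: Event) -> Dict[Tuple[str, str], float]:
    # Return a new dict rather than mutating the accumulator in place.
    key = (e.user, e.team)
    return {**totals, key: totals.get(key, 0.0) + e.stake}


def stake_by_user_team(events: Iterable[Event]) -> Dict[Tuple[str, str], float]:
    # filter, then fold: no step changes the input data.
    return reduce(merge, filter(valid, events), {})


events = [Event("ann", "Liverpool", 10.0), Event("ann", "Liverpool", 5.0),
          Event("bob", "Everton", 0.0), Event("bob", "Liverpool", 2.5)]
totals = stake_by_user_team(events)
print(totals)  # {('ann', 'Liverpool'): 15.0, ('bob', 'Liverpool'): 2.5}
```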
The open-source Apache Spark engine for big data processing, which originated at the University of California, Berkeley’s AMPLab (Algorithms, Machines and People), was used with Scala to handle complex, associative data queries. Spark Streaming, which is part of Spark, enables William Hill to handle hundreds of thousands of data operations distributed across multiple processors.
Information on users is organised into timelines stored in the Cassandra open-source distributed database management system.
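A timeline is an ordered sequence of events per user. In Cassandra, this is commonly modelled with the user as the partition key and a timestamp as a clustering column, for example `PRIMARY KEY (user_id, event_time)`. The sketch below only mimics that layout in memory; it is not Cassandra client code, and `TimelineStore`, `append` and `latest` are hypothetical names.

```python
import bisect
from collections import defaultdict


class TimelineStore:
    """In-memory stand-in for a table partitioned by user and clustered by time."""

    def __init__(self):
        # user -> list of (timestamp, event) pairs kept sorted by timestamp
        self._rows = defaultdict(list)

    def append(self, user, ts, event):
        # insort keeps each user's partition ordered, like a clustering column.
        bisect.insort(self._rows[user], (ts, event))

    def latest(self, user, n=2):
        """Return the user's n most recent events, newest first."""
        return [event for _, event in reversed(self._rows[user][-n:])]


store = TimelineStore()
store.append("ann", 100, "viewed Liverpool v Everton")
store.append("ann", 50, "tweeted about Liverpool")  # arrives late, still ordered
store.append("ann", 200, "bet: Liverpool to win")
print(store.latest("ann"))  # ['bet: Liverpool to win', 'viewed Liverpool v Everton']
```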
Di Loreto said William Hill also used the open-source Akka toolkit, which provides an actor-based, location-transparent framework for processing requests, so that Spark Streaming could process the data feed from Kafka in real time.
With this architecture, Di Loreto said William Hill has used logical reasoning to present relevant content to users. "Because you always place a bet on Liverpool, then you are likely to be a Liverpool supporter," he said.
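The inference in Di Loreto's example can be sketched as a threshold over a punter's betting history. The 0.8 share threshold, the three-bet minimum and the function name are assumptions for illustration, not William Hill's actual rules.

```python
from collections import Counter
from typing import List, Optional


def likely_supporter_of(bet_teams: List[str], threshold: float = 0.8,
                        min_bets: int = 3) -> Optional[str]:
    """Return the team a punter probably supports, or None if evidence is weak."""
    if len(bet_teams) < min_bets:
        return None  # too few bets to draw a conclusion
    team, count = Counter(bet_teams).most_common(1)[0]
    # "Always" is approximated as a large share of all bets placed.
    return team if count / len(bet_teams) >= threshold else None


print(likely_supporter_of(["Liverpool"] * 5))  # Liverpool
```

Requiring a minimum number of bets avoids labelling someone a supporter on the strength of a single wager.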
The site has also been designed to tolerate failure using the open-source Docker container technology, which offers a lighter-weight way to run applications than a virtual machine. Di Loreto said: "If an application crashes in a virtual machine, it could corrupt the whole VM, including all other applications. But if you have a failure in a containerised Docker application, there is no impact on the rest of the system."
Why choose open source?
One benefit of OSS is that it is hardly ever a standalone product: most OSS is built on other open-source projects. Because of the way much of it is licensed, enhancements are often passed back to the open-source community, so the software constantly evolves.
So, if such open-source technology is readily available, and has proved its scalability in webscale businesses, why reinvent the wheel?
Open source is certainly more accepted in the enterprise, said Tony Lock, distinguished analyst at Freeform Dynamics. “It is suitable for all businesses, not just for webscale businesses.”