In this special edition of the Computer Weekly Downtime Upload podcast, Bloomberg CTO Shawn Edwards discusses real-time data and open source
In the last 19 years, Shawn Edwards, CTO at Bloomberg, has seen the industry change radically. When asked about the biggest change that has impacted businesses, Edwards says: “When I think about my 19 years at Bloomberg, I think there have been a lot of trends. I like to joke and say the tech industry has more trends than the fashion industry.”
While many fads have come and gone, Edwards says some trends have endured. One of these is web-scale, high-performance, open source software. When he started working for Bloomberg, Edwards says the company did not have an open source policy. As a consequence, the developer team at Bloomberg had to build almost everything itself. “There were a few vendor-built products here and there,” he says. “But largely we built everything ourselves.” At the time, however, some really high-quality open source software was starting to appear, says Edwards.
“For us, open source software makes sense in so many dimensions and I led the effort and had to convince a lot of people that we should be an open source-first company instead of automatically thinking we had to build everything,” he says. “Let’s look out there and see if there’s something close enough, something that we can leverage with thousands and thousands of people out there who are contributing to the project.”
Such projects are an ecosystem, he says, which makes sense both financially and economically to those organisations that want to use open source. However, the real benefit of open source software, according to Edwards, is that it accelerates time to market.
The shift to open source at Bloomberg began with the creation of an open source office. Asked why this was necessary, Edwards says: “I had to create an open source office because you have to work with open source very carefully. You just can’t pull out anything. There are some open source software we shouldn’t use and we have to be really judicious about what to do, where and what to use.”
Looking at what projects to avoid, Edwards says: “It really comes down to what you’re trying to achieve.” Besides the discussions on the different types of licensing, he says: “When you’re looking at a particular project, is the technology something that will be too intrusive? Will it force you to rewrite your own code? Does the open source code have an ecosystem or does it offer good APIs so that it can be used in conjunction with other systems that you’ve already built?”
Beyond these questions, he says organisations wishing to use open source projects should also assess whether the project is being actively developed. “You don’t want a dead-end project. You don’t want a dead end.” Another factor to assess is whether the community is moving away from the project. “Is there a vibrant community that is working well and not mired in politics and other things?” he says.
Bloomberg is also an active contributor to a number of open source projects. One of these is Apache Solr, which provides enterprise search. Edwards says sophisticated ranking algorithms developed by Bloomberg have been adopted by the project.
Bloomberg also has staff members working on another open source project, JupyterLab, and has paid companies to work on the project. “I think this is an example of what the open source community does best,” says Edwards, “bringing people together, working on common problems, giving back and then extending its capabilities.” He says Bloomberg uses JupyterLab internally and contributing to the project “felt like the right thing to do”.
Data at scale
For Edwards, Bloomberg is a data company and the way it gathers, processes and distributes data presents unique challenges to the open source community. “Data is the lifeblood of what we do at Bloomberg,” he says. The journey starts with the data feeds and web scraping. Then, says Edwards, Bloomberg needs to analyse “a heterogeneous collection of large messy data and make sense of it, digitise it and give it to our customers in a consumable fashion that makes sense to them”.
While the web-scale companies have pioneered algorithms to process largely unstructured data, Edwards says data processing at Bloomberg involves combining structured and unstructured information. All of these data sources need to be tied together to provide analytics and enable Bloomberg’s customers to ask pertinent questions.
During a panel discussion at Columbia University a few years ago, Edwards spoke to a fellow panellist from Twitter. “We were talking about large-scale, big data real-time systems. And at the time, he was mentioning Twitter handled about 500 million tweets a day. At Bloomberg, we have, at this point, 300 billion messages or ticks a day.”
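To put those two figures side by side, the back-of-the-envelope sketch below works out the ratio and the average per-second rates in Python. It uses only the two daily totals quoted by Edwards; the variable names are illustrative, and spreading the volume evenly across 24 hours is an assumption made for simplicity, since real market data traffic is likely to be far burstier than a flat average suggests.

```python
# Rough comparison of the two daily volumes quoted in the interview.
# Assumption: traffic is averaged evenly over a 24-hour day.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

tweets_per_day = 500_000_000      # figure quoted for Twitter
ticks_per_day = 300_000_000_000   # figure quoted for Bloomberg

# How many times larger the Bloomberg daily count is
ratio = ticks_per_day / tweets_per_day

# Average per-second rates under the even-spread assumption
tweets_per_second = tweets_per_day / SECONDS_PER_DAY
ticks_per_second = ticks_per_day / SECONDS_PER_DAY

print(f"Ratio of ticks to tweets: {ratio:.0f}x")
print(f"Average tweets per second: {tweets_per_second:,.0f}")
print(f"Average ticks per second: {ticks_per_second:,.0f}")
```

On these figures, the Bloomberg message count is 600 times the Twitter one, which works out to an average of roughly 3.5 million ticks every second compared with fewer than 6,000 tweets.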
While the size of the data messages is about the same, Edwards says Bloomberg’s systems process mainly structured data, which need to be ingested, normalised and processed. Then there is data analytics and finally the data has to be distributed worldwide over what Edwards says is one of the world’s largest private global networks. “When you think about this type of system, it is unique,” he adds. “There are few people who deal with this scale.”