Bloomberg integrates Learning-to-Rank into Apache Solr

The Computer Weekly Open Source Insider blog takes a closer look at Bloomberg. As previously reported, Bloomberg L.P. produces technology for financial markets. The firm's software includes tools that can be used to track trading indices, perform financial 'asset swapping' functions and provide broker platforms (Bloomberg Tradebook) that offer what is known as multi-asset execution technology and algorithmic trading.

The latest milestone in open source development at Bloomberg is the incorporation of the Learning-to-Rank (LTR) plug-in into the Apache Solr 6.4.0 enterprise search platform.

What is Apache Solr?

Solr is built on top of the Apache Lucene search engine library and provides distributed search and index replication; it powers the search and navigation features of many of the world's largest Internet sites. With the Learning to Rank (LTR) contrib module, users can configure and run machine-learned ranking models in Solr.

The release of this plug-in marks the culmination of a year's worth of close collaboration between two groups of Bloomberg software engineers, one in London and one in New York, and the open source project's community to make it easier to re-rank search results using machine learning.

The original goal was to improve both Federated Search and News Search on the Bloomberg Terminal. A Solr-based Search-as-a-Service platform drives search for multiple functions on the Terminal, and Learning-to-Rank algorithms are responsible for the quality of many of its search results. Any time users perform a search, they expect to instantly find the most relevant companies, people and news.

In New York, the re-ranking requirements of the News Search team were different, but similar. As the engineers talked with colleagues, other teams also came forward asking for their own Solr-based re-ranking frameworks.

How does the Learning-to-Rank plug-in work?

In the field of information retrieval, Learning-to-Rank techniques are used to improve the relevance of users' search results. First, a search query retrieves the documents that match the user's search terms. The top N results of this original query are then re-ranked using new scores computed by applying the trained machine learning model.

Because scoring with a machine learning model is more computationally intensive (slower and more expensive, in other words), applying the model to just this subset of results, rather than to every matching document, helps keep search fast while still delivering relevant results.
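To make the two-phase idea concrete, here is a minimal Python sketch. It assumes the "trained model" is a simple linear model (a weighted sum of per-document feature values), and the function names, feature names and weights are hypothetical. It illustrates the general re-ranking technique only; it is not the plug-in's implementation, which runs inside Solr in Java.

```python
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]          # hypothetical: feature name -> feature value
Hit = Tuple[str, float, Features]    # (document id, first-pass score, features)


def linear_model(weights: Dict[str, float]) -> Callable[[Features], float]:
    """Return a scorer that computes a weighted sum of a document's features."""
    def score(features: Features) -> float:
        # Missing features contribute nothing to the score.
        return sum(w * features.get(name, 0.0) for name, w in weights.items())
    return score


def rerank(hits: List[Hit], model: Callable[[Features], float],
           rerank_docs: int) -> List[str]:
    """Re-rank the top `rerank_docs` hits with `model`; keep the rest as they were."""
    # Phase 1: order every hit by its cheap first-pass score, highest first.
    ranked = sorted(hits, key=lambda hit: hit[1], reverse=True)
    head, tail = ranked[:rerank_docs], ranked[rerank_docs:]
    # Phase 2: apply the (more expensive) model only to the top N hits.
    head = sorted(head, key=lambda hit: model(hit[2]), reverse=True)
    return [doc_id for doc_id, _, _ in head + tail]


hits = [
    ("a", 3.0, {"recency": 0.1, "isNews": 0.0}),
    ("b", 2.5, {"recency": 0.9, "isNews": 1.0}),
    ("c", 2.0, {"recency": 0.8, "isNews": 1.0}),
    ("d", 1.0, {"recency": 1.0, "isNews": 1.0}),
]
print(rerank(hits, linear_model({"recency": 1.0, "isNews": 0.5}), rerank_docs=3))
# prints ['b', 'c', 'a', 'd']
```

Note that document "d" would get the highest model score of all, but it stays last because it falls outside the top N of the first pass. That is the trade-off: the model's cost is bounded by N, at the price of never promoting documents the first query ranked too low.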

The effort to integrate the Learning-to-Rank plug-in into the upstream project was led by Apache Lucene/Solr committer Christine Poerschke, a senior software engineer in the News Search team in London. Last month, Poerschke was named to the Apache Lucene Project Management Committee (PMC), becoming the first Bloomberg employee to be invited to join any Apache PMC. In this new role, she is part of a group of developers around the globe that provides oversight of the project for the Apache Software Foundation (ASF), decides the release strategy, appoints new committers and sets community and technical direction for the project.

Benefits for engineers

After a year-long period of on-and-off iterative code revisions, public comments and documentation, the Learning-to-Rank plug-in is now part of the Solr 6.4.0 release. The plug-in provides an easy-to-use framework to deploy machine learning models into Solr. Now search engineers, both inside and outside Bloomberg, can use the plug-in and their own machine learning models to improve their search solutions. This allows engineering teams to focus on their specific domain, rather than spend time building and maintaining their own re-ranking infrastructure.
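For a sense of what deploying a model with the plug-in involves, the following Python sketch builds the two JSON documents the plug-in works with (feature definitions and a model that uses them) and a query URL that re-ranks results via the `{!ltr ...}` query parser in the `rq` parameter. It only constructs the payloads and the URL; it makes no network calls. The collection name `techproducts`, the feature names and the weights are hypothetical, and the class names, endpoints and parameter syntax follow the Learning to Rank section of the Solr 6.x reference guide as commonly documented; they should be checked against the documentation for the Solr version in use, which also explains how to enable the plug-in for a collection.

```python
import json
from urllib.parse import urlencode

# Hypothetical collection; assumes the LTR plug-in is enabled for it.
SOLR = "http://localhost:8983/solr/techproducts"

# Feature definitions, to be uploaded (HTTP PUT) to SOLR + "/schema/feature-store".
# Feature names here are hypothetical examples.
features = [
    {"name": "originalScore",
     "class": "org.apache.solr.ltr.feature.OriginalScoreFeature",
     "params": {}},
    {"name": "isBook",
     "class": "org.apache.solr.ltr.feature.SolrFeature",
     "params": {"fq": ["{!terms f=cat}book"]}},
]

# A linear model over those features, to be uploaded (HTTP PUT) to
# SOLR + "/schema/model-store". The weights are hypothetical; in practice they
# come from training a model offline.
model = {
    "class": "org.apache.solr.ltr.model.LinearModel",
    "name": "myModel",
    "features": [{"name": f["name"]} for f in features],
    "params": {"weights": {"originalScore": 0.5, "isBook": 1.0}},
}


def rerank_query_url(q: str, model_name: str, rerank_docs: int) -> str:
    """Build a query URL whose rq parameter re-ranks the top hits with a model."""
    params = {
        "q": q,
        "rq": "{!ltr model=%s reRankDocs=%d}" % (model_name, rerank_docs),
        "fl": "id,score",
    }
    return SOLR + "/query?" + urlencode(params)


feature_json = json.dumps(features)
model_json = json.dumps(model)
print(rerank_query_url("test", model["name"], 100))
```

The `reRankDocs` value plays the role of N in the two-phase process: only that many top results of the original query are re-scored by the model.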

With the inclusion of the Learning-to-Rank plug-in as part of Solr, the project's worldwide community has taken on the responsibility for maintaining and extending this technology. This collaborative open development means that, in the future, the community (which includes several Bloomberg engineers who are active contributors, developers at other companies and independent search experts) will be able to integrate their own extensions and improvements into the plug-in. Those updates will then automatically ship to all Learning-to-Rank plug-in users as part of future Solr releases.

The opportunity for Bloomberg engineers to participate in important and interesting open source projects also (it is argued) has other benefits. Search result ranking is a relatively difficult technical problem. Taking on this kind of challenge, and then contributing the results, is validating and rewarding for Bloomberg's engineers, and it is also of interest to many prospective Bloomberg engineers.