Facebook open sources Reinforcement Learning (RL) software

Everybody’s favourite social media platform company, Facebook, has continued to pump blood through its undeniably vibrant developer division, despite the apologies before the US Congress and multifarious shenanigans.

This month’s code.fb.com news sees the company open source Horizon.

Horizon is an open source end-to-end platform that uses applied Reinforcement Learning (RL) to optimise systems in large-scale production environments.

Facebook developers Jason Gauci, Edoardo Conti and Kittipat Virochsiri state that the team developed this platform to ‘bridge the gap’ between RL’s growing impact in research and its traditionally narrow range of uses in production.

There is dogfooding here — the developer team has deployed Horizon at Facebook over the past year and they say that this has helped improve the platform’s ability to adapt RL’s decision-based approach to large-scale applications.

What is Reinforcement Learning?

In the simplest terms, RL describes a system that learns to improve over time through feedback. When it makes a wrong decision, it learns from the negative feedback that this was wrong, and when it makes a correct decision, it learns from the positive feedback how to make better decisions in the future.
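As a rough illustration of that feedback loop, here is a minimal Python sketch of an epsilon-greedy "bandit" learner. It is not taken from Horizon: the function name `run_bandit`, the success rates and the exploration rate are all assumptions made up for this example.

```python
import random

def run_bandit(success_rates, steps=10000, epsilon=0.1, seed=0):
    """Learn by trial and error which action pays off most often.

    success_rates is a hypothetical list giving, for each action, the
    probability that choosing it turns out to be a "correct" decision.
    """
    rng = random.Random(seed)
    estimates = [0.0] * len(success_rates)  # learned value of each action
    counts = [0] * len(success_rates)       # how often each action was tried
    for _ in range(steps):
        # Mostly pick the action that looks best so far, occasionally explore.
        if rng.random() < epsilon:
            action = rng.randrange(len(success_rates))
        else:
            action = max(range(len(estimates)), key=estimates.__getitem__)
        # Feedback: 1.0 for a correct decision, 0.0 for a wrong one.
        reward = 1.0 if rng.random() < success_rates[action] else 0.0
        counts[action] += 1
        # Nudge the running average towards the observed reward.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates

estimates = run_bandit([0.2, 0.5, 0.8])
best_action = max(range(len(estimates)), key=estimates.__getitem__)
print("learned estimates:", estimates, "preferred action:", best_action)
```

The learner is never told which action is best. It only sees the reward after each decision and gradually shifts towards the action that has worked most often.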

RL also has links to game theory, so it is no surprise that it is used in many instances of game software to alter the behaviour of game characters (enemies) based upon the actions taken by the player.

The Facebook engineers explain it as follows, “Machine learning (ML) systems typically generate predictions, but then require engineers to transform these predictions into a policy (i.e. a strategy to take actions). RL, on the other hand, creates systems that make decisions, take actions and then adapt based on the feedback they receive. This approach has the potential to optimise a set of decisions without requiring a hand-crafted policy. For example, an RL system can directly choose a high or low bit rate for a particular video while the video is playing, based on estimates from other ML systems and the state of the video buffers.”
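To make the bit rate example a little more concrete, the following Python sketch uses tabular Q-learning on a toy video player. Everything in it is an assumption for illustration only: the five buffer levels, the rewards, the stall penalty and the `step` and `train` functions are invented, and none of it reflects how Horizon or Facebook's video systems actually work.

```python
import random

LOW, HIGH = 0, 1   # actions: fetch the next chunk at a low or high bit rate
MAX_BUFFER = 4     # buffer level discretised into 0..4 (hypothetical)

def step(buffer_level, action):
    """Toy environment: returns (reward, next_buffer_level)."""
    if action == LOW:
        # Low bit rate downloads faster than playback, so the buffer grows.
        return 1.0, min(buffer_level + 1, MAX_BUFFER)
    if buffer_level == 0:
        # High bit rate on an empty buffer stalls playback.
        return -5.0, 0
    # High bit rate looks better to the viewer but drains the buffer.
    return 2.0, buffer_level - 1

def train(episodes=2000, steps_per_episode=20, alpha=0.5, gamma=0.5,
          epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(MAX_BUFFER + 1)]  # q[state][action]
    for _episode in range(episodes):
        state = rng.randint(0, MAX_BUFFER)  # random starting buffer level
        for _step in range(steps_per_episode):
            # Make a decision: usually the best-known action, sometimes a random one.
            if rng.random() < epsilon:
                action = rng.choice((LOW, HIGH))
            else:
                action = LOW if q[state][LOW] >= q[state][HIGH] else HIGH
            # Take the action and observe the feedback.
            reward, next_state = step(state, action)
            # Adapt: standard Q-learning update towards the observed target.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q_table = train()
policy = ["low" if q[LOW] >= q[HIGH] else "high" for q in q_table]
print(policy)
```

No engineer writes the rule "go low when the buffer is empty" here. With these made-up numbers, the learner should arrive at that behaviour on its own from the stall penalty, which is the sense in which RL avoids a hand-crafted policy.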

The team state that while RL’s policy optimisation capabilities have shown promising results in research, it has been difficult for the Artificial Intelligence community to adapt these models to handle a very different set of real-world requirements for production environments.

Horizon takes into account issues specific to production environments, including feature normalisation, distributed training, large-scale deployment and serving, as well as data sets with thousands of varying feature types and distributions and high-dimensional discrete and continuous-action spaces.
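Of the issues listed above, feature normalisation is the easiest to picture in code. The Python sketch below shows plain per-feature standardisation. It is a generic technique, not Horizon's own normalisation pipeline, and the feature names `watch_seconds` and `buffer_ms` are hypothetical.

```python
import math

def fit_normaliser(rows):
    """Compute the mean and standard deviation of each feature."""
    stats = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        mean = sum(values) / len(values)
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        # Fall back to 1.0 so a constant feature does not divide by zero.
        stats[name] = (mean, math.sqrt(variance) or 1.0)
    return stats

def normalise(row, stats):
    """Rescale one row so each feature has zero mean and unit variance."""
    return {name: (value - stats[name][0]) / stats[name][1]
            for name, value in row.items()}

# Hypothetical training rows with features on very different scales.
rows = [
    {"watch_seconds": 10.0, "buffer_ms": 1000.0},
    {"watch_seconds": 30.0, "buffer_ms": 3000.0},
]
stats = fit_normaliser(rows)
scaled = normalise(rows[0], stats)
print(scaled)
```

Putting features on a comparable scale like this keeps one large-valued feature from dominating training. That matters more as the number and variety of features grows.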

A more detailed breakdown from Facebook itself is available on code.fb.com.