Reinforcement learning for real-world systems

Jacob Zweig
Co-Founder, Principal Data Scientist

What is Reinforcement Learning?

Although our understanding of how humans learn is still in its infancy, it is well established that we learn at least in part by interacting with the world and observing the consequences of our actions. Reinforcement learning allows us to build automated, artificially intelligent systems that learn in a similar fashion. By taking actions and adapting future decisions based on the observed consequences of those actions, a system can learn to achieve a predetermined goal. Ultimately, it learns to string together sequences of actions that optimize for that goal. These concepts underlie some of the most impressive recent achievements in machine learning and artificial intelligence, and reinforcement learning remains an extremely active area of research.

A key benefit of this approach is the ability to solve dynamic, multi-step problems where we don’t know the best answer in advance. Unlike traditional supervised learning, where an exact label exists for every step, reinforcement learning evaluates actions by how they affect our progress toward a particular goal.

Reinforcement learning process
Reinforcement learning is a dynamic process of interacting with the environment, observing outcomes, and learning to improve performance.
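To make that loop concrete, here is a minimal Python sketch. It assumes a hypothetical toy environment (a five-state corridor where only reaching the last state pays a reward) and uses tabular Q-learning, one standard reinforcement learning algorithm, as an illustrative stand-in; it is not how Strong-RL or any production system is implemented. The agent acts, observes the reward and next state, and nudges its value estimate toward the observed reward plus the discounted value of what follows.

```python
import random

N_STATES = 5        # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = (0, 1)    # 0 = move left, 1 = move right


def step(state, action):
    """Hypothetical toy environment: returns (next_state, reward, done)."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0  # only reaching the goal is rewarded
    return next_state, reward, done


def greedy(q, state, rng):
    """Pick the highest-valued action, breaking ties at random."""
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q(s, a) estimates
    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < 100:
            # Act: explore with probability epsilon, otherwise exploit.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy(q, state, rng)
            # Observe the consequence of the action.
            next_state, reward, done = step(state, action)
            # Learn: move Q(s, a) toward r + gamma * max_a' Q(s', a').
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state, steps = next_state, steps + 1
    return q


q = train()
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(policy)  # [1, 1, 1, 1]
```

The learned policy moves right from every non-goal state even though no individual step was ever labeled as correct: only the final move is rewarded, and the discount factor gamma propagates that credit back to the earlier actions.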

Applying reinforcement learning to games

Most examples of reinforcement learning applications focus on games and other toy problems. For example, you may have seen a demo of an algorithm learning to balance a pole on a cart, or even to play Flappy Bird and Space Invaders.

While these environments provide a useful test bed for algorithmic solutions, building and deploying reinforcement learning solutions for real-world systems at scale raises challenges that require unique considerations.

Applying reinforcement learning in the real world

Industrial applications of reinforcement learning span an incredibly wide gamut of high-impact problems, including (but not limited to!):

  • Personalized dynamic recommender systems 
  • Personalized multi-channel marketing
  • Automated ad bidding and buying  
  • Personalized medication dosing  
  • Dynamic resource allocation in wind farms, HVAC systems, and computing clusters
  • Automated calibration of engines and other machines 
  • Robotic control 
  • Supply chain optimization  

Key challenges for RL in industry

Building robust, scalable systems to solve these problems requires addressing a number of key challenges that are not present in toy environments:

  1. Scale: Enterprise applications need to run frequently and quickly on data volumes that might include sub-second telemetry or event data from millions of distinct users or customers.
  2. Safety: In video games, errors during training are expected, and errors during a test are unfortunate but not mission-critical. In industrial applications, by contrast, systems need to be extremely well tested to avoid downtime, embarrassing customer-facing mistakes, or, in domains such as healthcare, decisions with serious long-term consequences.
  3. Evaluation: A key challenge of building and deploying reinforcement learning applications is evaluating performance. Doing so often requires counterfactual evaluation (i.e., “what would have happened if this new agent had been making decisions all along?”), which is extremely difficult to do with certainty.
  4. Varied Learning Environments: In ideal scenarios, we can train our algorithms in realistic simulations that mirror real-world responses. In others, we can use only observational historical data, which often contains confounding between actions and consequences that can be difficult to disentangle. Solutions need to be flexible enough to learn from varied sources and to generalize efficiently to the production environment.
  5. Customization: While reinforcement learning itself is a very general learning framework, applying it to any particular problem requires a deep understanding of the domain. Beating Flappy Bird requires a different approach than optimizing operations at a wind farm.
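For a sense of what counterfactual evaluation involves, one widely used technique is inverse propensity scoring (IPS): each logged reward is reweighted by how much more (or less) likely a new policy is to take the logged action than the policy that actually made the decision. The sketch below assumes hypothetical logs in which the deployed policy recorded the probability (propensity) of every action it took; the `LoggedStep` schema, the "email"/"sms" actions, and the customer segments are illustrative only. IPS is unbiased when those propensities are accurate and the logging policy gave nonzero probability to every action the new policy might choose, but its variance grows quickly as the two policies diverge, which is one reason such estimates are hard to trust with certainty.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class LoggedStep:
    """One decision recorded by the deployed (logging) policy. Hypothetical schema."""
    context: str       # what the system observed, e.g. a customer segment
    action: str        # the action the logging policy actually took
    propensity: float  # probability the logging policy assigned to that action
    reward: float      # the outcome observed after the action


def ips_estimate(logs: Sequence[LoggedStep],
                 target_prob: Callable[[str, str], float]) -> float:
    """Inverse propensity scoring estimate of a new policy's average reward.

    target_prob(context, action) returns the probability that the new
    (never deployed) policy would take `action` in `context`.
    """
    if not logs:
        raise ValueError("need at least one logged step")
    total = 0.0
    for entry in logs:
        if entry.propensity <= 0:
            raise ValueError("logged propensities must be positive")
        # Up-weight actions the new policy favors, down-weight the rest.
        weight = target_prob(entry.context, entry.action) / entry.propensity
        total += weight * entry.reward
    return total / len(logs)


# Hypothetical logs: the deployed policy picked email or SMS uniformly at random.
logs = [
    LoggedStep("A", "email", 0.5, 1.0),
    LoggedStep("A", "sms", 0.5, 0.0),
    LoggedStep("B", "email", 0.5, 0.0),
    LoggedStep("B", "sms", 0.5, 1.0),
]


def always_email(context: str, action: str) -> float:
    """Candidate policy: always send email."""
    return 1.0 if action == "email" else 0.0


def by_segment(context: str, action: str) -> float:
    """Candidate policy: email for segment A, SMS for everyone else."""
    preferred = "email" if context == "A" else "sms"
    return 1.0 if action == preferred else 0.0


print(ips_estimate(logs, always_email))  # 0.5
print(ips_estimate(logs, by_segment))    # 1.0
```

On these four logged steps the uniform logging policy earned an average reward of 0.5. IPS estimates that always sending email would also average 0.5, while choosing by segment would average 1.0, without either candidate ever having been deployed.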

Introducing Strong-RL

At Strong, we have helped top companies across multiple industries address these challenges and implement robust applications based on reinforcement learning. The Strong-RL Platform accelerates the development and deployment of real-world reinforcement learning systems at scale, bringing these powerful solutions to a wider audience.

Contact us to find out how you can leverage Strong-RL to accelerate innovation and implement applications that learn, adapt, and improve with AI.