Using Reinforcement Learning to Tackle CitiBike Rebalancing Problems and Beyond
In this 2-part article, I want to share my work and reflections on applying Reinforcement Learning (RL) to a large-scale city operations problem.
In Part 1 of this article, I focus on insights for business managers:
overview of RL and its current applications
advantages of RL compared to other machine learning techniques
business opportunities and considerations
Part 2 is tailored for technical managers and engineers who might be interested in the following topics:
a walkthrough of a general RL architecture
results of a tailored RL solution for a real-life operation problem
technical limitations of RL and potential workarounds
To conclude, I will share some thoughts on future opportunities of RL-based applications, and bottlenecks and respective workarounds.
Note: In order to manage the scope of this article, I am not able to include a code walkthrough. However, all the code is available on my Github for reference. The code is under active development; please refer to the README for the latest feature updates.
Special acknowledgement: I want to thank Eric Bogart for providing editorial support and feedback on the content.
Part 1: Overview and Business Opportunities
What is Reinforcement Learning (RL) and its real-world applications?
Many famous scholars in Psychology, Engineering, and Computer Science think RL is the true Artificial Intelligence (AI). RL solutions are designed based on how humans learn, act, and improve over time through trial and error and reward-punishment feedback from the environment. RL is adaptive to complex and changing environments.
First, let us demonstrate the core concepts using a simple example: a baby learning how to walk.
In this video, although it may not be the most scientific illustration, the baby perfectly (and very enthusiastically) demonstrated the key concepts of RL. In simple terms (we will provide the technical definition in the next section), RL consists of three main components:
Learn: the baby first realized there was lots of joy (reward) in being able to high-five daddy or play with his favourite toys, but it was frustrating (punishment) not being able to do so. Now he knew why he needed to take actions
Act: the baby needed to decide 1) what muscles to use and how to balance his body so he can stand up and walk, 2) where to go so he can get the reward or avoid the punishment, and 3) how to remember and generalize all the steps so he can maximize the reward every time
Improve: after 11 months of trial and error, the baby was able to stand up, walk, and maneuver towards the rewards much faster, based on a stronger mental model mapping situations to his actions
Here are some powerful real-world applications of the RL idea:
developing a self-driving car that can navigate complex city environments (e.g. Uber's self-driving car)
having a computer program beat human players without any human instruction (e.g. AlphaGo, which beat top human players in 2015 and 2016)
asking a computer to trade on the stock market by maximizing return and avoiding losing money (e.g. algorithmic trading at Hedge Funds)
automating marketing campaigns to maximize ROI per message by considering the changing customer dynamics and omni-channel response
training a robot to go from A to B while avoiding obstacles using its gymnastic capabilities (e.g. Boston Dynamics)
Some of these use cases may seem very advanced and futuristic. Please refer to the takeaway section where I offer a few simple questions to help translate this to your specific businesses.
So, why does Reinforcement Learning matter?
The main reasons are to achieve large-scale automation that is intelligent and stable in complex environments for productivity gains, to expand the frontier of business performance to sustain competitive advantage, and to demonstrate just-because-we-can scientific aspiration. But the one I am most excited about is autonomous knowledge discovery that can enhance human operations in concrete business settings.
In my opinion, no one can describe it better or with as much passion as David Silver from DeepMind, the lead researcher of AlphaGo.
Here is just a short paraphrase from David's interview:
... it [the RL program] started to learn how to play the game of Go [and any other real world tasks] without human knowledge, examples, or interventions, but completely from scratch, on its own. It discovered how to play the game completely from first principles... it recognized the common patterns of the game that humans accumulated over thousands of years in a short time period, but also discovered knowledge that humans might have missed, and ultimately bettered the way we play the game...
Many people may have already recognized that companies are capturing a large amount and variety of data, building new data management systems and practices, and deploying substantial computational horsepower using cloud services and parallel computing technologies. This collective effort is allowing us to accelerate the adoption of AI and push its envelope at unprecedented dimensions and speed. The futuristic use cases are becoming reality.
As a business manager, why should I care now?
Despite the recent hype, Machine Learning is not a new development. In fact, many organizations have already deployed traditional machine learning algorithms and power core functions with them today. By contrast, RL is one of the least explored techniques, according to a recent McKinsey study of hundreds of use cases. Use cases enabled by RL are therefore a new frontier of competitive advantage, for incumbents looking to sustain their edge or for new companies looking to leapfrog. New data types, processing power, and large-scale human-AI solutions can allow companies to explore new service capabilities and exploit cost advantages.
How is it different than traditional machine learning?
When we talk about Machine Learning, most people may think of Supervised and Unsupervised Learning (Deep Learning is a family of techniques that can be used in both settings rather than a separate category). Predicting customer churn using a Decision Tree, forecasting stock prices using a Linear Regression, and recognizing handwriting, images, and languages using Deep Learning techniques are common applications of Supervised Learning. In these exercises, we are trying to predict based on known, labeled data. In Unsupervised Learning, we try to discover patterns (e.g. customer segments or data quality checks) without prior labels using, for example, k-means clustering or PCA.
RL is a different Machine Learning paradigm. RL and traditional machine learning differ in their objectives, input data, outputs, and the roles humans play in their development. The line is blurry without explaining the nuts and bolts of each machine learning algorithm, so we illustrate the differences based on their characteristics.
It should be noted that Supervised and Unsupervised Learning can be incorporated into RL to enhance the decision and computation performance. To learn more, explore "model-based Reinforcement Learning".
Takeaway for Business Managers: What are the opportunities and limitations of RL solutions?
Because of its adaptability without explicit human instructions, I believe RL is best applied to large scale operations with complex and changing dynamics. Complex and changing dynamics create problems for traditional machine learning models if they are not properly designed or re-trained. As a result, lower accuracy can lead to revenue loss, unexpected costs, and missed opportunities (e.g. targeting the wrong customer with an offer).
Some use cases may seem futuristic at this point, but I believe they will become the new status quo very soon. Therefore, business managers should translate what RL means in their line of work. More concretely, here are a few questions I would ask:
how might RL solve existing problems that were bottlenecked by data, technology, processes, and investment?
how might RL help to develop new physical and / or digital capabilities?
how might these new capabilities help my business to increase revenue, decrease or maintain cost, and / or manage risk? And do these capabilities justify the investment of time and money?
if you find promising ideas through the questions above, how should you build a team, align your business partners, design new processes, and prototype and launch something quickly to test and learn?
Part 2: Technical Use Case and Implementation
In this section, I focus on the technical aspects of RL by walking through an implementation in operations management.
What is the Problem?
The problem is simple to state, but complex due to its scale. CitiBike, the public-private bike sharing service in NYC, has a common logistics problem: rebalancing bike stocks across the city throughout the day to ensure service availability. Popular destinations tend to have too many bikes; common origins tend to have too few.
According to their performance reports, in some months they could only achieve 93–94% service availability. This is an incredible track record given the scale of the operation. But citizens see CitiBike as public transportation and hold it to the same standards as buses and subways. Currently, CitiBike rebalances bikes three times a day with teams of back-office dispatchers and ground operators.
What are the current solutions, limitations, and my proposed solution?
Many people have produced great studies of rebalancing problems and provided suggestions for further optimization [4, 5]. However, most of these studies focus on exploratory analysis, offering impressive visualizations and in-depth analysis of ad-hoc situations. These insights are great for business decision makers as initial assessment and inspiration. With this in mind, I saw an opportunity to build a solution that serves ground-level operations and can power large-scale automation.
So, I asked the question from a different angle: can we enhance large scale human operations with autonomous learning capabilities? In particular, I want the solution to be able to:
recognize operational limits autonomously, such as the maximum number of bikes a station can hold, and adjust to a new constraint without re-programming
decide and recommend how many bikes to remove from a particular station in the most cost-efficient manner, where cost is proportional to the number of bikes moved at once
improve its own performance without human intervention and under changing environmental dynamics
Given this setup, RL is ideally suited to these goals. For the scope of this article and the CitiBike use case, my solution is limited to:
balance a single bike station within a day (hour 0 to 23)
maintain the bike stock under 50 by end of the day (hour 23)
remove bikes in a quantity of 0, 1, 3, or 10 at once per hour
the program only communicates with the RL agent via reward and penalty (e.g. 10 reward points if the bike stock is less than 50 at hour 23, otherwise -10 points)
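The setup above can be expressed as a minimal simulation environment. This is an illustrative sketch, not the project's actual implementation (see the Github repo for that); the starting stock and the hourly deposit range are assumptions chosen purely for demonstration.

```python
import random

class BikeStationEnv:
    """Minimal single-station simulation: hours 0-23, target stock under 50 at hour 23."""

    ACTIONS = [0, 1, 3, 10]  # number of bikes the agent may remove per hour

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        self.hour = 0
        self.stock = 20  # assumed starting stock (illustrative)
        return (self.hour, self.stock)

    def step(self, action):
        """Apply an action (bikes removed), then simulate incoming deposits."""
        assert action in self.ACTIONS
        self.stock -= action
        self.stock += self.rng.randint(0, 5)  # assumed hourly deposits (illustrative)
        self.hour += 1
        done = self.hour == 23
        # per the setup above, reward is only communicated at the end of the day
        reward = 0
        if done:
            reward = 10 if self.stock < 50 else -10
        return (self.hour, self.stock), reward, done

# baseline policy for comparison: never remove any bikes
env = BikeStationEnv(seed=0)
env.reset()
done, total = False, 0
while not done:
    state, reward, done = env.step(0)
    total += reward
print(total)
```

With an average of 2.5 deposits per hour and no removals, the stock usually blows past the 50-bike limit, so this do-nothing baseline tends to earn the -10 penalty; the RL agent's job is to find a removal schedule that does better.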
What is the conceptual architecture of the solution?
You can find the code and more detailed documentation on my Github.
Can the RL agent learn, act, and improve in the bike rebalancing context?
The RL agent was able to learn and recognize the 50-bike stock limit and adjust the stock effectively and cheaply after some training.
In the beginning (episode 0), the agent would simply remove as many bikes as possible (orange). After 200 rounds of simulation, the agent removes bikes only when it judges doing so to be most cost-effective (green).
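One standard way to implement this kind of trial-and-error improvement is tabular Q-learning, where the agent keeps a table of expected rewards for every (state, action) pair and nudges it after each step. The sketch below shows the core update rule only; the learning rate and discount factor are illustrative values, not the ones used in this project.

```python
from collections import defaultdict

ALPHA = 0.1  # learning rate (illustrative)
GAMMA = 0.9  # discount factor (illustrative)

Q = defaultdict(float)  # maps (state, action) -> expected cumulative reward

def q_update(state, action, reward, next_state, actions):
    """One step of the standard Q-learning (Bellman) update."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

# hypothetical step: at hour 22 with 55 bikes, the agent removes 10
# and earns the +10 end-of-day reward at hour 23 with 45 bikes left
q_update(state=(22, 55), action=10, reward=10,
         next_state=(23, 45), actions=[0, 1, 3, 10])
print(Q[((22, 55), 10)])  # 1.0 after one update from an all-zero table
```

Repeating updates like this over hundreds of simulated days is what moves the agent from the naive episode-0 behavior to the cost-aware policy described above.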
It is not practical to have negative bike stocks. Can we re-train the agent by putting heavy penalties on reaching negative bike stock at any given hour?
The RL agent was able to recognize the new non-negative constraint and act accordingly without explicit human instructions (all the code stayed the same; only the reward structure changed).
Note: The blue line represents actual bike deposits based on CitiBike trip data from September 2017.
The agent improved over time based on trial-and-error without any human input or re-programming.
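To make the "only the reward structure changed" point concrete, here is a hedged sketch of what such a change can look like. The -30 penalty value is illustrative, not the figure used in the project:

```python
def reward_original(stock, hour):
    """Original reward: only the end-of-day stock target matters."""
    if hour == 23:
        return 10 if stock < 50 else -10
    return 0

def reward_non_negative(stock, hour):
    """Revised reward: same end-of-day target, plus a heavy penalty
    (illustrative -30) whenever the stock goes negative at any hour."""
    if stock < 0:
        return -30
    if hour == 23:
        return 10 if stock < 50 else -10
    return 0

# a mid-day negative stock is ignored by the original reward
# but heavily punished by the revised one
print(reward_original(-5, 12), reward_non_negative(-5, 12))  # 0 -30
```

Because the agent learns purely from these reward signals, swapping one function for the other is enough to change its behavior; no decision logic has to be rewritten.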
You may wonder how the RL agent "thinks" based on what it has learned from interacting with the environment. Being able to interpret and trace the decision making is extremely important, especially in a world of black-box AI solutions.
Here is the chart I used to understand how and why the agent made a particular decision:
x-axis: a list of viable actions (# of bikes to move)
y-axis: number of bikes at a given time at the station
values: expected reward if an action is to be taken
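A chart like this can be assembled directly from a learned Q-table: one row per stock level, one column per action, with the best action read off as the column with the highest expected reward. The sketch below uses synthetic Q-values (a simple formula standing in for a trained agent's table) purely to show the mechanics:

```python
import numpy as np

ACTIONS = [0, 1, 3, 10]   # x-axis: viable actions (# of bikes to remove)
STOCKS = range(40, 60)    # y-axis: bike stock levels at a given hour

# synthetic stand-in Q-values: removing bikes becomes "worth" more
# as the stock climbs past the 50-bike limit
q_table = np.array([[(stock - 50) * a / 10 for a in ACTIONS]
                    for stock in STOCKS])

# the recommended action per stock level is the column with
# the highest expected reward in that row
best_actions = [ACTIONS[row.argmax()] for row in q_table]
for stock, act in zip(STOCKS, best_actions):
    print(f"stock={stock:2d} -> remove {act} bikes")
```

Reading the table this way makes the agent's policy fully traceable: for any observed stock level you can point at the exact expected-reward numbers behind the recommendation.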
Takeaway for Technical Managers: what are the limitations of RL and potential workarounds?
RL, just like any technology, is not perfect in the following key areas:
Data: standing up an RL solution requires integrating data from multiple distinct domains and operational environments, which is critical to representing real-life dynamics well
Human-AI Process: operationalizing an RL solution (not just for the sake of having a fancy solution) requires an organization to be able to physically or digitally carry out the RL actions in a timely manner. This requires a well-designed human-AI interaction workflow and system integration.
Computation Power: maximizing the potential of an RL solution means the agent should be able to refresh, make decisions, and coordinate with different channels in near real time. Doing so requires tremendous computational power compared to traditional batch-based solutions.
Here are some workarounds:
Start Small: Prioritize a well-defined and contained business operation with solid data systems (e.g. just email ads, instead of the full multi-channel promotion operation); this good old principle is timeless.
Simulate: If data becomes a bottleneck, the team can consider creating a simulation environment to train the RL agent, then moving to real data once the data systems are set up. This requires some domain knowledge of how the business operates.
Use Human-Centered Design: If the solution requires hand-off to human operators, keep the end users in mind and make the interaction intuitive and meaningful; this hand-off will be critical to the actual productivity gain of any AI assisted solution.
Leverage Cloud and Parallel Computing: there are many solutions in the market that offer cheap computation resources, both permanently and on-demand. The team can take this opportunity to modernize its infrastructure to prepare for future AI solutions, which are likely to be computationally intensive.
I hope this article provides an intuitive introduction to both business and technical managers on how RL works and its opportunities and limitations. Traditional Machine Learning and RL are not silver bullets despite recent data science hype. Even so, we will continue to see more commercialization of AI-based solutions based on their successful applications and significant research investment.
AI solutions can have significant impact on the future of human work, which I discussed in another article. We should be thoughtful in designing AI solutions that not only have immediate business impact, but also have long-term and balanced benefits for society. This is not an easy task. I hope all of us, people who are fortunate enough to have learned the technical skills, to be in leadership positions, or to have the intellectual curiosity, can start thinking, experimenting, and exchanging ideas.
At the moment, I am working with a team of NYU students to improve the solution on the following aspects:
Expand the RL solution to consider and react to a network of bike stations
Develop a cloud-based architecture with parallel computing capabilities to reduce learning time
Apply Deep Learning to remove bottlenecks in large-scale and complex station networks
Incorporate a forecasting model to enhance the success rate