Reinforcement Learning

Timeline

1947: Monte Carlo Sampling
1958: Perceptron
1959: Temporal Difference Learning
1983: ASE-ACE, the first Actor-Critic algorithm
1986: Backpropagation algorithm
1989: CNNs
1989: Q-Learning
1991: TD-Gammon
1992: REINFORCE
1992: Experience Replay
1994: SARSA
1999: Nvidia invented the GPU
2007: CUDA released
2012: Arcade Learning Environment (ALE)
2013: DQN
2015 Feb: DQN human-level control in Atari
2015 Feb: TRPO
2015 Jun: Generalized Advantage Estimation
2015 Sep: Deep Deterministic Policy Gradient (DDPG)
2015 Sep: DoubleDQN
2015 Nov: DuelingDQN
2015 Nov: Prioritized Experience Replay
2015 Nov: TensorFlow
2016 Feb: A3C
2016 Mar: AlphaGo beats Lee Sedol 4-1
2016 Jun: OpenAI Gym
2016 Jun: Generative Adversarial Imitation Learning (GAIL)
2016 Oct: PyTorch
2017 Mar: Model-Agnostic Meta-Learning (MAML)
2017 Jul: Distributional RL
2017 Jul: PPO
2017 Aug: OpenAI DotA 2 1:1
2017 Aug: Intrinsic Curiosity Module (ICM)
2017 Oct: Rainbow
2017 Oct: AlphaGo Zero masters Go without human knowledge
2017 Dec: AlphaZero masters Go, Chess and Shogi
2018 Jan: Soft Actor-Critic
2018 Feb: IMPALA
2018 Jun: QT-Opt
2018 Nov: Go-Explore solved Montezuma’s Revenge
2018 Dec: AlphaZero becomes the strongest player in history for chess, Go, and Shogi
2019 Apr: OpenAI Five defeated world champions at DotA 2
2019 May: FTW Quake III Arena Capture the Flag
2019 Aug: AlphaStar: Grandmaster level in StarCraft II
2019 Sep: Emergent Tool Use from Multi-Agent Interaction
2019 Oct: Solving Rubik’s Cube with a Robot Hand
2020 Mar: Agent57 outperforms the standard human benchmark on all 57 Atari games
2020 Nov: AlphaFold for protein folding
2020 Dec: MuZero masters Go, chess, shogi and Atari without rules
2021 Aug: Generally capable agents emerge from open-ended play

Theory

Books

Richard Sutton and Andrew Barto, Reinforcement Learning: An Introduction (1st Edition, 1998) [Book] [Code]
Richard Sutton and Andrew Barto, Reinforcement Learning: An Introduction (2nd Edition, 2018) [Book] [Code]
Csaba Szepesvari, Algorithms for Reinforcement Learning [Book]
David Poole and Alan Mackworth, Artificial Intelligence: Foundations of Computational Agents [Book Chapter]
Dimitri P. Bertsekas and John N. Tsitsiklis, Neuro-Dynamic Programming [Book (Amazon)] [Summary]
Mykel J. Kochenderfer, Decision Making Under Uncertainty: Theory and Application [Book (Amazon)]
Deep Reinforcement Learning in Action [Book (Manning)]
Dimitri P. Bertsekas, Reinforcement Learning and Optimal Control (2019) [Book, Video Lectures, and Course Material]

Surveys

Leslie Pack Kaelbling, Michael L. Littman, Andrew W. Moore, Reinforcement Learning: A Survey (JAIR 1996) [Paper]
S. S. Keerthi and B. Ravindran, A Tutorial Survey of Reinforcement Learning (Sadhana 1994) [Paper]
Matthew E. Taylor, Peter Stone, Transfer Learning for Reinforcement Learning Domains: A Survey (JMLR 2009) [Paper]
Jens Kober, J. Andrew Bagnell, Jan Peters, Reinforcement Learning in Robotics, A Survey (IJRR 2013) [Paper]
Michael L. Littman, Reinforcement learning improves behaviour from evaluative feedback (Nature 2015) [Paper]
Marc P. Deisenroth, Gerhard Neumann, Jan Peters, A Survey on Policy Search for Robotics (Foundations and Trends in Robotics 2014) [Book]
Kai Arulkumaran, Marc Peter Deisenroth, Miles Brundage, Anil Anthony Bharath, A Brief Survey of Deep Reinforcement Learning (IEEE Signal Processing Magazine 2017) [DOI] [Paper]
Benjamin Recht, A Tour of Reinforcement Learning: The View from Continuous Control (Annu. Rev. Control Robot. Auton. Syst. 2019) [DOI]

Libraries

Berkeley Ray RLLib – An open-source library for reinforcement learning that offers both high scalability and a unified API for a variety of applications.
Berkeley Softlearning – A reinforcement learning framework for training maximum entropy policies in continuous domains.
Catalyst – Accelerated DL & RL.
ChainerRL – A deep reinforcement learning library built on top of Chainer.
DeepMind Acme – A research framework for reinforcement learning.
DeepMind OpenSpiel – A collection of environments and algorithms for research in general reinforcement learning and search/planning in games.
DeepMind TRFL – TensorFlow Reinforcement Learning.
DeepRL – Modularized Implementation of Deep RL Algorithms in PyTorch.
DeepX machina – A library for real-world Deep Reinforcement Learning which is built on top of PyTorch.
Facebook ELF – A platform for game research with AlphaGoZero/AlphaZero reimplementation.
Facebook ReAgent – A platform for Reasoning systems (Reinforcement Learning, Contextual Bandits, etc.)
garage – A toolkit for reproducible reinforcement learning research.
Google Dopamine – A research framework for fast prototyping of reinforcement learning algorithms.
Google TF-Agents – TF-Agents is a library for Reinforcement Learning in TensorFlow.
MAgent – A Platform for Many-agent Reinforcement Learning.
Maze – Application-oriented deep reinforcement learning framework addressing real-world decision problems.
MushroomRL – Python library for Reinforcement Learning experiments.
NervanaSystems coach – Reinforcement Learning Coach by Intel AI Lab.
OpenAI Baselines – High-quality implementations of reinforcement learning algorithms.
pytorch-a2c-ppo-acktr-gail – PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
pytorch-rl – Model-free deep reinforcement learning algorithms implemented in PyTorch.
reaver – A modular deep reinforcement learning framework with a focus on various StarCraft II based tasks.
RLgraph – Modular computation graphs for deep reinforcement learning.
RLkit – Reinforcement learning framework and algorithms implemented in PyTorch.
rlpyt – Reinforcement Learning in PyTorch.
SLM Lab – Modular Deep Reinforcement Learning framework in PyTorch.
Stable Baselines – A fork of OpenAI Baselines, implementations of reinforcement learning algorithms.
TensorForce – A TensorFlow library for applied reinforcement learning.
Tianshou – Tianshou (天授) is a reinforcement learning platform based on pure PyTorch.
UMass Amherst Autonomous Learning Library – A PyTorch library for building deep reinforcement learning agents.
Unity ML-Agents Toolkit – Unity Machine Learning Agents Toolkit.
vel – Bring velocity to deep-learning research.
DI-engine – A generalized decision intelligence engine. It supports various Deep RL algorithms.

Benchmark Results

DeepMind bsuite
OpenAI baselines-results
OpenAI Baselines
OpenAI Spinning Up
ray rl-experiments
rl-baselines-zoo
SLM Lab
vel
What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study
yarlp

Environments

AI2-THOR – A near photo-realistic interactable framework for AI agents.
Animal-AI Olympics – An AI competition with tests inspired by animal cognition.
Berkeley rl-generalization – Modifiable OpenAI Gym environments for studying generalization in RL.
BTGym – Scalable event-driven RL-friendly backtesting library. Built on top of Backtrader with OpenAI Gym environment API.
Carla – Open-source simulator for autonomous driving research.
CuLE – A CUDA port of the Arcade Learning Environment (ALE).
Deepdrive – End-to-end simulation for self-driving cars.
DeepMind AndroidEnv – A library for doing RL research on Android devices.
DeepMind DM Control – The DeepMind Control Suite and Package.
DeepMind Lab – A customisable 3D platform for agent-based AI research.
DeepMind pycolab – A highly-customisable gridworld game engine with some batteries included.
DeepMind PySC2 – StarCraft II Learning Environment.
DeepMind RL Unplugged – Benchmarks for Offline Reinforcement Learning.
Facebook EmbodiedQA – Train embodied agents that can answer questions in environments.
Facebook Habitat – A modular high-level library to train embodied AI agents across a variety of tasks, environments, and simulators.
Facebook House3D – A Rich and Realistic 3D Environment.
Facebook natural_rl_environment – natural signal Atari environments, introduced in the paper Natural Environment Benchmarks for Reinforcement Learning.
Google Research Football – An RL environment based on open-source game Gameplay Football.
GVGAI Gym – An OpenAI Gym environment for games written in the Video Game Description Language, including the Generic Video Game Competition framework.
gym-doom – Doom environments based on VizDoom.
gym-duckietown – Self-driving car simulator for the Duckietown universe.
gym-gazebo2 – A toolkit for developing and comparing reinforcement learning algorithms using ROS 2 and Gazebo.
gym-ignition – Experimental OpenAI Gym environments implemented with Ignition Robotics.
gym-idsgame – An Abstract Cyber Security Simulation and Markov Game for OpenAI Gym.
gym-super-mario – 32 levels of original Super Mario Bros.
Holodeck – High Fidelity Simulator for Reinforcement Learning and Robotics Research.
home-platform – A platform for artificial agents to learn from vision, audio, semantics, physics, and interaction with objects and other agents, all within a realistic context.
ma-gym – A collection of multi agent environments based on OpenAI gym.
mazelab – A customizable framework to create maze and gridworld environments.
Meta-World – An open source robotics benchmark for meta- and multi-task reinforcement learning.
Microsoft AirSim – Open source simulator for autonomous vehicles built on Unreal Engine / Unity, from Microsoft AI & Research.
Microsoft Jericho – A learning environment for man-made Interactive Fiction games.
Microsoft Malmö – A platform for Artificial Intelligence experimentation and research built on top of Minecraft.
Microsoft MazeExplorer – Customisable 3D environment for assessing generalisation in Reinforcement Learning.
Microsoft TextWorld – A text-based game generator and extensible sandbox learning environment for training and testing reinforcement learning (RL) agents.
MineRL – MineRL Competition for Sample Efficient Reinforcement Learning.
MuJoCo – Advanced physics simulation.
OpenAI Coinrun – Code for the environments used in the paper Quantifying Generalization in Reinforcement Learning.
OpenAI Gym Retro – Retro Games in Gym.
OpenAI Gym Soccer – A multiagent domain featuring continuous state and action spaces.
OpenAI Gym – A toolkit for developing and comparing reinforcement learning algorithms.
OpenAI Multi-Agent Particle Environment – A simple multi-agent particle world with a continuous observation and discrete action space, along with some basic simulated physics.
OpenAI Neural MMO – A Massively Multiagent Game Environment.
OpenAI Procgen Benchmark – Procedurally Generated Game-Like Gym Environments.
OpenAI Roboschool – Open-source software for robot simulation, integrated with OpenAI Gym.
OpenAI RoboSumo – A set of competitive multi-agent environments used in the paper Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments.
OpenAI Safety Gym – Tools for accelerating safe exploration research.
Personae – RL & SL Methods and Envs For Quantitative Trading.
Pommerman – A clone of Bomberman built for AI research.
pybullet-gym – Open-source implementations of OpenAI Gym MuJoCo environments for use with the OpenAI Gym Reinforcement Learning Research Platform.
PyGame Learning Environment – Reinforcement Learning Environment in Python.
RLBench – A large-scale benchmark and learning environment.
RLTrader – A cryptocurrency trading environment using deep reinforcement learning and OpenAI’s gym.
RoboNet – A Dataset for Large-Scale Multi-Robot Learning.
rocket-lander – SpaceX Falcon 9 Box2D continuous-action simulation with traditional and AI controllers.
Stanford Gibson Environments – Real-World Perception for Embodied Agents.
Stanford osim-rl – Reinforcement learning environments with musculoskeletal models.
Unity ML-Agents Toolkit – Unity Machine Learning Agents Toolkit.
Unity Obstacle Tower – A procedurally generated environment consisting of multiple floors to be solved by a learning agent.
VizDoom – Doom-based AI Research Platform for Reinforcement Learning from Raw Visual Information.
RLCard – A research platform for reinforcement learning in card games.
DouZero – A research platform for reinforcement learning in DouDizhu (Chinese poker).

Competitions

AWS DeepRacer League 2019
Flatland Challenge 2019
Kaggle Connect X Competition 2020
NeurIPS 2019: Animal-AI Olympics
NeurIPS 2019: Game of Drones
NeurIPS 2019: Learn to Move – Walk Around
NeurIPS 2019: MineRL Competition
NeurIPS 2019: Reconnaissance Blind Chess
NeurIPS 2019: Robot open-Ended Autonomous Learning
Unity Obstacle Tower Challenge 2019

Applications

Game Playing

Backgammon: Gerald Tesauro, “TD-Gammon” game play using TD(λ) (ACM 1995) [Paper]
Chess: Jonathan Baxter, Andrew Tridgell and Lex Weaver, “KnightCap” program using TD(λ) (1999) [arXiv]
Chess: Matthew Lai, Giraffe: Using deep reinforcement learning to play chess (2015) [arXiv]
Atari 2600 Games: Volodymyr Mnih, Koray Kavukcuoglu, David Silver et al., Human-level Control through Deep Reinforcement Learning (Nature 2015) [DOI] [Paper] [Code] [Video]
Flappy Bird: Sarvagya Vaish, Flappy Bird Reinforcement Learning [Video]
Mario: Kenneth O. Stanley and Risto Miikkulainen, MarI/O – learning to play Mario with evolutionary reinforcement learning using artificial neural networks (Evolutionary Computation 2002) [Paper] [Video]
StarCraft II: Oriol Vinyals, Igor Babuschkin, Wojciech M. Czarnecki et al., Grandmaster level in StarCraft II using multi-agent reinforcement learning (Nature 2019) [DOI] [Paper] [Video]

Robotics

Nate Kohl and Peter Stone, Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion (ICRA 2004) [Paper]
Petar Kormushev, Sylvain Calinon and Darwin G. Caldwell, Robot Motor Skill Coordination with EM-based Reinforcement Learning (IROS 2010) [Paper] [Video]
Todd Hester, Michael Quinlan, and Peter Stone, Generalized Model Learning for Reinforcement Learning on a Humanoid Robot (ICRA 2010) [Paper] [Video]
George Konidaris, Scott Kuindersma, Roderic Grupen and Andrew Barto, Autonomous Skill Acquisition on a Mobile Manipulator (AAAI 2011) [Paper] [Video]
Marc Peter Deisenroth and Carl Edward Rasmussen, PILCO: A Model-Based and Data-Efficient Approach to Policy Search (ICML 2011) [Paper]
Scott Niekum, Sachin Chitta, Bhaskara Marthi, et al., Incremental Semantically Grounded Learning from Demonstration (RSS 2013) [Paper]
Mark Cutler and Jonathan P. How, Efficient Reinforcement Learning for Robots using Informative Simulated Priors (ICRA 2015) [Paper] [Video]
Antoine Cully, Jeff Clune, Danesh Tarapore and Jean-Baptiste Mouret, Robots that can adapt like animals (Nature 2015) [ArXiv] [Video] [Code]
Konstantinos Chatzilygeroudis, Roberto Rama, Rituraj Kaushik et al., Black-Box Data-efficient Policy Search for Robotics (IROS 2017) [ArXiv] [Video] [Code]
P. Travis Jardine, Michael Kogan, Sidney N. Givigi and Shahram Yousefi, Adaptive predictive control of a differential drive robot tuned with reinforcement learning (Int J Adapt Control Signal Process 2019) [DOI]

Control

Pieter Abbeel, Adam Coates, et al., An Application of Reinforcement Learning to Aerobatic Helicopter Flight (NIPS 2006) [Paper] [Video]
J. Andrew Bagnell and Jeff G. Schneider, Autonomous helicopter control using Reinforcement Learning Policy Search Methods (ICRA 2001) [Paper]

Operations Research

Scott Proper and Prasad Tadepalli, Scaling Average-reward Reinforcement Learning for Product Delivery (AAAI 2004) [Paper]
Naoki Abe, Naval Verma et al., Cross Channel Optimized Marketing by Reinforcement Learning (KDD 2004) [Paper]
Bernd Waschneck, Andre Reichstaller, Lenz Belzner et al., Deep reinforcement learning for semiconductor production scheduling (ASMC 2018) [DOI] [Paper]

Other Resources

Tutorials

Andrej Karpathy, Deep Reinforcement Learning: Pong from Pixels
Arthur Juliani, Simple Reinforcement Learning in TensorFlow Series
Berkeley Deep Reinforcement Learning Course
David Silver, UCL Course on RL 2015
Deep RL Bootcamp 2017
DeepMind UCL Deep RL Course 2018
DeepMind Learning Resources
dennybritz/reinforcement-learning
higgsfield/RL-Adventure-2
higgsfield/RL-Adventure
The Hugging Face Deep Reinforcement Learning Class 🤗
MorvanZhou/Reinforcement Learning Methods and Tutorials
OpenAI Spinning Up
Sergey Levine, CS294 Deep Reinforcement Learning Fall 2017
Udacity Deep Reinforcement Learning Nanodegree
Reinforcement Learning Fundamental
PPOxFamily: DRL Tutorial Course

Online Demos

Deep Q-Learning Demo – A deep Q-learning demonstration using ConvNetJS.
Deep Q-Learning with TensorFlow – A deep Q-learning demonstration using Google TensorFlow.
Reinforcement Learning Demo – A reinforcement learning demo using reinforcejs by Andrej Karpathy.

Open Source Reinforcement Learning Platforms

OpenAI Gym – A toolkit for developing and comparing reinforcement learning algorithms.
OpenAI Universe – A software platform for measuring and training an AI’s general intelligence across the world’s supply of games, websites and other applications.
DeepMind Lab – A customisable 3D platform for agent-based AI research.
Project Malmo – A platform for Artificial Intelligence experimentation and research built on top of Minecraft by Microsoft.
ViZDoom – Doom-based AI research platform for reinforcement learning from raw visual information.
Retro Learning Environment – An AI platform for reinforcement learning based on video game emulators. Currently supports SNES and Sega Genesis. Compatible with OpenAI Gym.
torch-twrl – A package that enables reinforcement learning in Torch by Twitter.
UETorch – A Torch plugin for Unreal Engine 4 by Facebook.
TorchCraft – Connecting Torch to StarCraft.
garage – A framework for reproducible reinforcement learning research, fully compatible with OpenAI Gym and DeepMind Control Suite (successor to rllab).
TensorForce – Practical deep reinforcement learning on TensorFlow with Gitter support and OpenAI Gym/Universe/DeepMind Lab integration.
tf-TRFL – A library built on top of TensorFlow that exposes several useful building blocks for implementing Reinforcement Learning agents.
OpenAI Lab – An experimentation system for Reinforcement Learning using OpenAI Gym, TensorFlow, and Keras.
keras-rl – State-of-the-art deep reinforcement learning algorithms in Keras designed for compatibility with OpenAI.
BURLAP – Brown-UMBC Reinforcement Learning and Planning, a library written in Java.
MAgent – A Platform for Many-agent Reinforcement Learning.
Ray RLlib – A reinforcement learning library that aims to provide both performance and composability.
SLM Lab – A research framework for Deep Reinforcement Learning using Unity, OpenAI Gym, PyTorch, TensorFlow.
Unity ML Agents – Create reinforcement learning environments using the Unity Editor.
Intel Coach – A Python reinforcement learning research framework containing implementations of many state-of-the-art algorithms.
Microsoft AirSim – Open source simulator based on Unreal Engine for autonomous vehicles from Microsoft AI & Research.
DI-engine – A generalized Decision Intelligence engine. It supports most basic deep reinforcement learning (DRL) algorithms, such as DQN, PPO, SAC, and domain-specific algorithms like QMIX in multi-agent RL, GAIL in inverse RL, and RND in exploration problems.
Jumanji – A Suite of Industry-Driven Hardware-Accelerated RL Environments written in JAX.

Lectures

[DeepMind x UCL] Reinforcement Learning Lecture Series 2021
[UCL] COMPM050/COMPGI13 Reinforcement Learning by David Silver
[UCL] COMPMI22/COMPGI22 Advanced Deep Learning and Reinforcement Learning
[UC Berkeley] CS188 Artificial Intelligence by Pieter Abbeel
Lecture 8: Markov Decision Processes 1
Lecture 9: Markov Decision Processes 2
Lecture 10: Reinforcement Learning 1
Lecture 11: Reinforcement Learning 2
[Udacity (Georgia Tech.)] CS7642 Reinforcement Learning
[Stanford] CS229 Machine Learning, Lecture 16: Reinforcement Learning by Andrew Ng
[UC Berkeley] Deep RL Bootcamp
[UC Berkeley] CS294 Deep Reinforcement Learning by John Schulman and Pieter Abbeel
[CMU] 10703: Deep Reinforcement Learning and Control, Spring 2017
[MIT] 6.S094: Deep Learning for Self-Driving Cars
Lecture 2: Deep Reinforcement Learning for Motion Planning
[Siraj Raval] Introduction to AI for Video Games (Reinforcement Learning Video Series)
Introduction to AI for video games
Monte Carlo Prediction
Q learning explained
Solving the basic game of Pong
Actor Critic Algorithms
War Robots
[Mutual Information] Reinforcement Learning Fundamentals
Reinforcement Learning: A Six Part Series
The Bellman Equations, Dynamic Programming, and Generalized Policy Iteration
Monte Carlo And Off-Policy Methods
TD Learning, Sarsa, and Q-Learning