Temporal Difference Learning - the holy amalgamation of Monte Carlo and dynamic programming. Taking the best of both worlds, TD learning is a faster, model-free method of solving reinforcement learning problems. TD is a concept that was first developed in reinforcement learning, and only later branched to other...
[Read More]
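The TD idea in the teaser above can be captured in a few lines: update a state's value toward a bootstrapped target instead of waiting for the episode to end. This is a minimal sketch of the tabular TD(0) update; the dict-based value table, state names, and step size are illustrative assumptions, not taken from the post.

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """Nudge V[s] toward the bootstrapped target r + gamma * V[s_next]."""
    V[s] += alpha * (r + gamma * V[s_next] - V[s])
    return V

# Hypothetical two-state example: one step from "A" to "B" with reward 1.
V = {"A": 0.0, "B": 0.0}
td0_update(V, "A", 1.0, "B")  # V["A"] moves from 0.0 to 0.1
```

Because the target uses the current estimate `V[s_next]`, the agent learns from every single transition, without a model and without waiting for a full return.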
Roulette - To Play or Not to Play
RL Basics
Previously, we looked into the Monte Carlo family of reinforcement learning algorithms. These methods estimate the solution to an RL problem by learning from experience. Through simulation and blind exploration alone, the Monte Carlo agent reaches the optimal solution.
[Read More]
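"Learning from experience" as described above boils down to averaging sampled returns. Below is a minimal sketch of first-visit Monte Carlo value estimation; the episode format (lists of `(state, reward)` pairs) and the example trajectories are illustrative assumptions.

```python
def mc_value_estimate(episodes, gamma=1.0):
    """Average the first-visit returns observed after each state.

    episodes: list of trajectories, each a list of (state, reward) pairs.
    """
    returns = {}  # state -> list of first-visit returns
    for episode in episodes:
        G = 0.0
        first_return = {}
        # Walk backwards, accumulating the return; the earliest (first)
        # visit of each state overwrites later ones.
        for s, r in reversed(episode):
            G = r + gamma * G
            first_return[s] = G
        for s, g in first_return.items():
            returns.setdefault(s, []).append(g)
    return {s: sum(gs) / len(gs) for s, gs in returns.items()}

# Two hypothetical episodes: returns from "A" are 1.0 and 2.0.
episodes = [[("A", 0.0), ("B", 1.0)], [("A", 2.0)]]
V = mc_value_estimate(episodes)  # V["A"] = 1.5, V["B"] = 1.0
```

No model of the environment appears anywhere: the estimate comes entirely from sampled trajectories, which is exactly what lets MC methods work by simulation alone.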
Monte Carlo - Learning from Experience
RL Basics
Last time we looked at dynamic programming: methods that compute near-exact solutions to problems using a full-scale model of the environment dynamics. This time we are doing a complete 180. Monte Carlo (MC) methods are everything DP is not.
[Read More]
Dynamic Programming RL
RL Basics
Dynamic Programming (DP) in reinforcement learning refers to a set of algorithms that can be used to compute optimal policies when the agent knows everything about its surroundings; i.e., the agent has a perfect model of the environment. Although dynamic programming has a large number of drawbacks, it is the precursor...
[Read More]
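Planning with a perfect model, as the excerpt above describes, is what value iteration does: sweep over states, backing up values through the known transition probabilities until they converge. This is a minimal sketch under an assumed two-state MDP; the states, actions, rewards, and transition table are all hypothetical.

```python
def value_iteration(states, actions, P, R, gamma=0.9, theta=1e-8):
    """P[s][a] = list of (prob, next_state); R[s][a] = expected reward.

    Requires the full model (P, R) up front -- the defining trait of DP.
    """
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            v = V[s]
            # Bellman optimality backup over all actions.
            V[s] = max(
                R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                for a in actions
            )
            delta = max(delta, abs(v - V[s]))
        if delta < theta:
            return V

# Hypothetical MDP: "good" pays 1 per step if you stay; "bad" pays nothing,
# but you can switch back to "good".
states = ["good", "bad"]
actions = ["stay", "switch"]
P = {
    "good": {"stay": [(1.0, "good")], "switch": [(1.0, "bad")]},
    "bad": {"stay": [(1.0, "bad")], "switch": [(1.0, "good")]},
}
R = {
    "good": {"stay": 1.0, "switch": 0.0},
    "bad": {"stay": 0.0, "switch": 0.0},
}
V = value_iteration(states, actions, P, R)
```

With discount 0.9 this converges to V["good"] = 1 / (1 - 0.9) = 10 and V["bad"] = 0.9 * 10 = 9, which illustrates both DP's strength (exact answers) and its main drawback (it needs the complete model).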
The OpenAI Gym
RL Basics
Is this working out?
[Read More]