---
title: "Cartpole: for newcomers to RL"
summary: ''
difficulty: 1 # out of 3
---

The Cartpole environment is one of the simplest MDPs: it is extremely low dimensional, with a four-dimensional observation space and only two actions. The goal of this exercise is to implement several RL algorithms in order to get practical experience with such methods.

The small size and simplicity of this environment make it possible to run very quick experiments, which is essential when learning the basics.

Start with a simple linear model (that has only four parameters), and use the sign of the weighted sum to choose between the two actions.
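As a starting point, here is a minimal sketch of this setup, assuming only NumPy: the cart-pole dynamics are hand-rolled from the classic Barto–Sutton–Anderson formulation (so no RL library is required), the policy is a four-parameter weight vector whose weighted sum's sign picks the action, and training is plain random search over weight vectors. The constants, episode cap of 200 steps, and random-search loop are illustrative choices, not part of the exercise statement.

```python
import numpy as np

def cartpole_step(state, action):
    """One Euler step of the classic cart-pole dynamics."""
    gravity, masscart, masspole, length = 9.8, 1.0, 0.1, 0.5
    force_mag, tau = 10.0, 0.02  # tau: seconds per simulation step
    total_mass = masscart + masspole
    polemass_length = masspole * length
    x, x_dot, theta, theta_dot = state
    force = force_mag if action == 1 else -force_mag
    costh, sinth = np.cos(theta), np.sin(theta)
    temp = (force + polemass_length * theta_dot**2 * sinth) / total_mass
    theta_acc = (gravity * sinth - costh * temp) / (
        length * (4.0 / 3.0 - masspole * costh**2 / total_mass))
    x_acc = temp - polemass_length * theta_acc * costh / total_mass
    new_state = np.array([x + tau * x_dot, x_dot + tau * x_acc,
                          theta + tau * theta_dot, theta_dot + tau * theta_acc])
    # Episode ends if the cart leaves the track or the pole tips past ~12 degrees.
    done = abs(new_state[0]) > 2.4 or abs(new_state[2]) > 12 * np.pi / 180
    return new_state, done

def episode_return(w, max_steps=200, rng=None):
    """Run one episode with the linear policy sign(w . state); return its length."""
    rng = rng or np.random.default_rng(0)
    state = rng.uniform(-0.05, 0.05, size=4)
    total = 0
    for _ in range(max_steps):
        action = 1 if w @ state > 0 else 0  # sign of the weighted sum
        state, done = cartpole_step(state, action)
        total += 1
        if done:
            break
    return total

# Random search: sample weight vectors uniformly, keep the best performer.
rng = np.random.default_rng(0)
best_w, best_r = None, -1
for i in range(100):
    w = rng.uniform(-1, 1, size=4)
    r = episode_return(w, rng=np.random.default_rng(i))
    if r > best_r:
        best_w, best_r = w, r
```

Random search works surprisingly well here precisely because the policy has only four parameters: a noticeable fraction of randomly sampled weight vectors already balance the pole, so a few dozen samples usually suffice.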

What happens to the above algorithm when the policy is a neural network with tens of thousands of parameters?
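To probe this question, one can swap the four-parameter linear policy for a small feedforward network and rerun the same search. The sketch below (a hypothetical two-layer NumPy network; the layer width and initialization scale are illustrative, and the environment code is not repeated here) defines such a policy with tens of thousands of parameters:

```python
import numpy as np

def init_params(rng, hidden=5000):
    # 4 -> hidden -> 2 network: 4*h + h + 2*h + 2 parameters,
    # i.e. 35,002 for hidden=5000 -- "tens of thousands".
    return {
        "W1": rng.normal(0.0, 0.1, (hidden, 4)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.1, (2, hidden)),
        "b2": np.zeros(2),
    }

def act(params, obs):
    """Greedy action from a tanh MLP over the 4-dim observation."""
    h = np.tanh(params["W1"] @ obs + params["b1"])
    logits = params["W2"] @ h + params["b2"]
    return int(np.argmax(logits))

rng = np.random.default_rng(0)
params = init_params(rng)
n_params = sum(p.size for p in params.values())
```

Guessing-based methods that are adequate in four dimensions degrade badly in a search space this large, which motivates gradient-based algorithms such as policy gradients.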


## Notes

This task is meant to help newcomers gain practical experience with implementing basic RL algorithms.

## Solutions

Results and some intuition behind the algorithms can be found at this post, and here is the code used.