jankrepl/CartPole-v1_A2C

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 

Repository files navigation

CartPole-v1 A2C

Solution to the CartPole-v1 environment using the Advantage Actor Critic (A2C) algorithm

Code

Running

```
python Main.py
```

Dependencies

  • gym
  • numpy
  • tensorflow

Detailed Description

Problem Statement and Environment

The environment is identical to CartPole-v0, except that the maximum episode length is raised from 200 to 500 timesteps and the environment counts as solved once the average score over 100 consecutive episodes reaches 475.


A2C Algorithm

We need to approximate two separate functions:

  • Actor's policy
  • Critic's state value function

Each of them is modeled by a neural network with one hidden layer.
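As a rough illustration of the two one-hidden-layer approximators, here is a minimal numpy sketch (the repository itself uses TensorFlow; the hidden-layer width, initialization, and activation below are assumptions, not the repo's actual hyperparameters):

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM = 4    # CartPole observation: cart position, cart velocity, pole angle, pole angular velocity
N_ACTIONS = 2    # push left, push right
HIDDEN = 24      # hypothetical hidden-layer width

# Actor: state -> probability distribution over the two actions
W1_a = rng.normal(0.0, 0.1, (STATE_DIM, HIDDEN))
W2_a = rng.normal(0.0, 0.1, (HIDDEN, N_ACTIONS))

def policy(state):
    h = np.tanh(state @ W1_a)          # one hidden layer
    logits = h @ W2_a
    e = np.exp(logits - logits.max())  # numerically stable softmax
    return e / e.sum()

# Critic: state -> scalar state value
W1_c = rng.normal(0.0, 0.1, (STATE_DIM, HIDDEN))
W2_c = rng.normal(0.0, 0.1, (HIDDEN, 1))

def value(state):
    h = np.tanh(state @ W1_c)
    return (h @ W2_c).item()

state = np.array([0.01, -0.02, 0.03, 0.04])
probs = policy(state)
print(probs)         # a 2-element probability vector summing to 1
print(value(state))  # a single real-valued utility estimate
```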

Actor - policy

The policy is a mapping from the state space to a set of probability distributions over the action space (in our case discrete). To each given state we assign a 2-element probability vector whose elements sum to 1.

Critic - state value function

The state value function is a mapping from the state space to the real numbers. To each given state we assign a real number representing the "value/utility" of being in that state.

Training

For the actor we optimize directly in policy space. To do that we use the Policy Gradient Theorem together with the TD(0) estimate of the advantage function below:

A(s_t, a_t) ≈ r_{t+1} + γ V(s_{t+1}) − V(s_t)

This can be reformulated in terms of a cross entropy loss minimization.

The critic is always updated based on the TD(0) backup. This can be reformulated in terms of a squared-error loss minimization.
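The two update rules can be sketched as follows. This is a tabular stand-in, just to make the TD(0) advantage and the two losses concrete; the repository uses neural-network approximators, and the discount factor, learning rates, and state/action counts here are hypothetical:

```python
import numpy as np

GAMMA = 0.99       # discount factor (assumed value)
LR_ACTOR = 0.01    # assumed learning rates
LR_CRITIC = 0.05

# Tabular stand-ins for the two function approximators
values = np.zeros(8)          # critic: V(s) for each of 8 toy states
logits = np.zeros((8, 2))     # actor: softmax preferences per state

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def a2c_update(s, a, r, s_next, done):
    # TD(0) target and advantage estimate: A(s, a) ≈ r + γ V(s') − V(s)
    target = r + (0.0 if done else GAMMA * values[s_next])
    advantage = target - values[s]

    # Critic: squared-error loss -> move V(s) toward the TD(0) target
    values[s] += LR_CRITIC * advantage

    # Actor: policy-gradient step, i.e. cross-entropy gradient for the
    # taken action weighted by the advantage
    probs = softmax(logits[s])
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0     # gradient of log pi(a|s) for a softmax policy
    logits[s] += LR_ACTOR * advantage * grad_log_pi

a2c_update(s=0, a=1, r=1.0, s_next=1, done=False)
print(values[0], softmax(logits[0]))
```

With all values initialized to zero, a reward of 1 produces a positive advantage, so V(s) rises and the taken action becomes more probable.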

Results and discussion

This method appears to converge regardless of initialization. The repository includes a plot of the score evolution for one run.

Resources and links

  • RLCode - Similar algorithm in Keras and same hyperparameters
  • David Silver - Policy Gradient Lecture Slides

License

This project is licensed under the MIT License - see the LICENSE.md file for details
