Deep Reinforcement Learning requires a good amount of intuition in both deep learning and reinforcement learning. Even though the theoretical part is not that hard to understand, messy code definitely makes it harder. Let's try to change that :D
Since we are working with OpenAI's GYM a lot, let's build a better intuition about it!
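As a starting point, here is a minimal sketch of the classic Gym interaction loop (this assumes the older Gym API, where `reset()` returns only the observation and `step()` returns four values; the environment name is just an example):

```python
import gym

# CartPole is a simple environment that is commonly used for first experiments.
env = gym.make("CartPole-v1")

for episode in range(3):
    obs = env.reset()                  # initial observation of the episode
    total_reward, done = 0.0, False
    while not done:
        action = env.action_space.sample()           # random action, just for illustration
        obs, reward, done, info = env.step(action)   # environment moves one step forward
        total_reward += reward
    print(f"Episode {episode} finished with total reward {total_reward}")

env.close()
```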
In supervised learning, it is simple to create a system that maps inputs X to outputs Y, because there is a dataset containing input/output examples. In reinforcement learning, on the other hand, there is no such dataset of labeled examples. Policy gradients are one way to solve this problem. The whole idea relies on encouraging actions that led to good reward and discouraging actions that led to bad reward. The general formula is to minimize the -log(p(y | x)) * A loss, where A represents the advantage; in the most vanilla version we can simply use the discounted rewards as the advantage.
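As a rough illustration of that formula, here is a small NumPy sketch (the function names, toy numbers, and gamma value are assumptions made for this example, not code from the notebooks) that turns per-step rewards into discounted returns and uses them to weight the log-probability loss:

```python
import numpy as np

def discount_rewards(rewards, gamma=0.99):
    """Turn per-step rewards into discounted returns (the vanilla 'advantage')."""
    discounted = np.zeros_like(rewards, dtype=np.float32)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        discounted[t] = running
    # Normalizing the returns usually stabilizes training.
    return (discounted - discounted.mean()) / (discounted.std() + 1e-8)

def policy_gradient_loss(taken_action_probs, advantages):
    """-log(p(action | state)) * A, averaged over the episode.

    `taken_action_probs` are the probabilities the policy assigned to the
    actions that were actually taken; `advantages` are the discounted returns.
    """
    return np.mean(-np.log(taken_action_probs + 1e-8) * advantages)

# Toy example: 4 time steps from one episode, reward only at the end.
rewards = np.array([0.0, 0.0, 0.0, 1.0])
taken_action_probs = np.array([0.6, 0.5, 0.7, 0.9])
advantages = discount_rewards(rewards)
print(policy_gradient_loss(taken_action_probs, advantages))
```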
PyTorch: coming soon
Extra resources:
- My blog post
- Pong from pixels
- Hands-On Machine Learning with Scikit-Learn and TensorFlow Chapter 16
- Policy Gradients Pieter Abbeel lecture
- Continuous control with deep reinforcement learning
- Better Exploration with Parameter Noise
Before we start with DQN, let's talk about the Q function first. Q(s, a) is a function that maps a given s (state) and a (action) pair to the expected total reward until the terminal state. It is basically how much reward we are going to get if we take action a in state s. The reason we combine this idea with a neural network is that it is almost impossible to compute and store Q values for every state in the environment.
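As a small sketch of how those Q values get trained (the function name, toy numbers, and gamma value here are assumptions for illustration), the DQN target for a single transition is the immediate reward plus the discounted value of the best action the network predicts in the next state:

```python
import numpy as np

def dqn_target(reward, next_q_values, done, gamma=0.99):
    """Bellman target for one transition: r + gamma * max_a' Q(s', a').

    `next_q_values` is the vector of Q values the network predicts for the
    next state; when the episode has ended, the target is just the reward.
    """
    if done:
        return reward
    return reward + gamma * np.max(next_q_values)

# Toy example: the network predicts Q values for 2 actions in the next state.
print(dqn_target(reward=1.0, next_q_values=np.array([0.5, 2.0]), done=False))  # 1.0 + 0.99 * 2.0
```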