Home

Potential Improvements

Angle
The robot's angle could be fixed to the heading toward the target, which may make learning easier.
Reward engineering
A binary reward, such as -1 at every step until the robots are done, is the ultimate solution; however, it takes a very long time to train.
Random environments
Randomize the environments to reduce overfitting and obtain a more general policy.
Different policies
Mix in other policies, for example by assigning a human policy to some of the robots.
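
The first three ideas above can be sketched in a few lines of Python. This is only an illustration under assumptions, not the project's code: the function names, the 2D `(x, y)` position format, the square arena of side `size`, and the reading of "binary reward" as -1 per step until all robots are done are all hypothetical.

```python
import math
import random


def heading_to_target(robot_xy, target_xy):
    """Angle in radians, in [-pi, pi], pointing from the robot to the target."""
    dx = target_xy[0] - robot_xy[0]
    dy = target_xy[1] - robot_xy[1]
    return math.atan2(dy, dx)


def step_penalty_reward(all_done):
    """Binary reward (assumed form): -1 on every step until all robots are done, then 0."""
    return 0.0 if all_done else -1.0


def random_layout(rng, n_robots, size=10.0):
    """Sample a fresh (start, target) pair per robot inside a size x size arena."""
    def point():
        return (rng.uniform(0.0, size), rng.uniform(0.0, size))

    return [(point(), point()) for _ in range(n_robots)]
```

A possible use at the start of each episode, so the policy never sees the same layout twice:

```python
rng = random.Random(0)  # seeded only to make runs repeatable
for start, target in random_layout(rng, n_robots=3):
    angle = heading_to_target(start, target)  # fixed heading instead of a learned angle
```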
