Navallo edited this page Apr 18, 2019 · 1 revision

Potential Improvements

  • angle
    The angle can be fixed to the heading toward the target, which may make learning easier.
  • reward engineering
    A binary reward, such as -1 at every step until the robots are done, is the ultimate solution; however, it takes a very long time to train.
  • random environments
    Randomize the environments to reduce overfitting and obtain a more general policy.
  • different policies
    For example, assign a human policy to some of the robots.
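
The angle and reward ideas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's code: the function names, the 2D `(x, y)` position format, and the distance-based shaping term are all hypothetical. The shaped variant is one common alternative to the pure binary reward, added here only to show the trade-off.

```python
import math


def heading_to_target(robot_xy, target_xy):
    """Angle (radians) from the robot's position to the target.

    Hypothetical helper: assumes 2D (x, y) positions.
    """
    dx = target_xy[0] - robot_xy[0]
    dy = target_xy[1] - robot_xy[1]
    return math.atan2(dy, dx)


def binary_step_reward(done):
    """-1 on every step until the episode is done, then 0."""
    return 0.0 if done else -1.0


def shaped_reward(prev_dist, curr_dist, done):
    """Binary step penalty plus a progress term (an assumed shaping choice).

    The progress term is positive when the robot moved closer to the target.
    """
    return binary_step_reward(done) + (prev_dist - curr_dist)
```

With the binary reward the agent only learns that shorter episodes are better, which is unbiased but gives sparse feedback. The shaped variant gives a signal on every step, at the cost of encoding an assumption about what progress means.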