surge

Asynchronous server for collecting offline rollouts in a reinforcement learning setting

Outside the server, PyTorch models of agent policy functions are trained using PPO
Model weights are sent by clients to be cached on the server
Each model version plays multiple matches against every other cached model version
Rollouts of these matches are collected and returned to the clients
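The caching and match scheduling described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the server's actual code. ModelCache, schedule_matches, play_match, collect_rollouts and Rollout are hypothetical names. Weights are assumed to arrive as serialized bytes (for example a PyTorch state_dict saved into a buffer), and play_match is a placeholder standing in for a real game of the fruitbots clone.

```python
import asyncio
import itertools
from dataclasses import dataclass, field


@dataclass
class Rollout:
    # Hypothetical record of one match: the two versions and the transitions seen.
    player_a: int
    player_b: int
    transitions: list = field(default_factory=list)


class ModelCache:
    """Hypothetical in-memory cache of serialized model weights keyed by version."""

    def __init__(self) -> None:
        # Inside a single asyncio event loop these methods never await,
        # so no lock is needed around the dict.
        self._weights = {}

    def put(self, version: int, weights: bytes) -> None:
        self._weights[version] = weights

    def versions(self) -> list:
        return sorted(self._weights)


def schedule_matches(versions, matches_per_pair):
    # Every version is paired with every other version, matches_per_pair times.
    pairs = itertools.combinations(versions, 2)
    return [pair for pair in pairs for _ in range(matches_per_pair)]


async def play_match(a: int, b: int) -> Rollout:
    # Placeholder: real code would load both versions' weights from the cache
    # and step the game environment until the match ends.
    await asyncio.sleep(0)
    return Rollout(player_a=a, player_b=b)


async def collect_rollouts(cache: ModelCache, matches_per_pair: int) -> list:
    matches = schedule_matches(cache.versions(), matches_per_pair)
    # Run all scheduled matches concurrently and gather their rollouts.
    return list(await asyncio.gather(*(play_match(a, b) for a, b in matches)))


if __name__ == "__main__":
    cache = ModelCache()
    for v in range(3):
        cache.put(v, b"...")  # stands in for serialized weights sent by a client
    rollouts = asyncio.run(collect_rollouts(cache, matches_per_pair=2))
    print(len(rollouts))  # → 6 (3 pairings x 2 matches each)
```

Using itertools.combinations treats (a, b) and (b, a) as the same pairing; if seat order matters in the game, itertools.permutations would schedule both orderings instead.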

A fruitbots clone is used as the game environment in this engine