This repository is the official implementation of Safe Exploration in Reinforcement Learning through Pilot Experimentation.
In this section, we describe how to install the repository on a Linux machine. We assume that Python 3 and a C++ compiler are installed.
First, clone the repository using the --recursive option (see footnote 1). This option is required so that the two Git submodules, gym-cellular and prism, are cloned as well.
The Git submodule prism uses Dave Parker's imc2 branch of the Prism model-checker.
Note that running the code in this repository requires a version of the Prism model-checker that can verify interval MDPs.
At the time of writing, this is not included in standard installations of the Prism model-checker, and therefore, it is necessary to install it from the submodule.
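Before building Prism, it may help to confirm that both submodules were actually fetched, since a clone without --recursive leaves their directories empty. The following is a minimal sketch; the helper name missing_submodules is hypothetical and not part of this repository.

```python
from pathlib import Path

# The two submodules named in this README.
SUBMODULES = ("gym-cellular", "prism")


def missing_submodules(repo_root, names=SUBMODULES):
    """Return the submodule directories that are absent or empty."""
    root = Path(repo_root)
    missing = []
    for name in names:
        path = root / name
        # A submodule that was not fetched is left as an empty directory.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Run from the pilot-experimentation directory.
    print(missing_submodules("."))
```

If the list is not empty, the missing submodules can be fetched afterwards with git submodule update --init --recursive.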
Before installing the Prism model-checker, it is required to have the Java Development Kit (JDK) installed.
To check whether it is installed, use the command javac --version.
(Note that it is not sufficient to merely have the Java Runtime Environment [JRE] installed.)
If it is not installed, it can be downloaded from Oracle's website, and an installation guide can be found here.
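The same JDK check can be scripted, for instance as part of a setup script. This is only a sketch using the Python standard library; the function names find_javac and javac_version are hypothetical.

```python
import shutil
import subprocess


def find_javac():
    """Return the path to javac if it is on PATH, else None."""
    return shutil.which("javac")


def javac_version():
    """Return the output of `javac --version`, or None if javac is absent."""
    javac = find_javac()
    if javac is None:
        # Only a JRE (or no Java at all) appears to be installed.
        return None
    result = subprocess.run([javac, "--version"], capture_output=True, text=True)
    # javac may write to stderr rather than stdout, so both streams are combined.
    return (result.stdout + result.stderr).strip()


if __name__ == "__main__":
    print(javac_version() or "javac not found: install a JDK first")
```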
To install this version of the Prism model-checker, go to the directory pilot-experimentation/prism/prism and run the command make.
To test the code after installation, run make test.
For additional installation options, consult the imc2 branch or prismmodelchecker.org.
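Below, we list the necessary Python modules. We suggest installing them in a virtual environment (see footnote 2). (Of these, argparse, importlib, time, and typing are part of the Python 3 standard library and normally do not need to be installed separately.)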
argparse
gymnasium
importlib
matplotlib
pandas
psutil
seaborn
time
typing
tqdm
The modules numpy and setuptools are also required but are typically installed by default (see footnote 3).
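To see which of the modules listed above are still missing from the active environment, a short check along the following lines can be used. It is a sketch only: the helper missing_modules is hypothetical, and gym-cellular is left out because it is installed separately.

```python
from importlib.util import find_spec

# Module names taken from the list in this README.
REQUIRED = [
    "argparse", "gymnasium", "importlib", "matplotlib", "numpy", "pandas",
    "psutil", "seaborn", "setuptools", "time", "typing", "tqdm",
]


def missing_modules(names=REQUIRED):
    """Return the names that cannot be imported in the current environment."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    print(missing_modules())
```

Each name that is printed can then be installed with pip3 install <module>.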
A final Python module requires special attention: gym-cellular.
This Python module is our own and is implemented in the Git submodule gym-cellular. It can be installed with pip3 install -e gym-cellular from the pilot-experimentation directory.
The directory configs contains the configuration files for the training runs.
(The files necessary to reproduce the experiments in Safe Exploration in Reinforcement Learning through Pilot Experimentation are deadlock.py and reset.py.)
The subdirectory configs/envs contains the configurations for the environments.
(The files necessary to reproduce the experiments in Safe Exploration in Reinforcement Learning through Pilot Experimentation are deadlock_set.py and reset_set.py.)
The seeds can be changed for both the agent and (under configs/envs) the environment. If a seed is set to None, it is derived from the current time.
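The fallback for a seed of None can be pictured as follows. This is an illustrative sketch, not the code used in this repository: the function resolve_seed and the choice of a nanosecond clock reduced to 32 bits are assumptions.

```python
import time


def resolve_seed(seed=None):
    """Return the given seed, or a time-based one when seed is None."""
    if seed is None:
        # Assumption: nanosecond clock reduced to 32 bits; the repository may differ.
        return time.time_ns() % (2**32)
    # An explicit seed, including 0, is used unchanged.
    return seed


if __name__ == "__main__":
    print(resolve_seed(42))    # fixed seed: reproducible run
    print(resolve_seed(None))  # time-based seed: differs between runs
```

For reproducible experiments, a fixed integer seed should therefore be set in both the agent and the environment configuration.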
To start a training run (or set of training runs), from the directory pilot-experimentation, use the command
python3 train.py <filename>
where <filename> is the name of a configuration file in the directory configs, given without the .py extension (for example, deadlock for configs/deadlock.py).
Data from the training run is saved in the directory results.
We suggest detaching the terminal while the training is running, e.g. by using the command screen.
(To reproduce the experiments from Safe Exploration in Reinforcement Learning through Pilot Experimentation, use the commands python3 train.py deadlock and python3 train.py reset for the deadlock and reset variants, respectively. Note that these runs use up to 50 cores for parallelisation.)
To plot the results (suggested for longer training runs), from the directory pilot-experimentation, use the command
python3 eval.py <dir>
where <dir> is a subdirectory in results.
(To reproduce the plots from Safe Exploration in Reinforcement Learning through Pilot Experimentation, use the commands python3 eval.py deadlock --style paper --title "deadlock variant" and python3 eval.py reset --style paper --title "reset variant", respectively.)
Error regarding Process().cpu_num(). This may occur on some machines, e.g. those running macOS. To fix it, edit agents/peucrl.py and replace cpu_id = Process().cpu_num() with cpu_id = 1. This does not affect the performance of the algorithm, as cpu_id is only used for debugging purposes.
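As an alternative to hard-coding the value, the call can be wrapped so that it falls back to a default only on platforms where psutil does not provide cpu_num(). This is a sketch under that assumption; the helper current_cpu_id is hypothetical and not part of agents/peucrl.py.

```python
def current_cpu_id(process, default=1):
    """Return process.cpu_num() if the platform supports it, else a default."""
    try:
        return process.cpu_num()
    except (AttributeError, NotImplementedError):
        # The method is unavailable on this platform; the value is only
        # used for debugging, so a placeholder is sufficient.
        return default
```

With this helper, the line in agents/peucrl.py would become cpu_id = current_cpu_id(Process()), which keeps the real CPU number on Linux and uses the placeholder elsewhere.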
Footnotes
1. For example, use the command git clone --recursive https://github.com/carlhenrikrolf/pilot-experimentation.git.
2. A virtual environment can be created with the command python3 -m venv <path/to/new/virtual/environment>. It can be activated using source <path/to/new/virtual/environment>/bin/activate. Finally, modules can be installed with pip3 install <module>.
3. Modules such as concurrent, copy, os, pickle, and subprocess are also used but do not need to be installed.