Just some boilerplate code for loggers, plots and the like as well as a collection of useful scripts for preparing and launching hyperparameter searches and aggregating results. We use alfred
for machine learning experiments and try to keep it as project-agnostic as possible.
git clone https://github.com/julienroyd/alfred.git
cd alfred
pip install -e .
To make using alfred as seamless as possible, add the following to your .bashrc:
alias alprep='python -m alfred.prepare_schedule'
alias allaunch='python -m alfred.launch_schedule'
alias alclean='python -m alfred.clean_interrupted'
alias alsync='python -m alfred.sync_wandb'
alias alcopy='python -m alfred.copy_config'
alias alupdate='python -m alfred.update_config_unique'
├─── alfred
│
│     └─── defaults.py
│     └─── clean_interrupted.py
│     └─── copy_config.py
│     └─── launch_schedule.py
│     └─── prepare_schedule.py
│     └─── synch_wandb.py
│
│     └─── schedules_examples
│
│           └─── gridSearch_example1
│                 └─── grid_schedule_example1.py
│           └─── randomSearch_example1
│                 └─── random_schedule_example1.py
│
│     └─── utils
│
│           └─── config.py
│           └─── directory_tree.py
│           └─── misc.py
│           └─── recorder.py
This repository contains two groups of files, plus a set of default configurations:
- Experiment management scripts, located directly under alfred. They help manage folder creation, experiment launching and results aggregation (see below for usage). We refer to them as << alfred's scripts >>.
- Common functions for directory trees, loggers, argparsers and the like, located under alfred.utils. We refer to them as << alfred's utils >>.
- Some important default configurations, defined in alfred/defaults.py as global variables. They can be overwritten on the ML side by reassigning them inside a function called main.set_up_alfred().
alfred's utils can be imported like any other package, e.g.:
from alfred.utils import *
There are some structural requirements that alfred expects in order to interact with your machine learning codebase. Say the main folder is called my_ml_project; it should contain:
- a file called main.py
- a function main.get_run_args(overwritten_cmd_line) that defines the hyperparameters for this project
- a function main.main(config, dir_tree, logger) that launches an experiment with the specified hyperparameters
- [OPTIONAL] a function main.set_up_alfred() that sets the default values used by alfred (see alfred/defaults.py)
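As a rough illustration of these requirements, a main.py could look like the sketch below. The hyperparameter names (alg_name, task_name, lr, seed) are hypothetical, and the sketch assumes that overwritten_cmd_line is a list of argument strings that replaces the real command line when given; check alfred's own code for the exact contract.

```python
# my_ml_project/main.py (hypothetical skeleton, not taken from alfred's repository)
import argparse
import logging
from pathlib import Path


def get_run_args(overwritten_cmd_line=None):
    """Define this project's hyperparameters.

    Assumption: `overwritten_cmd_line` is a list of argument strings used
    instead of sys.argv; None means "read the real command line".
    """
    parser = argparse.ArgumentParser(description="Toy project hyperparameters")
    parser.add_argument("--alg_name", type=str, default="ppo")
    parser.add_argument("--task_name", type=str, default="cartpole")
    parser.add_argument("--lr", type=float, default=3e-4)
    parser.add_argument("--seed", type=int, default=123)
    return parser.parse_args(overwritten_cmd_line)


def main(config, dir_tree, logger: logging.Logger):
    """Launch one experiment with the given hyperparameters."""
    logger.info(
        "Running %s on %s (lr=%g, seed=%d)",
        config.alg_name, config.task_name, config.lr, config.seed,
    )
    # Training code would go here; `dir_tree` tells it where to save outputs.
    # The return value only exists to make this sketch easy to check.
    return {"alg_name": config.alg_name, "seed": config.seed}


def set_up_alfred():
    """[OPTIONAL] Override alfred's defaults for this project."""
    import alfred.defaults  # imported lazily so this file loads without alfred

    alfred.defaults.DEFAULT_DIRECTORY_TREE_GIT_REPOS_TO_TRACK["mlProject"] = str(
        Path(__file__).absolute().parents[0]
    )
```

Calling get_run_args(["--lr", "0.01"]) returns a namespace with lr set to 0.01 and the other values left at their defaults, which is one way a launcher could inject a configuration without touching the real command line.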
With that in place, you can use alfred's scripts to prepare, launch and clean hyperparameter searches. To use any of the scripts, simply call it from my_ml_project. For example:
python -m alfred.prepare_schedule --schedule_file=schedules/gridSearchExample/grid_schedule_gridSearchExample.py --desc=abc
For a description of their purpose and their arguments, please refer to the help command, e.g:
python -m alfred.prepare_schedule --help
1. Create the search folders:
python -m alfred.prepare_schedule --schedule_file=schedules/benchmarkExample/random_schedule_benchmarkExample.py \
    --root_dir=scratch/benchmarkExample \
    --desc benchmarkExample
2. Launch the searches:
python -m alfred.launch_schedule --from_file schedules/benchmarkExample/list_searches_benchmarkExample.txt \
    --root_dir=scratch/benchmarkExample
The spirit of this codebase is to have project-agnostic scripts launch experiments in parallel and communicate asynchronously through FLAG-files, in order to know which experiments are completed, which ones are left to run and which ones have crashed and need to be cleaned up and re-launched. This framework relies on the fact that the directory-tree is known to alfred (see alfred/utils/directory_tree.py).
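To make the idea of coordinating through FLAG-files concrete, here is a minimal sketch of how independent workers could share a pool of runs using nothing but files. It is not alfred's actual implementation: the function names are hypothetical, and it assumes that a worker claims a run by deleting its UNHATCHED flag (only one deletion can succeed) and records the outcome as a COMPLETED or CRASH file.

```python
import tempfile
import traceback
from pathlib import Path


def try_claim(seed_dir: Path) -> bool:
    """Claim a run by deleting its UNHATCHED flag; only one worker can succeed."""
    try:
        (seed_dir / "UNHATCHED").unlink()
        return True
    except FileNotFoundError:
        return False  # already claimed by another worker, or never prepared


def run_seed(seed_dir: Path, run_fn) -> str:
    """Run `run_fn(seed_dir)` and record the outcome as a flag file."""
    try:
        run_fn(seed_dir)
    except Exception:
        # The CRASH flag doubles as a log of the error message.
        (seed_dir / "CRASH").write_text(traceback.format_exc())
        return "CRASH"
    (seed_dir / "COMPLETED").touch()
    return "COMPLETED"


def work(root: Path, run_fn) -> dict:
    """Process every unclaimed seed directory found under `root`."""
    outcomes = {}
    # sorted() materialises the listing before any flag is deleted.
    for flag in sorted(root.rglob("UNHATCHED")):
        seed_dir = flag.parent
        if try_claim(seed_dir):
            outcomes[seed_dir.name] = run_seed(seed_dir, run_fn)
    return outcomes
```

Because each worker only reads and writes small files, several processes (or several machines sharing a filesystem) can call work() on the same root without talking to each other directly.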
The directory-tree used by alfred is defined in the class alfred.utils.directory_tree.DirectoryTree
. An example of how it could be laid out for a Reinforcement Learning experiment is shown below. Note that all these files would be automatically created either by alfred's scripts
or by my_ml_project
.
├─── root_dir
│
│     └─── Ju1_f7b375e-58332a7_ppo_cartpole_random_benchmarkv1
│     └─── Ju2_f7b375e-58332a7_ppo_mountaincar_random_benchmarkv1
│     └─── Ju3_f7b375e-58332a7_sac_cartpole_random_benchmarkv1
│           └─── experiment1
│           └─── experiment2
│                 └─── seed123
│                       └─── config.json
│                       └─── config_unique.json
│                       └─── UNHATCHED
│                       └─── model.pt
│                 └─── seed456
│                       └─── config.json
│                       └─── config_unique.json
│                       └─── COMPLETED
│                       └─── logger.out
│                       └─── graph.png
│                       └─── metrics.pkl
│                       └─── model.pt
│                 └─── seed789
│                       └─── config.json
│                       └─── config_unique.json
│                       └─── CRASH
│                       └─── logger.out
│           └─── experiment3
│           └─── experiment4
│           └─── experiment5
│           └─── eval_return_over_episodes.png
│     └─── Ju4_f7b375e-58332a7_sac_mountaincar_random_benchmarkv1
The whole directory-tree is a result of alfred.prepare_schedule
. It uses a file defining your search and creates the experiment directories accordingly (see alfred/schedules_examples
for an example of such files).
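For intuition about what "creating the experiment directories" involves, the sketch below expands a small grid of hyperparameters into experimentN/seedM folders, each holding a config.json and an UNHATCHED flag. This is only an illustration under assumptions: the function name and the grid format are hypothetical and do not reproduce alfred's schedule-file format.

```python
import itertools
import json
import tempfile
from pathlib import Path


def prepare_grid(storage_dir: Path, grid: dict, seeds: list) -> list:
    """Create experimentN/seedM folders, one experiment per grid point."""
    names = sorted(grid)  # fixed order so experiment numbering is reproducible
    combos = itertools.product(*(grid[name] for name in names))
    seed_dirs = []
    for i, values in enumerate(combos, start=1):
        for seed in seeds:
            seed_dir = storage_dir / f"experiment{i}" / f"seed{seed}"
            seed_dir.mkdir(parents=True)
            # Same hyperparameters for every seed of an experiment; only seed differs.
            config = dict(zip(names, values), seed=seed)
            (seed_dir / "config.json").write_text(json.dumps(config, indent=2))
            (seed_dir / "UNHATCHED").touch()  # marks the run as not launched yet
            seed_dirs.append(seed_dir)
    return seed_dirs
```

With grid = {"lr": [0.1, 0.01], "batch_size": [32]} and seeds = [123, 456], this creates two experiment folders with two seed folders each.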
- root_dir: Root-directory. By default it uses DirectoryTree.default_root. This default can be overwritten when importing alfred in my_ml_project/main.set_up_alfred(), or --root_dir can be passed as an argument to all of alfred's scripts.
- Ju1_f7b375e-58332a7_ppo_cartpole_random_benchmarkv1: Storage-directory. Its name is composed of:
  - Ju1: the storage-id (defined automatically from the git-username and ordinal numbering).
  - f7b375e-58332a7: git-hashes of the packages being tracked by alfred. These are defined in my_ml_project by giving alfred the path to the .git file in your function main.set_up_alfred(), e.g.: alfred.defaults.DEFAULT_DIRECTORY_TREE_GIT_REPOS_TO_TRACK['mlProject'] = str(Path(__file__).absolute().parents[0])
  - ppo: Algorithm-name. Defined in the schedule-file and my_ml_project.
  - cartpole: Task-name. Defined in the schedule-file and my_ml_project.
  - random: Search-type. Defined in alfred.prepare_schedule from the provided schedule_file.
  - benchmarkv1: Description. Passed as an argument to alfred.prepare_schedule.
- experiment1: Experiment-directory. All leaves of an experiment-directory have the same config.json except for the seed.
- seed123: Seed-directory. The folder for each particular (unique) run. See it as an egg ready to hatch. These eggs are prepared by alfred.prepare_schedule, and they will be executed by alfred.launch_schedule. In this example, we see that seed123 has not been run yet, seed456 has completed and seed789 has crashed.
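The naming scheme of storage-directories can be read back mechanically. The helper below is a hypothetical sketch, not part of alfred; it assumes that none of the six fields contains an underscore and that git-hashes are joined with a hyphen, as in the example names above.

```python
from typing import NamedTuple


class StorageName(NamedTuple):
    storage_id: str     # e.g. "Ju1"
    git_hashes: tuple   # e.g. ("f7b375e", "58332a7")
    alg_name: str       # e.g. "ppo"
    task_name: str      # e.g. "cartpole"
    search_type: str    # e.g. "random"
    description: str    # e.g. "benchmarkv1"


def parse_storage_name(name: str) -> StorageName:
    """Split a storage-directory name into its six underscore-separated fields."""
    parts = name.split("_")
    if len(parts) != 6:
        raise ValueError(f"expected 6 fields, got {len(parts)}: {name!r}")
    storage_id, hashes, alg, task, search, desc = parts
    return StorageName(storage_id, tuple(hashes.split("-")), alg, task, search, desc)
```

Such a helper can be handy when aggregating results across storage-directories, for instance to group them by algorithm or task.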
There are four main flag files that can be present in seed-directories:
- UNHATCHED: signals that this run has not been launched yet
- OPENED: signals that this run has been launched (although it could have stopped, say due to resources being revoked)
- CRASH: signals that the run from this config has crashed; the file contains the error message
- COMPLETED: signals that this run has reached termination without crashing
A seed-directory that does not contain any FLAG-file can be explained in two ways:
- It is currently being run (a process is executing this config and hasn't finished yet)
- The process running this config has been killed (e.g. by a cluster's slurm system) without having completed its task
Such a seed-directory (containing no FLAG-file) will be identified as OPENED by alfred.clean_interrupted and will be cleaned to its initial state.
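The rules above can be summarised in a few lines of code. The sketch below is not alfred's clean_interrupted script; the function names are hypothetical, and it assumes that "initial state" means keeping only config.json and config_unique.json and restoring the UNHATCHED flag. Do not run something like this while jobs are still active, since a run in progress is indistinguishable from a killed one.

```python
import shutil
import tempfile
from pathlib import Path


def seed_status(seed_dir: Path) -> str:
    """Classify a seed-directory from the flag file it contains."""
    for flag in ("COMPLETED", "CRASH", "UNHATCHED"):
        if (seed_dir / flag).exists():
            return flag
    # An OPENED flag, or no flag at all: launched but not finished.
    return "OPENED"


def reset_seed_dir(seed_dir: Path, keep=("config.json", "config_unique.json")) -> None:
    """Delete everything except the config files, then mark the run as UNHATCHED."""
    for entry in list(seed_dir.iterdir()):
        if entry.name in keep:
            continue
        if entry.is_dir():
            shutil.rmtree(entry)
        else:
            entry.unlink()
    (seed_dir / "UNHATCHED").touch()


def reset_interrupted_runs(root: Path) -> list:
    """Reset every seed-directory under `root` whose status is OPENED."""
    cleaned = []
    for config in sorted(root.rglob("config.json")):
        seed_dir = config.parent
        if seed_status(seed_dir) == "OPENED":
            reset_seed_dir(seed_dir)
            cleaned.append(seed_dir.name)
    return cleaned
```

Completed and crashed runs are left untouched by this sketch; only runs that were launched and never reported an outcome are returned to the pool.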