From a269db59132c234245a7b606bdd4741ad649da37 Mon Sep 17 00:00:00 2001
From: Farah <49493059+salhanyf@users.noreply.github.com>
Date: Thu, 17 Oct 2024 13:02:33 -0400
Subject: [PATCH 1/2] Updating README and OpenISS script

---
 src/README.md             | 56 +++++++++++++++++++++------------------
 src/openiss-reid-speed.sh | 22 +++++++--------
 2 files changed, 41 insertions(+), 37 deletions(-)

diff --git a/src/README.md b/src/README.md
index d040663..8135787 100644
--- a/src/README.md
+++ b/src/README.md
@@ -373,49 +373,53 @@ Time is in minutes, run Yolo with different hardware configurations GPU types V1
-## OpenISS-reid-tfk
+## OpenISS Person Re-Identification Baseline
-The following steps will provide the information required to execute the *OpenISS Person Re-Identification Baseline* Project (https://github.com/NAG-DevOps/openiss-reid-tfk) on *SPEED*
+The following are the steps required to run the *OpenISS Person Re-Identification Baseline* Project (https://github.com/NAG-DevOps/openiss-reid-tfk) on the *Speed* cluster. This implementation is based on TensorFlow and Keras.
-
-### Environment
+
+### Prerequisites
-The pre-requisites to prepare the environment are located in `environment.yml` (https://github.com/NAG-DevOps/openiss-reid-tfk).
+#### Dataset
+ Using the Market1501 dataset, which consists of:
+ - Train images: 12,936
+ - Query images: 3,368
+ - Gallery images: 15,913
-Using a test dataset (Market1501) and 120 epochs as an example, we ran the script and the results were the following:
+ Running for 10 epochs as an example, the results for different Speed configurations were:
+ - Using GPU: 29 minutes
+ - Using CPUs (32 cores): 6 hours and 49 minutes
-Speed 1 GPU: 5hrs 25min
+#### Environment Setup
+ The environment setup instructions are located in `environment.yml` (https://github.com/NAG-DevOps/openiss-reid-tfk). Ensure all dependencies are correctly installed.
-Speed CPU - 32 cores: 2 days 22 hours
+
+### Configuration and execution
-TEST DATASET: Market1501
+- Log into Speed and navigate to your speed-scratch directory:
+
+ ssh $USER@speed.encs.concordia.ca
+ cd /speed-scratch/$USER/
----- Train images: 12936
+- Clone the GitHub repo from https://github.com/NAG-DevOps/openiss-reid-tfk
----- Query images: 3368
+- Download the dataset: Navigate to the `datasets/` directory, make the script executable, and run `get_dataset_market1501.sh`:
----- Gallery images: 15913
+ chmod u+x *.sh && ./get_dataset_market1501.sh
-
-### Configuration and execution
+- Download the `openiss-reid-speed.sh` execution script from this repository.
-- Log into Speed, go to your speed-scratch directory: `cd /speed-scratch/$USER/`
-- Clone the repo from https://github.com/NAG-DevOps/openiss-reid-tfk
-- Download the dataset: go to `datasets/` and run `get_dataset_market1501.sh`
-- In `reid.py` set the epochs (`g_epochs=120` by default)
-- Download `openiss-reid-speed.sh` from this repository
-- On `environment.yml` comment or uncomment tensorflow accordingly (for CPU or GPU, GPU is default)
-- On `openiss-reid-speed.sh` comment or uncomment the resourse allocation section accordingly (GPU is default), make sure you only request CPU or GPU but not both
-- Submit the job:
+- In `reid.py` set the number of epochs (`g_epochs=120` by default)
- On CPUs nodes: `sbatch ./openiss-reid-speed.sh`
+- In `environment.yml` comment/uncomment the TensorFlow section depending on whether you are running on CPU or GPU. GPU is enabled by default.
- On GPUs nodes: `sbatch -p pg ./openiss-reid-speed.sh`
+- In `openiss-reid-speed.sh` comment/uncomment the resource allocation lines for either CPU or GPU, depending on the target node (GPU is default). Ensure that only one type (CPU or GPU) is requested.
-**IMPORTANT**
+- Submit the job:
-Modify the script `openiss-reid-speed.sh` to setup the job to be ready for CPUs or GPUs nodes; `--mem=` and `gpus=` in particular, see more information about these parameters on https://github.com/NAG-DevOps/speed-hpc/blob/master/doc/speed-manual.pdf
+ For CPU nodes: `sbatch ./openiss-reid-speed.sh`
+ For GPU nodes: `sbatch -p pg ./openiss-reid-speed.sh`
 ## CUDA
diff --git a/src/openiss-reid-speed.sh b/src/openiss-reid-speed.sh
index cbad29d..d45942b 100755
--- a/src/openiss-reid-speed.sh
+++ b/src/openiss-reid-speed.sh
@@ -1,30 +1,30 @@
 #!/encs/bin/tcsh
-# Give job a name
-#SBATCH -J openiss-reid
+# Job name
+#SBATCH --job-name openiss-reid
-# Send an email when the job starts, finishes or if it is aborted.
+# Receive email notifications when the job starts, finishes or fails.
 #SBATCH --mail-type=ALL
-# Specify the output file name
-#SBATCH -o openiss-reid-tfk.log
-
 # Set output directory to current
 #SBATCH --chdir=./
+# Specify the output file name
+#SBATCH -o openiss-reid-output-%A.log
+
 # Request Memory
-#SBATCH --mem=32G
+#SBATCH --mem=20G
 # Request CPU - comment this section if the job needs GPUs
-##SBATCH -n 32
+##SBATCH -c 32
 # Request GPU - comment this section if the job needs CPUs and uncomment the previous section
 #SBATCH --gpus=1
 # Execute the script
 module load anaconda3/2023.03/default
-conda env create -f environment.yml -p /speed-scratch/$USER/reid-venv
-conda activate /speed-scratch/$USER/reid-venv
+conda env create -f environment.yml -p ../reid-venv
+conda activate ../reid-venv
 srun python reid.py
 conda deactivate
-conda env remove -p /speed-scratch/$USER/reid-venv
+conda env remove -p ../reid-venv

From a6159aee682b4147592f0f8c53c50a499520dcba Mon Sep 17 00:00:00 2001
From: Farah <49493059+salhanyf@users.noreply.github.com>
Date: Mon, 21 Oct 2024 10:39:45 -0400
Subject: [PATCH 2/2] changes are implemented

---
 src/openiss-reid-speed.sh | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/openiss-reid-speed.sh b/src/openiss-reid-speed.sh
index d45942b..efe7ec3 100755
--- a/src/openiss-reid-speed.sh
+++ b/src/openiss-reid-speed.sh
@@ -23,8 +23,8 @@
 # Execute the script
 module load anaconda3/2023.03/default
-conda env create -f environment.yml -p ../reid-venv
-conda activate ../reid-venv
+conda env create -f environment.yml -p /speed-scratch/$USER/reid-venv
+conda activate /speed-scratch/$USER/reid-venv
 srun python reid.py
 conda deactivate
-conda env remove -p ../reid-venv
+conda env remove -p /speed-scratch/$USER/reid-venv
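
Reviewer note: the README's submit step distinguishes CPU and GPU nodes only by the `-p pg` partition flag. A minimal sketch of that choice follows, assuming a hypothetical helper named `submit_cmd` that is not part of this patch. It only prints the command the README gives for each node type, so it can be checked without access to Speed or Slurm.

```shell
#!/bin/bash
# Hypothetical helper (an assumption, not part of the patch): print the
# submit command from the README for a target node type. It never calls
# sbatch itself, so it is safe to run anywhere.
submit_cmd() {
    case "$1" in
        cpu) echo "sbatch ./openiss-reid-speed.sh" ;;        # CPU nodes: default partition
        gpu) echo "sbatch -p pg ./openiss-reid-speed.sh" ;;  # GPU nodes: pg partition
        *)   echo "usage: submit_cmd cpu|gpu" >&2; return 1 ;;
    esac
}

# Example: show the command for a GPU job.
submit_cmd gpu
```

The helper does not edit the `#SBATCH` lines. As the README states, the resource allocation section of `openiss-reid-speed.sh` still has to request only one of CPU or GPU before the job is submitted.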