Connect to the CARC Discovery node with SSH (e.g. via PuTTY). Replace username with your USC username (e.g. adiyer or alalim):
- Host Name (or IP): username@discovery.usc.edu
- Port: 22
- Note: If possible, set the seconds between keepalives to 60 so the client pings the server every 60 seconds and keeps long-running connections from dropping.
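If you connect with OpenSSH from a terminal instead of PuTTY, the equivalent keepalive setting goes in ~/.ssh/config (a sketch; the carc host alias is just a convenience name):

```bash
# ~/.ssh/config -- send a keepalive every 60 seconds
Host carc
    HostName discovery.usc.edu
    User <your USC username>
    ServerAliveInterval 60
```

You can then connect with ssh carc.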
To simplify the entire process, clone the repo to your home directory (e.g. /home1/adiyer/) by running:
cd "/home1/$USER"
git clone -b main https://github.com/abhay-iy97/DeepSports.git
- Note: When prompted for your GitHub password, you may need to set up a GitHub personal access token and use that in place of your account password.
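GitHub no longer accepts account passwords over HTTPS, so paste the token at the password prompt. Optionally, git's built-in credential helper can cache it so you don't retype it on every pull (standard git commands; the timeout is an example value):

```bash
# cache the token in memory for one hour instead of storing it on disk
git config --global credential.helper 'cache --timeout=3600'
```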
In this step, we download the videos by running the following command. Note that this is not part of a job script because compute jobs cannot curl from outside sources.
sh "/home1/$USER/DeepSports/data_preparation/download_videos.sh"
After that is done, extract the frames of the MTL-AQA videos by running:
sbatch "/home1/$USER/DeepSports/job_files/prepare_dataset.job"
To check on the job's status, you can run:
squeue --me
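While the job runs, a few other standard Slurm commands are useful (replace <jobid> with the ID that sbatch printed):

```bash
sacct -j <jobid>             # accounting info and exit state, also for finished jobs
scancel <jobid>              # cancel a job you no longer need
tail -f "slurm-<jobid>.out"  # follow the job's output (Slurm's default output file, unless the .job file overrides it)
```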
TimeSformer models pretrained on the Kinetics-400 (K400), Kinetics-600 (K600), Something-Something-V2 (SSv2), and HowTo100M datasets are listed in the tables below. First, copy the download URL of the model you wish to use, then modify the command below to use the link you copied.
- Note: Change the Dropbox link you copied so that it ends with dl=1, not dl=0 (the one-liner after the download command automates this).
cd "/home1/$USER/DeepSports/training/TimeSformer"
curl https://www.dropbox.com/s/g5t24we9gl5yk88/TimeSformer_divST_8x32_224_K400.pyth?dl=1 -L -o TimeSformer_divST_8x32_224_K400.pyth
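If you'd rather not edit the link by hand, a one-line bash substitution flips the flag and derives the output filename for you (illustrative; substitute the URL you copied):

```bash
url="https://www.dropbox.com/s/g5t24we9gl5yk88/TimeSformer_divST_8x32_224_K400.pyth?dl=0"
# replace dl=0 with dl=1, strip the query string to name the output file
curl "${url/dl=0/dl=1}" -L -o "$(basename "${url%%\?*}")"
```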
name | dataset | # of frames | spatial crop | acc@1 | acc@5 | url |
---|---|---|---|---|---|---|
TimeSformer | K400 | 8 | 224 | 77.9 | 93.2 | model |
TimeSformer-HR | K400 | 16 | 448 | 79.6 | 94.0 | model |
TimeSformer-L | K400 | 96 | 224 | 80.6 | 94.7 | model |
name | dataset | # of frames | spatial crop | acc@1 | acc@5 | url |
---|---|---|---|---|---|---|
TimeSformer | K600 | 8 | 224 | 79.1 | 94.4 | model |
TimeSformer-HR | K600 | 16 | 448 | 81.8 | 95.8 | model |
TimeSformer-L | K600 | 96 | 224 | 82.2 | 95.6 | model |
name | dataset | # of frames | spatial crop | acc@1 | acc@5 | url |
---|---|---|---|---|---|---|
TimeSformer | SSv2 | 8 | 224 | 59.1 | 85.6 | model |
TimeSformer-HR | SSv2 | 16 | 448 | 61.8 | 86.9 | model |
TimeSformer-L | SSv2 | 64 | 224 | 62.0 | 87.5 | model |
name | dataset | # of frames | spatial crop | single clip coverage | acc@1 | url |
---|---|---|---|---|---|---|
TimeSformer | HowTo100M | 8 | 224 | 8.5s | 56.8 | model |
TimeSformer | HowTo100M | 32 | 224 | 34.1s | 61.2 | model |
TimeSformer | HowTo100M | 64 | 448 | 68.3s | 62.2 | model |
TimeSformer | HowTo100M | 96 | 224 | 102.4s | 62.6 | model |
Run the following, then restart your SSH connection to CARC so that the shell restarts and the environment changes take effect.
module load python/3.9.2
module load anaconda3/2021.05
conda create -n timesformer python=3.7 -y
conda init
After restarting, run this to install all dependencies:
module load python/3.9.2
module load anaconda3/2021.05
conda activate timesformer
pip install torch torchvision fvcore simplejson einops timm
conda install av -c conda-forge
pip install psutil scikit-learn opencv-python tensorboard matplotlib
pip install torchsort
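Before training, a quick sanity check confirms the environment imports cleanly (torch.cuda.is_available() will typically report False on the login node, which has no GPU):

```bash
conda activate timesformer
python -c "import torch, torchvision, timm, einops, cv2; print('torch', torch.__version__, '| cuda:', torch.cuda.is_available())"
```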
At this point, you should be set and ready to train models!
First, modify the contents of /home1/<username>/DeepSports/model_train_config.sh to the parameters you want. The full argument list is below (an illustrative snippet follows the table):
Argument | Description | Default Value | Possible Values |
---|---|---|---|
loglevel | The level of logging in the application. | INFO | [DEBUG, INFO, WARNING, ERROR, CRITICAL] |
gpu | Whether to use the GPU for training. | True | [False, True] |
root_dir | The path to the root directory containing the frame images. | ./ | directory path |
train_path | The filepath to the train split pickle file. | ./train_split_0.pkl | .pkl file path |
test_path | The filepath to the test split pickle file. | ./test_split_0.pkl | .pkl file path |
batch_size | The batch size used in training. | 4 | [1, inf] |
epochs | The number of epochs used in training. | 5 | [1, inf] |
learning_rate | The learning rate used in training. | 0.00001 | [0.0, inf] |
weight_decay | The weight decay used in training. | 0.00001 | [0.0, inf] |
momentum | The momentum used in the SGD/RMSProp optimizers for training. | 0.9 | [0.0, inf] |
train_val_split_ratio | The ratio in which the training and validation datasets are split. | 0.8 | [0.0, 1.0] |
frame_num | The number of frames to use in each clip. | 8 | [1, inf] |
frame_method | The algorithm used to sample frames from a long clip. | spaced_fixed | [random, spaced_fixed, spaced_varied, spaced_fixed_new, spaced_varied_new] |
spatial_size | The size in pixels that frames are resized to. | 224 | [1, inf] |
freeze | Whether to freeze the gradients in the TimeSformer model. | False | [False, True] |
dropout | The dropout probabilities used in the MLP; each dropout is applied before its linear layer. | [0.5, 0.5] | list of drop probabilities, each in [0.0, 1.0] |
activation | The activation function used in the MLP network. | None | [None, ReLU, LeakyReLU, ELU, GELU] |
topology | The hidden-neuron topology between the TimeSformer model and the final output layer. | [512, 256] | list of ints, each in [1, inf] |
output | The output filepath for the losses figure. | ./losses.png | .png file path |
annotation_path | The path to the final annotations dict pickle file. | ./final_annotations_dict.pkl | .pkl file path |
attention_type | The type of attention used in the transformer. | divided_space_time | [divided_space_time, space_only, joint_space_time] |
optimizer | The optimizer used in training. | AdamW | [Adam, AdamW, SGD, RMSProp] |
patch_size | The patch size used in the transformer. | 16 | [1, inf] |
embed_dim | The embedding dimensions output from the transformer. | 768 | [1, inf] |
pretrained_model | The filepath to the pretrained .pyth model for the TimeSformer. If set to 'scratch', a TimeSformer is trained from scratch. | scratch | path/to/model.pyth or scratch |
evaluate | Whether to evaluate on the testing dataset. | False | [False, True] |
normalize | Whether to normalize the RGB channels in the video clips as a preprocessing step. | False | [False, True] |
data_aug | Whether to randomly resize and crop the video clips as a preprocessing step. | False | [False, True] |
amsgrad | Whether to use amsgrad for the Adam/AdamW optimizer. | False | [False, True] |
videos | The video directory names to use. To use select directories, list only their names, e.g. '01 02'. | all | all or list of directories (e.g. 01 02) |
loss_mse_weight | The weight given to the MSE loss. | 1 | [-inf, inf] |
loss_spcoef_weight | The weight given to the differentiable Spearman correlation loss. | 0 | [-inf, inf] |
use_decoder | Whether to use a Transformer decoder instead of the MLP. | False | [False, True] |
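For illustration only, overriding a few of these settings in model_train_config.sh might look like the snippet below; the variable names here are hypothetical, so check the file itself for the real ones and for how they are passed to the training script:

```bash
# hypothetical excerpt of model_train_config.sh -- variable names are illustrative
EPOCHS=20
BATCH_SIZE=4
LEARNING_RATE=0.00001
FRAME_NUM=8
FRAME_METHOD=spaced_fixed
OPTIMIZER=AdamW
PRETRAINED_MODEL="/home1/$USER/DeepSports/training/TimeSformer/TimeSformer_divST_8x32_224_K400.pyth"
```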
Next, you may want to modify the resource allocation of the training job (GPU, time, etc.) in /home1/<username>/DeepSports/job_files/train_model.job (see the sketch after the command below) and then run:
sbatch "/home1/$USER/DeepSports/job_files/train_model.job"
The following tree view shows the directory layout you should expect at the end of your setup.
.
├── /home1/
│   └── <username>/
│       └── DeepSports/
│           ├── data_preparation/
│           │   └── ...
│           ├── job_files/
│           │   └── ...
│           ├── training/
│           │   └── ...
│           └── model_train_config.sh
└── /scratch1/
    └── <username>/
        └── DeepSports_dataset/
            ├── whole_videos/
            │   ├── 01.mp4
            │   ├── 02.mp4
            │   ├── 03.mp4
            │   ├── 04.mp4
            │   ├── 05.mp4
            │   ├── 06.mp4
            │   ├── 07.mp4
            │   ├── 09.mp4
            │   ├── 10.mp4
            │   ├── 13.mp4
            │   ├── 14.mp4
            │   ├── 17.mp4
            │   ├── 18.mp4
            │   ├── 22.mp4
            │   └── 26.mp4
            └── whole_videos_frames/
                ├── 01/
                │   └── ...
                ├── 02/
                │   └── ...
                ├── 03/
                │   └── ...
                ├── 04/
                │   └── ...
                ├── 05/
                │   └── ...
                ├── 06/
                │   └── ...
                ├── 07/
                │   └── ...
                ├── 09/
                │   └── ...
                ├── 10/
                │   └── ...
                ├── 13/
                │   └── ...
                ├── 14/
                │   └── ...
                ├── 17/
                │   └── ...
                ├── 18/
                │   └── ...
                ├── 22/
                │   └── ...
                └── 26/
                    └── ...