Login CARC

Connect to the CARC Discovery node over SSH (e.g. with PuTTY). Replace username with your USC username (e.g. adiyer or alalim):

  • Host Name (or IP): [email protected]
  • Port: 22
  • Note: If possible, set the seconds between keepalives to 60 so that the client pings the server every 60 s and the connection stays alive during long sessions.

Clone Repo

To simplify the process, clone the entire repo into your home directory (e.g. /home1/adiyer/) by running:

cd "/home1/$USER"
git clone -b main
  • Note: When prompted for your GitHub password, you may have to set up a GitHub personal access token and use that instead (see GitHub's documentation on personal access tokens).

Dataset Preparation

In this step, we download the videos by running the following command. Note that it is not part of a job script because jobs cannot download from outside sources with curl.

sh "/home1/$USER/DeepSports/data_preparation/"

After that is done, to extract frames of the MTL-AQA videos, run:

sbatch "/home1/$USER/DeepSports/job_files/prepare_dataset.job"

To check on the job's status, you can run:

squeue --me
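If you would rather inspect the queue from a script, the output of squeue can be split into fields. The following is a minimal sketch, assuming the default squeue column layout; the sample text, job IDs, and the username ttrojan are made up for illustration and are not real CARC output.

```python
import subprocess
from typing import Dict, List

# Made-up sample in the default squeue layout, for illustration only.
SAMPLE = """\
  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
1234567       gpu  prepare  ttrojan  R       5:12      1 d11-03
1234568       gpu    train  ttrojan PD       0:00      1 (Priority)
"""


def parse_squeue(text: str) -> List[Dict[str, str]]:
    """Turn squeue's whitespace-separated table into one dict per job."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    jobs = []
    for ln in lines[1:]:
        # Limit the splits so a reason containing spaces stays in the last field.
        fields = ln.strip().split(None, len(header) - 1)
        jobs.append(dict(zip(header, fields)))
    return jobs


def my_jobs() -> List[Dict[str, str]]:
    """Run `squeue --me` on the cluster and parse its output."""
    result = subprocess.run(
        ["squeue", "--me"], capture_output=True, text=True, check=True
    )
    return parse_squeue(result.stdout)


if __name__ == "__main__":
    for job in parse_squeue(SAMPLE):
        print(job["JOBID"], job["NAME"], job["ST"])
```

In the ST column, R means running and PD means pending. On the cluster you would call my_jobs() instead of parsing the sample text.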

Downloading a Pre-Trained Model

TimeSformer models pretrained on the Kinetics-400 (K400), Kinetics-600 (K600), Something-Something-V2 (SSv2), and HowTo100M datasets are listed in the tables below. First, copy the download URL of the model you wish to use, then modify the command below to use that link.

  • Note: Modify the Dropbox link you copied so that it ends with dl=1 instead of dl=0.
cd "/home1/$USER/DeepSports/training/TimeSformer"
curl -L -o TimeSformer_divST_8x32_224_K400.pyth
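
| name | dataset | # of frames | spatial crop | acc@1 | acc@5 | url |
| --- | --- | --- | --- | --- | --- | --- |
| TimeSformer | K400 | 8 | 224 | 77.9 | 93.2 | model |
| TimeSformer-HR | K400 | 16 | 448 | 79.6 | 94.0 | model |
| TimeSformer-L | K400 | 96 | 224 | 80.6 | 94.7 | model |

| name | dataset | # of frames | spatial crop | acc@1 | acc@5 | url |
| --- | --- | --- | --- | --- | --- | --- |
| TimeSformer | K600 | 8 | 224 | 79.1 | 94.4 | model |
| TimeSformer-HR | K600 | 16 | 448 | 81.8 | 95.8 | model |
| TimeSformer-L | K600 | 96 | 224 | 82.2 | 95.6 | model |

| name | dataset | # of frames | spatial crop | acc@1 | acc@5 | url |
| --- | --- | --- | --- | --- | --- | --- |
| TimeSformer | SSv2 | 8 | 224 | 59.1 | 85.6 | model |
| TimeSformer-HR | SSv2 | 16 | 448 | 61.8 | 86.9 | model |
| TimeSformer-L | SSv2 | 64 | 224 | 62.0 | 87.5 | model |

| name | dataset | # of frames | spatial crop | single clip coverage | acc@1 | url |
| --- | --- | --- | --- | --- | --- | --- |
| TimeSformer | HowTo100M | 8 | 224 | 8.5s | 56.8 | model |
| TimeSformer | HowTo100M | 32 | 224 | 34.1s | 61.2 | model |
| TimeSformer | HowTo100M | 64 | 448 | 68.3s | 62.2 | model |
| TimeSformer | HowTo100M | 96 | 224 | 102.4s | 62.6 | model |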
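
The dl=1 note above can also be applied programmatically. This is a minimal sketch using only the Python standard library; the function name force_dropbox_download and the example link are hypothetical and do not point at a real model file.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def force_dropbox_download(url: str) -> str:
    """Return `url` with its `dl` query parameter set to 1."""
    parts = urlsplit(url)
    # Keep every other query parameter and drop any existing `dl` value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "dl"
    ]
    query.append(("dl", "1"))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # Hypothetical link, for illustration only.
    print(force_dropbox_download("https://www.dropbox.com/s/abc123/model.pyth?dl=0"))
```

The rewritten link is what you would pass to curl -L -o in the command above.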

Setting Up Environment for TimeSformer

Run the following, then restart your SSH connection with CARC to restart the shell so that the environment changes take effect.

module load python/3.9.2
module load anaconda3/2021.05
conda create -n timesformer python=3.7 -y
conda init

After restarting, run this to install all dependencies:

module load python/3.9.2
module load anaconda3/2021.05
conda activate timesformer
pip install torch torchvision fvcore simplejson einops timm
conda install av -c conda-forge
pip install psutil scikit-learn opencv-python tensorboard matplotlib
pip install torchsort

At this point, you should be set and ready to train models!

Training a Diving ViT Model

First, modify the contents of /home1/<username>/DeepSports/ to set the parameters you want. Here is the full argument list:

| Argument | Description | Default Value | Possible Values |
| --- | --- | --- | --- |
| loglevel | The level of logging in the application. | INFO | [DEBUG, INFO, WARNING, ERROR, CRITICAL] |
| gpu | Whether to use GPU for training. | True | [False, True] |
| root_dir | The path to the root directory containing the frame images. | ./ | directory path |
| train_path | The filepath to the train split pickle file. | ./train_split_0.pkl | .pkl file path |
| test_path | The filepath to the test split pickle file. | ./test_split_0.pkl | .pkl file path |
| batch_size | The batch size used in training. | 4 | [1, inf] |
| epochs | The number of epochs used in training. | 5 | [1, inf] |
| learning_rate | The learning rate used in training. | 0.00001 | [0.0, inf] |
| weight_decay | The weight decay used in training. | 0.00001 | [0.0, inf] |
| momentum | The momentum used in SGD/RMSProp optimizers for training. | 0.9 | [0.0, inf] |
| train_val_split_ratio | The ratio in which the training and validation datasets are split. | 0.8 | [0.0, 1.0] |
| frame_num | The number of frames to use in each clip. | 8 | [1, inf] |
| frame_method | The algorithm used to sample frames from a long clip. | space_fixed | [random, spaced_fixed, spaced_varied, spaced_fixed_new, spaced_varied_new] |
| spatial_size | The size in pixels that the images are resized to. | 224 | [1, inf] |
| freeze | Whether to freeze the gradients in the TimeSformer model. | False | [False, True] |
| dropout | The dropout value used in the MLP. | [0.5, 0.5] | list of drop probabilities, each in [0.0, 1.0]; dropout is applied before the linear layer |
| activation | The activation function used in the MLP network. | None | [None, ReLU, LeakyReLU, ELU, GELU] |
| topology | The hidden neurons topology between the TimeSformer model and the final output layer. | [512, 256] | list of ints, each in [1, inf] |
| output | The output filepath for the losses figure. | ./losses.png | .png file path |
| annotation_path | The path to the final annotations dict pickle file. | ./final_annotations_dict.pkl | .pkl file path |
| attention_type | The type of attention used in the transformer. | divided_space_time | [divided_space_time, space_only, joint_space_time] |
| optimizer | The optimizer used in training. | AdamW | [Adam, AdamW, SGD, RMSProp] |
| patch_size | The patch size used in the transformer. | 16 | [1, inf] |
| embed_dim | The embed dimensions output from the transformer. | 768 | [1, inf] |
| pretrained_model | The filepath to the pretrained .pyth model for the TimeSformer. If set to 'scratch', it will train a TimeSformer from scratch. | scratch | path/to/model.pyth or scratch |
| evaluate | Whether to evaluate on the testing dataset. | False | [False, True] |
| normalize | Whether to normalize the RGB channels in the video clips as a preprocessing step. | False | [False, True] |
| data_aug | Whether to randomly resize and crop the video clips as a preprocessing step. | False | [False, True] |
| amsgrad | Whether to use amsgrad for the Adam/AdamW optimizer. | False | [False, True] |
| videos | The videos directory name to use. Default value is 'all'. To use select directories, list only their directory names, e.g. '01 02'. | all | all or list of directories (e.g. 01 02) |
| loss_mse_weight | The weight given to the MSE loss. | 1 | [-inf, inf] |
| loss_spcoef_weight | The weight given to the differentiable Spearman Correlation loss. | 0 | [-inf, inf] |
| use_decoder | Whether to use a Transformer Decoder or revert to using an MLP. | False | [False, True] |
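
To make the defaults and value ranges above concrete, here is a minimal sketch of how a subset of these arguments could be parsed and range-checked with argparse. It is an illustration only: the repo's actual training script may read its arguments differently, and build_parser and validate are hypothetical names that do not come from the repo.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser for a subset of the arguments, with defaults taken from the table."""
    parser = argparse.ArgumentParser(description="Sketch of the training arguments")
    parser.add_argument("--batch_size", type=int, default=4)
    parser.add_argument("--epochs", type=int, default=5)
    parser.add_argument("--learning_rate", type=float, default=0.00001)
    parser.add_argument("--frame_num", type=int, default=8)
    parser.add_argument(
        "--attention_type",
        default="divided_space_time",
        choices=["divided_space_time", "space_only", "joint_space_time"],
    )
    parser.add_argument(
        "--optimizer", default="AdamW", choices=["Adam", "AdamW", "SGD", "RMSProp"]
    )
    # Lists are given as space-separated values, e.g. --topology 512 256.
    parser.add_argument("--topology", type=int, nargs="+", default=[512, 256])
    parser.add_argument("--dropout", type=float, nargs="+", default=[0.5, 0.5])
    parser.add_argument("--pretrained_model", default="scratch")
    parser.add_argument("--videos", nargs="+", default=["all"])
    return parser


def validate(args: argparse.Namespace) -> None:
    """Check the numeric ranges listed in the 'Possible Values' column."""
    for name in ("batch_size", "epochs", "frame_num"):
        if getattr(args, name) < 1:
            raise ValueError(f"{name} must be at least 1")
    if args.learning_rate < 0.0:
        raise ValueError("learning_rate must be non-negative")
    if any(not 0.0 <= p <= 1.0 for p in args.dropout):
        raise ValueError("each dropout probability must be in [0.0, 1.0]")
    if any(width < 1 for width in args.topology):
        raise ValueError("each topology entry must be at least 1")


if __name__ == "__main__":
    args = build_parser().parse_args(["--videos", "01", "02", "--epochs", "10"])
    validate(args)
    print(args)
```

Restricting attention_type and optimizer with choices means a misspelled value fails at parse time rather than partway through a training job.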

Next, you may want to modify the resource allocation of the training job (GPU, time, etc.) in /home1/<username>/DeepSports/job_files/train_model.job, and then run:

sbatch "/home1/$USER/DeepSports/job_files/train_model.job"

Final Directory View

The following tree-view is what you should expect at the end of your setup.

├── /home1/
│   ├── <username>/
│       ├── DeepSports/
│            ├── data_preparation/
│            |   └── ...
│            ├── job_files/
│            |   └── ...
│            ├── training/
│            |   └── ...
│            └──
├── /scratch1/
│   ├── <username>/
│       ├── DeepSports_dataset/
│            ├── whole_videos/
│            |   ├── 01.mp4
│            |   ├── 02.mp4
│            |   ├── 03.mp4
│            |   ├── 04.mp4
│            |   ├── 05.mp4
│            |   ├── 06.mp4
│            |   ├── 07.mp4
│            |   ├── 09.mp4
│            |   ├── 10.mp4
│            |   ├── 13.mp4
│            |   ├── 14.mp4
│            |   ├── 17.mp4
│            |   ├── 18.mp4
│            |   ├── 22.mp4
│            |   └── 26.mp4
│            └── whole_videos_frames/
│                ├── 01/
│                |   └── ...
│                ├── 02/
│                |   └── ...
│                ├── 03/
│                |   └── ...
│                ├── 04/
│                |   └── ...
│                ├── 05/
│                |   └── ...
│                ├── 06/
│                |   └── ...
│                ├── 07/
│                |   └── ...
│                ├── 09/
│                |   └── ...
│                ├── 10/
│                |   └── ...
│                ├── 13/
│                |   └── ...
│                ├── 14/
│                |   └── ...
│                ├── 17/
│                |   └── ...
│                ├── 18/
│                |   └── ...
│                ├── 22/
│                |   └── ...
│                └── 26/
│                    └── ...
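
To compare your own setup against the tree above, a short script can report which videos or frame directories are missing. This is a minimal sketch, assuming the dataset lives under /scratch1/<username>/DeepSports_dataset as shown; missing_paths is a hypothetical helper name, and the video IDs are copied from the tree.

```python
import os
from pathlib import Path
from typing import List, Sequence

# Video IDs as listed in the tree view above.
VIDEO_IDS = ["01", "02", "03", "04", "05", "06", "07", "09",
             "10", "13", "14", "17", "18", "22", "26"]


def missing_paths(dataset_dir: Path, video_ids: Sequence[str] = VIDEO_IDS) -> List[Path]:
    """Return every expected video file or frames directory that does not exist."""
    missing = []
    for vid in video_ids:
        video = dataset_dir / "whole_videos" / f"{vid}.mp4"
        frames = dataset_dir / "whole_videos_frames" / vid
        if not video.is_file():
            missing.append(video)
        if not frames.is_dir():
            missing.append(frames)
    return missing


if __name__ == "__main__":
    # Assumes $USER is your CARC username, as in the commands above.
    root = Path("/scratch1") / os.environ.get("USER", "") / "DeepSports_dataset"
    for path in missing_paths(root):
        print("missing:", path)
```

An empty result means both the download step and the frame-extraction job produced every expected entry.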


Action Quality Assessment using Transformers







