This repo contains the MiniViT implementation for Swin Transformers (Mini-Swin).
Model | Params | Input | Top-1 Acc. (%) | Top-5 Acc. (%) | Download
---|---|---|---|---|---
Mini-Swin-T | 12M | 224x224 | 81.3 | 95.7 | model, log |
Mini-Swin-S | 26M | 224x224 | 83.9 | 97.0 | model, log |
Mini-Swin-B | 46M | 224x224 | 84.5 | 97.3 | model, log |
Mini-Swin-B | 47M | 384x384 | 85.5 | 97.6 | model, log |
Create the environment:
pip install -r requirements.txt
You can download the ImageNet-1K dataset from http://www.image-net.org/. The train and validation sets should be stored as *.tar archives:
ImageNet/
├── train.tar
└── val.tar
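If you already have the images in the per-class folder layout shown below, one way to build these archives is Python's `tarfile` module. This is only a minimal sketch; the member layout assumed here (`<wnid>/<image>.JPEG` relative to the archive root) should be checked against the repo's dataset-loading code:

```python
import tarfile
from pathlib import Path

root = Path("ImageNet")  # dataset root; adjust to your --data-path
for split in ("train", "val"):
    with tarfile.open(root / f"{split}.tar", "w") as tar:
        for class_dir in sorted((root / split).iterdir()):
            # store members as <wnid>/<image>.JPEG (assumed layout)
            tar.add(class_dir, arcname=class_dir.name)
```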
Our code also supports storing the images as individual files, as follows:
ImageNet/
├── train
│ ├── n01440764
│ │ ├── n01440764_10026.JPEG
│ │ ├── n01440764_10027.JPEG
...
├── val
│ ├── n01440764
│ │ ├── ILSVRC2012_val_00000293.JPEG
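As a quick sanity check on the folder layout (a sketch, assuming the per-class structure above):

```python
from pathlib import Path

root = Path("ImageNet")  # adjust to your --data-path
for split in ("train", "val"):
    classes = [d for d in (root / split).iterdir() if d.is_dir()]
    images = sum(1 for _ in (root / split).rglob("*.JPEG"))
    print(f"{split}: {len(classes)} classes, {images} images")
# ImageNet-1K should report 1000 classes per split
# (~1.28M train images / 50,000 val images)
```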
Training Mini-Swin-Tiny
python -m torch.distributed.launch --nproc_per_node 8 main.py --cfg configs/swin_tiny_patch4_window7_224_minivit_sharenum6.yaml --data-path <data-path> --output <output-folder> --tag mini-swin-tiny --batch-size 128 --is_sep_layernorm --is_transform_heads --is_transform_ffn --do_distill --alpha 0.0 --teacher <teacher-path> --attn_loss --hidden_loss --hidden_relation --student_layer_list 11_9_7_5_3_1 --teacher_layer_list 23_21_15_9_3_1 --hidden_weight 0.1
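The `--student_layer_list` / `--teacher_layer_list` flags pair student blocks with teacher blocks for the distillation losses enabled by `--attn_loss`, `--hidden_loss`, and `--hidden_relation`. The sketch below only illustrates how such an index pairing can drive a relation-based hidden-state loss; it is not the repo's actual implementation, and the tensor shapes and function names are assumptions:

```python
import torch
import torch.nn.functional as F

# Index pairing from the command above: student block 11 is supervised by
# teacher block 23, block 9 by block 21, and so on.
STUDENT_LAYERS = [11, 9, 7, 5, 3, 1]
TEACHER_LAYERS = [23, 21, 15, 9, 3, 1]

def token_relation(h):
    """Token-to-token similarity map of shape [batch, tokens, tokens].

    Comparing relation maps (rather than raw features) keeps the loss
    well-defined even when student and teacher channel widths differ.
    """
    h = F.normalize(h, dim=-1)
    return h @ h.transpose(-1, -2)

def hidden_relation_loss(student_hidden, teacher_hidden, hidden_weight=0.1):
    # student_hidden / teacher_hidden: lists of per-block features,
    # each of (assumed) shape [batch, tokens, channels]
    loss = torch.zeros(())
    for s, t in zip(STUDENT_LAYERS, TEACHER_LAYERS):
        loss = loss + F.mse_loss(token_relation(student_hidden[s]),
                                 token_relation(teacher_hidden[t]))
    return hidden_weight * loss  # scaled by --hidden_weight
```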
Training Mini-Swin-Small
python -m torch.distributed.launch --nproc_per_node 8 main.py --cfg configs/swin_small_patch4_window7_224_minivit_sharenum2.yaml --data-path <data-path> --output <output-folder> --tag mini-swin-small --batch-size 128 --is_sep_layernorm --is_transform_heads --is_transform_ffn --do_distill --alpha 0.0 --teacher <teacher-path> --attn_loss --hidden_loss --hidden_relation --student_layer_list 23_21_15_9_3_1 --teacher_layer_list 23_21_15_9_3_1 --hidden_weight 0.1
Training Mini-Swin-Base
python -m torch.distributed.launch --nproc_per_node 8 main.py --cfg configs/swin_base_patch4_window7_224_minivit_sharenum2.yaml --data-path <data-path> --output <output-folder> --tag mini-swin-base --batch-size 128 --is_sep_layernorm --is_transform_heads --is_transform_ffn --do_distill --alpha 0.0 --teacher <teacher-path> --attn_loss --hidden_loss --hidden_relation --student_layer_list 23_21_15_9_3_1 --teacher_layer_list 23_21_15_9_3_1 --hidden_weight 0.1
python -m torch.distributed.launch --nproc_per_node 8 main.py --cfg configs/swin_base_patch4_window7_224to384_minivit_sharenum2_adamw.yaml --data-path <data-path> --output <output-folder> --tag mini-swin-base-224to384 --batch-size 16 --accumulation-steps 2 --is_sep_layernorm --is_transform_heads --is_transform_ffn --resume <model-224-ckpt> --resume_weight_only --train_224to384
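Note: with 8 GPUs, the 384-resolution fine-tuning command above uses a per-GPU batch size of 16 with 2 gradient-accumulation steps, i.e. an effective global batch size of 8 × 16 × 2 = 256.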
Run the following commands for evaluation:
Evaluate Mini-Swin-Tiny
python -m torch.distributed.launch --nproc_per_node 8 main.py --cfg configs/swin_tiny_patch4_window7_224_minivit_sharenum6.yaml --data-path <data-path> --batch-size 64 --is_sep_layernorm --is_transform_ffn --is_transform_heads --resume checkpoints/mini-swin-tiny-12m.pth --eval
Evaluate Mini-Swin-Small
python -m torch.distributed.launch --nproc_per_node 8 main.py --cfg configs/swin_small_patch4_window7_224_minivit_sharenum2.yaml --data-path <data-path> --batch-size 64 --is_sep_layernorm --is_transform_ffn --is_transform_heads --resume checkpoints/mini-swin-small-26m.pth --eval
Evaluate Mini-Swin-Base
python -m torch.distributed.launch --nproc_per_node 8 main.py --cfg configs/swin_base_patch4_window7_224_minivit_sharenum2.yaml --data-path <data-path> --batch-size 64 --is_sep_layernorm --is_transform_ffn --is_transform_heads --resume checkpoints/mini-swin-base-46m.pth --eval
Evaluate Mini-Swin-Base-384
python -m torch.distributed.launch --nproc_per_node 8 main.py --cfg configs/swin_base_patch4_window7_224to384_minivit_sharenum2_adamw.yaml --data-path <data-path> --batch-size 32 --is_sep_layernorm --is_transform_ffn --is_transform_heads --resume checkpoints/mini-swin-base-224to384.pth --eval
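Before evaluating, you can sanity-check a downloaded checkpoint against the parameter counts in the table above. A minimal sketch (the `"model"` key is an assumption about how the weights are wrapped in the checkpoint file):

```python
import torch

ckpt = torch.load("checkpoints/mini-swin-tiny-12m.pth", map_location="cpu")
state = ckpt.get("model", ckpt)  # weights may be wrapped under a "model" key
params = sum(v.numel() for v in state.values()
             if torch.is_tensor(v) and torch.is_floating_point(v))
print(f"{params / 1e6:.1f}M parameters")  # should roughly match the table
```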
If this repo is helpful for you, please consider citing it. Thank you! :)
@InProceedings{MiniViT,
title = {MiniViT: Compressing Vision Transformers With Weight Multiplexing},
author = {Zhang, Jinnian and Peng, Houwen and Wu, Kan and Liu, Mengchen and Xiao, Bin and Fu, Jianlong and Yuan, Lu},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2022},
pages = {12145-12154}
}
Our code is based on the Swin Transformer codebase. Thank you!