Code and configuration files to reproduce the object detection results of our paper. All experiments are conducted on the COCO dataset based on mmdetection.
Backbone | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs | config | model |
---|---|---|---|---|---|---|---|---|
Agent-Swin-T | ImageNet-1K | 1x | 44.6 | 40.7 | 48M | 276G | config | TsinghuaCloud |
Agent-Swin-T | ImageNet-1K | 3x | 47.3 | 42.7 | 48M | 276G | config | TsinghuaCloud |
Agent-Swin-S | ImageNet-1K | 1x | 47.2 | 42.7 | 69M | 364G | config | TsinghuaCloud |
Agent-Swin-S | ImageNet-1K | 3x | 48.9 | 43.8 | 69M | 364G | config | TsinghuaCloud |
Agent-PVT-T | ImageNet-1K | 1x | 41.4 | 38.7 | 31M | 230G | config | TsinghuaCloud |
Agent-PVT-S | ImageNet-1K | 1x | 44.5 | 41.2 | 40M | 293G | config | TsinghuaCloud |
Agent-PVT-M | ImageNet-1K | 1x | 45.9 | 42.0 | 56M | 400G | config | TsinghuaCloud |
Agent-PVT-L | ImageNet-1K | 1x | 46.9 | 42.8 | 68M | 510G | config | TsinghuaCloud |

Backbone | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs | config | model |
---|---|---|---|---|---|---|---|---|
Agent-Swin-T | ImageNet-1K | 1x | 49.2 | 42.7 | 86M | 755G | config | TsinghuaCloud |
Agent-Swin-T | ImageNet-1K | 3x | 51.4 | 44.5 | 86M | 755G | config | TsinghuaCloud |
Agent-Swin-S | ImageNet-1K | 3x | 52.6 | 45.5 | 107M | 843G | config | TsinghuaCloud |
Agent-Swin-B | ImageNet-1K | 3x | 52.6 | 45.3 | 145M | 990G | config | TsinghuaCloud |

Backbone | Pretrain | Lr Schd | box mAP | #params | FLOPs | config | model |
---|---|---|---|---|---|---|---|
Agent-PVT-T | ImageNet-1K | 1x | 40.3 | 21M | 211G | config | TsinghuaCloud |
Agent-PVT-S | ImageNet-1K | 1x | 44.1 | 30M | 274G | config | TsinghuaCloud |
Agent-PVT-M | ImageNet-1K | 1x | 45.8 | 46M | 382G | config | TsinghuaCloud |
Agent-PVT-L | ImageNet-1K | 1x | 46.8 | 58M | 492G | config | TsinghuaCloud |

Prepare the COCO dataset, and change the `data_root` argument in `configs/_base_/datasets/coco_detection.py` and `configs/_base_/datasets/coco_instance.py` to the dataset path.
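In mmdetection these dataset configs are plain Python files, so the change is a one-line edit. The fragment below is a sketch; the path is a placeholder for your local COCO root:

```python
# configs/_base_/datasets/coco_detection.py (likewise coco_instance.py)
dataset_type = 'CocoDataset'
data_root = '/path/to/coco/'  # placeholder: point this at your COCO directory
```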
Please place the ImageNet-1K pretrained models under the `./data/` folder and rename them as `{MODEL_STRUCTURE}_max_acc.pth`, e.g. `agent_swin_t_max_acc.pth`.
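Staging the checkpoints can be scripted; a minimal sketch follows (the helper name and the example source path are hypothetical, not part of this repo):

```python
import shutil
from pathlib import Path


def stage_pretrained(src: str, model_structure: str, data_dir: str = "./data") -> Path:
    """Copy a downloaded checkpoint to the name the configs expect:
    {MODEL_STRUCTURE}_max_acc.pth under the data directory (hypothetical helper)."""
    target = Path(data_dir) / f"{model_structure}_max_acc.pth"
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, target)
    return target


# e.g. stage_pretrained("~/Downloads/agent_swin_t.pth", "agent_swin_t")
# would place the file at ./data/agent_swin_t_max_acc.pth
```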
For convenience, we provide the conda environment file and a pre-built mmcv. Please download the pre-built mmcv here, and place it under `../`. We use an empty `mmcv` directory as a placeholder.
```shell
conda env create -f agent_detection.yaml
cd ../mmcv/
pip install -v -e .
cd ../detection/
pip install -v -e .
```
To evaluate a trained detector, run:

```shell
# single-gpu testing
python tools/test.py <CONFIG_FILE> <DET_CHECKPOINT_FILE> --eval bbox segm

# multi-gpu testing
tools/dist_test.sh <CONFIG_FILE> <DET_CHECKPOINT_FILE> <GPU_NUM> --eval bbox segm
```
To train a detector with pre-trained models, run:
```shell
# single-gpu training
python tools/train.py <CONFIG_FILE>

# multi-gpu training
torchrun --nproc_per_node <GPU_NUM> tools/train.py <CONFIG_FILE> --launcher="pytorch"
```
If you find this repo helpful, please consider citing us.
```
@inproceedings{han2024agent,
  title={Agent attention: On the integration of softmax and linear attention},
  author={Han, Dongchen and Ye, Tianzhu and Han, Yizeng and Xia, Zhuofan and Pan, Siyuan and Wan, Pengfei and Song, Shiji and Huang, Gao},
  booktitle={European Conference on Computer Vision},
  year={2024},
}
```