
# Agent Attention for Object Detection

Code and configuration files to reproduce the object detection results of our paper. All experiments are conducted on the COCO dataset, based on mmdetection.

## Results and Models

### Mask R-CNN

| Backbone | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs | config | model |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Agent-Swin-T | ImageNet-1K | 1x | 44.6 | 40.7 | 48M | 276G | config | TsinghuaCloud |
| Agent-Swin-T | ImageNet-1K | 3x | 47.3 | 42.7 | 48M | 276G | config | TsinghuaCloud |
| Agent-Swin-S | ImageNet-1K | 1x | 47.2 | 42.7 | 69M | 364G | config | TsinghuaCloud |
| Agent-Swin-S | ImageNet-1K | 3x | 48.9 | 43.8 | 69M | 364G | config | TsinghuaCloud |
| Agent-PVT-T | ImageNet-1K | 1x | 41.4 | 38.7 | 31M | 230G | config | TsinghuaCloud |
| Agent-PVT-S | ImageNet-1K | 1x | 44.5 | 41.2 | 40M | 293G | config | TsinghuaCloud |
| Agent-PVT-M | ImageNet-1K | 1x | 45.9 | 42.0 | 56M | 400G | config | TsinghuaCloud |
| Agent-PVT-L | ImageNet-1K | 1x | 46.9 | 42.8 | 68M | 510G | config | TsinghuaCloud |

### Cascade Mask R-CNN

| Backbone | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs | config | model |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Agent-Swin-T | ImageNet-1K | 1x | 49.2 | 42.7 | 86M | 755G | config | TsinghuaCloud |
| Agent-Swin-T | ImageNet-1K | 3x | 51.4 | 44.5 | 86M | 755G | config | TsinghuaCloud |
| Agent-Swin-S | ImageNet-1K | 3x | 52.6 | 45.5 | 107M | 843G | config | TsinghuaCloud |
| Agent-Swin-B | ImageNet-1K | 3x | 52.6 | 45.3 | 145M | 990G | config | TsinghuaCloud |

### RetinaNet

| Backbone | Pretrain | Lr Schd | box mAP | #params | FLOPs | config | model |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Agent-PVT-T | ImageNet-1K | 1x | 40.3 | 21M | 211G | config | TsinghuaCloud |
| Agent-PVT-S | ImageNet-1K | 1x | 44.1 | 30M | 274G | config | TsinghuaCloud |
| Agent-PVT-M | ImageNet-1K | 1x | 45.8 | 46M | 382G | config | TsinghuaCloud |
| Agent-PVT-L | ImageNet-1K | 1x | 46.8 | 58M | 492G | config | TsinghuaCloud |

## Usage

### Dataset

Prepare the COCO dataset and point the `data_root` argument in `configs/_base_/datasets/coco_detection.py` and `configs/_base_/datasets/coco_instance.py` to the dataset path.
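For example, assuming COCO is extracted under `data/coco/` (with the standard `annotations/`, `train2017/`, and `val2017/` subfolders), the relevant line in the base dataset config would look like this sketch:

```python
# Sketch of configs/_base_/datasets/coco_instance.py (mmdetection base config).
# Only data_root needs to change; the path below is an assumed example location.
dataset_type = 'CocoDataset'
data_root = 'data/coco/'  # set this to wherever your COCO dataset lives
```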

### ImageNet-1K Pretrained Model

Please place the ImageNet-1K pretrained models under the `./data/` folder and rename them as `{MODEL_STRUCTURE}_max_acc.pth`, e.g. `agent_swin_t_max_acc.pth`.
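A minimal sketch of the staging step, assuming the downloaded Agent-Swin-T checkpoint is named `agent_swin_t.pth` (a hypothetical filename — substitute your actual download):

```shell
# Stage a pretrained checkpoint where the detection configs expect it.
mkdir -p ./data
touch agent_swin_t.pth                               # stand-in for the downloaded checkpoint file
mv agent_swin_t.pth ./data/agent_swin_t_max_acc.pth  # expected name: {MODEL_STRUCTURE}_max_acc.pth
```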

### Installation

For convenience, we provide the conda environment file and a pre-built mmcv. Please download the pre-built mmcv here and place it under `../`; the empty `mmcv` directory in this repository is only a placeholder.

```bash
conda env create -f agent_detection.yaml
cd ../mmcv/
pip install -v -e .
cd ../detection/
pip install -v -e .
```

### Inference

```bash
# single-gpu testing
python tools/test.py <CONFIG_FILE> <DET_CHECKPOINT_FILE> --eval bbox segm

# multi-gpu testing
tools/dist_test.sh <CONFIG_FILE> <DET_CHECKPOINT_FILE> <GPU_NUM> --eval bbox segm
```

### Training

To train a detector with the pretrained backbones, run:

```bash
# single-gpu training
python tools/train.py <CONFIG_FILE>

# multi-gpu training
torchrun --nproc_per_node <GPU_NUM> tools/train.py <CONFIG_FILE> --launcher="pytorch"
```

## Citation

If you find this repo helpful, please consider citing us.

```bibtex
@inproceedings{han2024agent,
  title={Agent attention: On the integration of softmax and linear attention},
  author={Han, Dongchen and Ye, Tianzhu and Han, Yizeng and Xia, Zhuofan and Pan, Siyuan and Wan, Pengfei and Song, Shiji and Huang, Gao},
  booktitle={European Conference on Computer Vision},
  year={2024},
}
```