
# Agent Attention for Object Detection

Code and configuration files to reproduce the object detection results of our paper. All experiments are conducted on the COCO dataset, based on mmdetection.

## Results and Models

### Mask R-CNN

| Backbone | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs | config | model |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Agent-Swin-T | ImageNet-1K | 1x | 44.6 | 40.7 | 48M | 276G | config | TsinghuaCloud |
| Agent-Swin-T | ImageNet-1K | 3x | 47.3 | 42.7 | 48M | 276G | config | TsinghuaCloud |
| Agent-Swin-S | ImageNet-1K | 1x | 47.2 | 42.7 | 69M | 364G | config | TsinghuaCloud |
| Agent-Swin-S | ImageNet-1K | 3x | 48.9 | 43.8 | 69M | 364G | config | TsinghuaCloud |
| Agent-PVT-T | ImageNet-1K | 1x | 41.4 | 38.7 | 31M | 230G | config | TsinghuaCloud |
| Agent-PVT-S | ImageNet-1K | 1x | 44.5 | 41.2 | 40M | 293G | config | TsinghuaCloud |
| Agent-PVT-M | ImageNet-1K | 1x | 45.9 | 42.0 | 56M | 400G | config | TsinghuaCloud |
| Agent-PVT-L | ImageNet-1K | 1x | 46.9 | 42.8 | 68M | 510G | config | TsinghuaCloud |

### Cascade Mask R-CNN

| Backbone | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs | config | model |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Agent-Swin-T | ImageNet-1K | 1x | 49.2 | 42.7 | 86M | 755G | config | TsinghuaCloud |
| Agent-Swin-T | ImageNet-1K | 3x | 51.4 | 44.5 | 86M | 755G | config | TsinghuaCloud |
| Agent-Swin-S | ImageNet-1K | 3x | 52.6 | 45.5 | 107M | 843G | config | TsinghuaCloud |
| Agent-Swin-B | ImageNet-1K | 3x | 52.6 | 45.3 | 145M | 990G | config | TsinghuaCloud |

### RetinaNet

| Backbone | Pretrain | Lr Schd | box mAP | #params | FLOPs | config | model |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Agent-PVT-T | ImageNet-1K | 1x | 40.3 | 21M | 211G | config | TsinghuaCloud |
| Agent-PVT-S | ImageNet-1K | 1x | 44.1 | 30M | 274G | config | TsinghuaCloud |
| Agent-PVT-M | ImageNet-1K | 1x | 45.8 | 46M | 382G | config | TsinghuaCloud |
| Agent-PVT-L | ImageNet-1K | 1x | 46.8 | 58M | 492G | config | TsinghuaCloud |

## Usage

### Dataset

Prepare the COCO dataset and point the `data_root` argument in `configs/_base_/datasets/coco_detection.py` and `configs/_base_/datasets/coco_instance.py` to the dataset path.
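For example, assuming COCO is extracted under `data/coco/` (with the standard `annotations/`, `train2017/`, and `val2017/` subfolders), the relevant line in the base dataset config would look like this sketch:

```python
# Sketch of configs/_base_/datasets/coco_instance.py (mmdetection base config).
# Only data_root needs to change; the path below is an assumed example location.
dataset_type = 'CocoDataset'
data_root = 'data/coco/'  # set this to wherever your COCO dataset lives
```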

### ImageNet-1K Pretrained Model

Please place the ImageNet-1K pretrained models under the `./data/` folder and rename them as `{MODEL_STRUCTURE}_max_acc.pth`, e.g. `agent_swin_t_max_acc.pth`.
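A minimal sketch of the staging step, assuming the downloaded Agent-Swin-T checkpoint is named `agent_swin_t.pth` (a hypothetical filename — substitute your actual download):

```shell
# Stage a pretrained checkpoint where the detection configs expect it.
mkdir -p ./data
touch agent_swin_t.pth                               # stand-in for the downloaded checkpoint file
mv agent_swin_t.pth ./data/agent_swin_t_max_acc.pth  # expected name: {MODEL_STRUCTURE}_max_acc.pth
```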

### Installation

For convenience, we provide the conda environment file and a pre-built mmcv. Please download the pre-built mmcv here and place it under `../`; the empty `mmcv` directory in this repository is only a placeholder.

```bash
conda env create -f agent_detection.yaml
cd ../mmcv/
pip install -v -e .
cd ../detection/
pip install -v -e .
```

### Inference

```bash
# single-gpu testing
python tools/test.py <CONFIG_FILE> <DET_CHECKPOINT_FILE> --eval bbox segm

# multi-gpu testing
tools/dist_test.sh <CONFIG_FILE> <DET_CHECKPOINT_FILE> <GPU_NUM> --eval bbox segm
```

### Training

To train a detector with the pretrained backbones, run:

```bash
# single-gpu training
python tools/train.py <CONFIG_FILE>

# multi-gpu training
torchrun --nproc_per_node <GPU_NUM> tools/train.py <CONFIG_FILE> --launcher="pytorch"
```

## Citation

If you find this repo helpful, please consider citing us.

```bibtex
@inproceedings{han2024agent,
  title={Agent attention: On the integration of softmax and linear attention},
  author={Han, Dongchen and Ye, Tianzhu and Han, Yizeng and Xia, Zhuofan and Pan, Siyuan and Wan, Pengfei and Song, Shiji and Huang, Gao},
  booktitle={European Conference on Computer Vision},
  year={2024},
}
```