Uni-AdaFocus (TPAMI'24 & ICCV'21/CVPR'22/ECCV'22)

This repo contains the official code and pre-trained models for "Uni-AdaFocus: Spatial-temporal Dynamic Computation for Video Recognition".

Uni-AdaFocus is the latest version of the AdaFocus series.

Introduction

We explore three kinds of redundancy in video understanding (spatial, temporal, and sample-wise) and propose Uni-AdaFocus, an efficient end-to-end video recognition framework. Uni-AdaFocus is built on top of AdaFocus, which employs a lightweight encoder and a policy network to identify and process the most informative spatial regions in each video frame. Uni-AdaFocus extends AdaFocus by dynamically allocating computation to the most task-relevant frames and by minimizing the computation spent on easier videos. It is compatible with off-the-shelf efficient backbones (e.g., TSM and X3D) and can markedly improve their inference efficiency. Extensive experiments on seven benchmark datasets (i.e., ActivityNet, FCVID, Mini-Kinetics, Something-Something V1&V2, Jester, and Kinetics-400) and three real-world application scenarios (i.e., fine-grained diving action classification, diagnosis of Alzheimer's and Parkinson's diseases from brain magnetic resonance images (MRI), and violence recognition for online videos) show that Uni-AdaFocus is considerably more efficient than competitive baselines.
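To make the three kinds of dynamic computation concrete, here is a toy sketch in plain Python. It is not the code in this repo: the function names (`select_frames`, `clamp_patch`, `classify_with_early_exit`) are hypothetical, the frame scores and patch centers are assumed to be given (in Uni-AdaFocus they would come from the learned policy network), and confidence-based early exit is only one plausible way to spend less computation on easier videos.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_frames(frame_scores, k):
    """Temporal: indices of the k highest-scoring frames, in temporal order."""
    ranked = sorted(range(len(frame_scores)),
                    key=lambda i: frame_scores[i], reverse=True)
    return sorted(ranked[:k])


def clamp_patch(center_xy, patch_size, frame_size):
    """Spatial: top-left corner of a square patch centered at center_xy,
    shifted if necessary so the patch stays inside the frame."""
    half = patch_size // 2
    cx, cy = center_xy
    width, height = frame_size
    x0 = min(max(cx - half, 0), width - patch_size)
    y0 = min(max(cy - half, 0), height - patch_size)
    return x0, y0


def classify_with_early_exit(per_frame_logits, threshold):
    """Sample-wise: average logits frame by frame and stop as soon as the
    top-class probability reaches the threshold.

    Assumes per_frame_logits is a non-empty list of equal-length lists.
    Returns (predicted_class, number_of_frames_used).
    """
    running = [0.0] * len(per_frame_logits[0])
    for used, logits in enumerate(per_frame_logits, start=1):
        running = [r + x for r, x in zip(running, logits)]
        probs = softmax([r / used for r in running])
        if max(probs) >= threshold:
            break
    return probs.index(max(probs)), used


if __name__ == "__main__":
    print(select_frames([0.1, 0.9, 0.3, 0.8], k=2))        # [1, 3]
    print(clamp_patch((10, 200), 128, (224, 224)))         # (0, 96)
    print(classify_with_early_exit(
        [[2.0, 0.0], [4.0, 0.0], [0.0, 0.0]], threshold=0.9))  # (0, 2)
```

In the last call, the first frame alone gives a confidence of about 0.88, below the 0.9 threshold; after the second frame the averaged logits give about 0.95, so the third frame is never processed.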

Get Started

Setup environment:

conda create -n adafocus python=3.9
conda activate adafocus
conda install pytorch=1.12.1 torchvision=0.13.1 -c pytorch
pip install numpy==1.26.0 tensorboardX
# if you are trying Uni-AdaFocus-X3D, run the following line
pip install iopath simplejson fvcore pytorchvideo psutil matplotlib opencv-python scipy pandas
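After running the commands above, a quick sanity check of the installed versions can save debugging time later. The script below is a minimal sketch, not part of this repo: `check_environment` is a hypothetical helper, and it assumes the conda `pytorch` package registers itself under the distribution name `torch`, as PyPI wheels do. It reads package metadata only, so it does not import PyTorch.

```python
from importlib import metadata

# Versions pinned in the setup commands above.
EXPECTED = {"torch": "1.12.1", "torchvision": "0.13.1", "numpy": "1.26.0"}


def check_environment(expected=EXPECTED):
    """Return {package: (installed_version_or_None, matches_expected)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # A local build tag such as "1.12.1+cu113" still counts as a match.
        ok = installed is not None and installed.split("+")[0] == wanted
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_environment().items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name:12s} expected {EXPECTED[name]:8s} found {installed} [{status}]")
```

Save it under any file name and run it inside the activated `adafocus` environment; a `MISMATCH` line points to a package that is missing or differs from the pinned version.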

To reproduce our experimental results, please go to the following folders for specific instructions:

To apply Uni-AdaFocus to your own tasks, check this tutorial:

Results

  • Performance (mAP, GFLOPs) on ActivityNet, FCVID and Mini-Kinetics

  • Inference efficiency (mAP vs. GFLOPs) on ActivityNet

  • Inference efficiency (mAP vs. GFLOPs) of Uni-AdaFocus and the preliminary versions of AdaFocus on ActivityNet

  • Performance (Acc., GFLOPs) on Something-Something V1&V2 and Jester

  • Inference efficiency (Acc. vs. GFLOPs) of Uni-AdaFocus and the preliminary versions of AdaFocus on Something-Something V1&V2

  • Performance (Acc., GFLOPs) on Kinetics-400

  • Visualization of the selected informative frames and patches

  • Visualization of failure cases

Reference

If you find our code or papers useful for your research, please cite:

@article{wang2024uniadafocus,
     title = {Uni-AdaFocus: Spatial-temporal Dynamic Computation for Video Recognition},
    author = {Wang, Yulin and Zhang, Haoji and Yue, Yang and Song, Shiji and Deng, Chao and Feng, Junlan and Huang, Gao},
   journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)},
      year = {2024},
}

@inproceedings{wang2021adafocus,
     title = {Adaptive Focus for Efficient Video Recognition},
    author = {Wang, Yulin and Chen, Zhaoxi and Jiang, Haojun and Song, Shiji and Han, Yizeng and Huang, Gao},
 booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
      year = {2021}
}

@inproceedings{wang2022adafocusv2,
     title = {AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition},
    author = {Wang, Yulin and Yue, Yang and Lin, Yuanze and Jiang, Haojun and Lai, Zihang and Kulikov, Victor and Orlov, Nikita and Shi, Humphrey and Huang, Gao},
 booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
      year = {2022}
}

@inproceedings{wang2022adafocusv3,
     title = {AdaFocusV3: On Unified Spatial-temporal Dynamic Video Recognition},
    author = {Wang, Yulin and Yue, Yang and Xu, Xinhong and Hassani, Ali and Kulikov, Victor and Orlov, Nikita and Song, Shiji and Shi, Humphrey and Huang, Gao},
 booktitle = {European Conference on Computer Vision (ECCV)},
      year = {2022},
}

Contact

If you have any questions, feel free to contact the authors or raise an issue.

Yulin Wang: [email protected]

Haoji Zhang: [email protected]