Few-Shot Recognition via Stage-Wise
Retrieval-Augmented Finetuning

Tian Liu¹ · Huixin Zhang¹ · Shubham Parashar¹ · Shu Kong²

¹Texas A&M University   ²University of Macau

Paper PDF Project Page

Our work adapts a pretrained Vision-Language Model (VLM) and retrieves relevant pretraining images to solve the few-shot recognition problem. To mitigate the domain gap and imbalanced distribution of the retrieved data, we propose a novel Stage-Wise retrieval-Augmented fineTuning (SWAT) method, which outperforms previous few-shot recognition methods by more than 6% in accuracy across nine benchmark datasets.


News

  • 2025-01-18: We provide access to our retrieved data through URLs. See RETRIEVAL.md.
  • 2024-11-24: Updated code base to include more datasets.
  • 2024-08-22: Retrieval code released, see RETRIEVAL.md.
  • 2024-07-05: SWAT finetuning code released.
  • 2024-06-28: Project page launched.
  • 2024-06-17: arXiv paper released.

Usage

Preparation

Create a conda environment and install dependencies following the instructions in ENV.md.

Prepare the datasets following the instructions in DATASETS.md.

Retrieve relevant pretraining data following the instructions in RETRIEVAL.md.

Running SWAT

You can run SWAT and the few-shot finetuning baselines with the following bash scripts.

# 1. check the options in run_dataset_seed_xxx.sh, 
#    this can be used to run a batch of experiments.
# 2. run the corresponding bash script in command line
# Usage: bash scripts/run_dataset_seed_xxx.sh <dataset> [seed]

# finetune on few-shot, seed 1
bash scripts/run_dataset_seed_finetune_fewshot.sh semi-aves 1

# finetune on few-shot with CutMix, 3 seeds
bash scripts/run_dataset_seed_finetune_fewshot_cutmix.sh semi-aves

# swat
bash scripts/run_dataset_seed_SWAT.sh semi-aves 1

Experiment results are saved in the result directory. Detailed logs, models, and scores are saved in the output directory.
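To repeat a run over several seeds, the usage line above (`bash scripts/run_dataset_seed_xxx.sh <dataset> [seed]`) can be wrapped in a small loop. The following is only a sketch: the function name `run_seeds` and the `DRY_RUN` variable are hypothetical helpers that are not part of this repository, and it assumes each script accepts the dataset and seed as its first and second arguments, as documented above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the SWAT repo): run one of the scripts
# above for several seeds, following the documented usage
#   bash scripts/run_dataset_seed_xxx.sh <dataset> [seed]

run_seeds() {
    local script="$1" dataset="$2"
    shift 2
    local seed
    for seed in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Only print the command that would be executed.
            echo "bash $script $dataset $seed"
        else
            # Stop at the first failing run.
            bash "$script" "$dataset" "$seed" || return 1
        fi
    done
}

# Example: preview the SWAT runs for seeds 1, 2, and 3 without launching them.
DRY_RUN=1 run_seeds scripts/run_dataset_seed_SWAT.sh semi-aves 1 2 3
```

Dropping `DRY_RUN=1` launches the runs one after another, which is convenient when the seeds should share a single GPU.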

Running other baselines

Below we provide the commands to run the zero-shot and few-shot baselines in the paper. Update the model_cfg option in the bash scripts to use different models.

Zero-shot methods:

# OpenCLIP zero-shot
bash scripts/run_dataset_zeroshot.sh semi-aves

# REAL-Prompt
bash scripts/run_dataset_REAL-Prompt.sh semi-aves

# REAL-Linear
# take the WSFT accuracy with alpha=0.5
# find the line: `Alpha:0.5, Val Acc: 48.671, Test Acc: 48.562`
bash scripts/run_dataset_REAL-Linear.sh semi-aves
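
Picking out the `Alpha:0.5` line by hand gets tedious across datasets, so the lookup can be scripted. The sketch below assumes the log contains lines shaped exactly like the one quoted above (`Alpha:0.5, Val Acc: 48.671, Test Acc: 48.562`). The function name `wsft_test_acc` is hypothetical and not part of this repository, and the log path is passed in as an argument because it depends on where the script writes its output.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the SWAT repo): print the test accuracy
# reported for a given alpha, assuming log lines shaped like
#   Alpha:0.5, Val Acc: 48.671, Test Acc: 48.562

wsft_test_acc() {
    local log_file="$1" alpha="${2:-0.5}"
    # Keep lines containing the literal "Alpha:<value>," text, extract the
    # number after "Test Acc:", and keep the last match if there are several.
    grep -F "Alpha:${alpha}," "$log_file" \
        | sed -n 's/.*Test Acc: *\([0-9.]*\).*/\1/p' \
        | tail -n 1
}

# Example with a throwaway log containing the line quoted above.
demo_log="$(mktemp)"
printf '%s\n' 'Alpha:0.5, Val Acc: 48.671, Test Acc: 48.562' > "$demo_log"
wsft_test_acc "$demo_log" 0.5   # prints 48.562
rm -f "$demo_log"
```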

Few-shot methods:

# Cross-modal Linear Probing (CMLP)
bash scripts/run_dataset_seed_CMLP.sh semi-aves 1

For CLAP, we use the provided code but replace the CLIP model with OpenCLIP. Our implementation can be found in CLAP-tian, along with instructions.

Acknowledgment

This code base was developed with reference to the following projects. We sincerely thank the authors for open-sourcing their work.

Citation

If you find our project useful, please consider citing:

@article{liu2024few,
  title={Few-Shot Recognition via Stage-Wise Retrieval-Augmented Finetuning},
  author={Liu, Tian and Zhang, Huixin and Parashar, Shubham and Kong, Shu},
  journal={arXiv preprint arXiv:2406.11148},
  year={2024}
}

@inproceedings{parashar2024neglected,
  title={The Neglected Tails in Vision-Language Models},
  author={Parashar, Shubham and Lin, Zhiqiu and Liu, Tian and Dong, Xiangjue and Li, Yanan and Ramanan, Deva and Caverlee, James and Kong, Shu},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2024}
}
