Implementing Grad-CAM in CLIP-ViL for VQA

The generated heatmaps work much like attention maps: they highlight the image regions that the CLIP-ViL model relies on most when answering the given question.
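To make the heatmap computation concrete, here is a minimal NumPy sketch of the core Grad-CAM step. It is not taken from this repository's notebook. It assumes you have already captured the feature maps of one visual layer and the gradient of the answer score with respect to them (for example through framework hooks). The function name `grad_cam` and the `(C, H, W)` layout are illustrative assumptions.

```python
import numpy as np


def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Compute a Grad-CAM heatmap from one layer's feature maps.

    activations: (C, H, W) feature maps from the chosen visual layer (assumed layout).
    gradients:   (C, H, W) gradient of the answer score w.r.t. those feature maps.
    Returns an (H, W) heatmap scaled to [0, 1].
    """
    if activations.shape != gradients.shape:
        raise ValueError("activations and gradients must have the same shape")
    # Channel weights: average each channel's gradient over the spatial axes.
    weights = gradients.mean(axis=(1, 2))  # shape (C,)
    # Weighted sum of the feature maps; ReLU keeps only positive evidence.
    cam = np.maximum(np.tensordot(weights, activations, axes=1), 0.0)  # shape (H, W)
    # Scale to [0, 1] for display; an all-zero map is returned unchanged.
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Toy input: channel 0 supports the answer, channel 1 argues against it.
acts = np.zeros((2, 2, 2))
acts[0, 0, 0] = 1.0
acts[1, 1, 1] = 1.0
grads = np.stack([np.full((2, 2), 2.0), np.full((2, 2), -1.0)])
heatmap = grad_cam(acts, grads)
```

In practice the resulting map is resized to the input image resolution and blended over the image as a color overlay.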

For the question "What is next to the bottle?", the model produces the following image with the heatmap overlaid: [Image: Heatmap]

Future Scope

Given a video, select the most important frame by generating a heatmap for each frame.
Add support for a benchmark VQA dataset such as CLEVR.
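One possible way to approach the video idea is sketched below. The selection criterion is an assumption, since the README does not specify one: each frame is scored by the mean of its raw Grad-CAM map, taken before any per-frame scaling to [0, 1], and the highest-scoring frame wins. The function name `select_key_frame` is hypothetical.

```python
import numpy as np


def select_key_frame(raw_heatmaps):
    """Pick the frame whose raw heatmap has the highest mean activation.

    raw_heatmaps: sequence of (H, W) non-negative Grad-CAM maps, one per frame.
    They must NOT be rescaled per frame, or every frame would peak at 1.
    Returns (best_index, scores).
    """
    if len(raw_heatmaps) == 0:
        raise ValueError("need at least one heatmap")
    # Assumed criterion: mean activation. Other statistics (max, top-k mean) also work.
    scores = [float(np.mean(h)) for h in raw_heatmaps]
    return int(np.argmax(scores)), scores


# Toy example with three 2x2 "frames".
frames = [
    np.zeros((2, 2)),
    np.array([[0.0, 4.0], [0.0, 0.0]]),
    np.full((2, 2), 0.5),
]
best_index, scores = select_key_frame(frames)
```

A mean-based score favors frames with broad, strong evidence. Swapping in the maximum would instead favor frames with a single sharply highlighted region.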

References

@article{shen2021much,
  title={How Much Can CLIP Benefit Vision-and-Language Tasks?},
  author={Shen, Sheng and Li, Liunian Harold and Tan, Hao and Bansal, Mohit and Rohrbach, Anna and Chang, Kai-Wei and Yao, Zhewei and Keutzer, Kurt},
  journal={arXiv preprint arXiv:2107.06383},
  year={2021}
}

Repository: pranavgupta2603/CLIP-ViL-GradCAM
