Code for the paper "An Information Flow Perspective for Exploring Large Vision Language Models on Reasoning Tasks". The paper is under review; we will release the full code in October.
The main implementation of our method is in `transformers-4.29.2/src/transformers/generation/utils.py`, so you can use our decoding simply by installing our modified transformers package:
```bash
conda env create -f environment.yml
conda activate redundancy
python -m pip install -e transformers-4.29.2
```
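To double-check that the patched package is the one Python actually imports, a quick smoke test along the following lines can help. This is only a sketch; the `gpt2` checkpoint is an arbitrary small model used for illustration and is not part of this project.

```python
# Smoke-test sketch: confirm the patched transformers 4.29.2 is being imported
# and that standard generation still runs.
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

assert transformers.__version__.startswith("4.29"), transformers.__version__

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # any small causal LM works here
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("A quick smoke test:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```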
Alternatively, you can apply the modifications to your own transformers installation manually:
- Find the file at `transformers-4.29.2/src/transformers/generation/utils.py`.
- Add the arguments to the `transformers.generate` function here.
- Add the code to the `transformers.generate` function here.
- Copy and paste the `opera_decoding` function here.
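If you port these edits to a different transformers version yourself, they follow a common pattern: `generate` gains a few extra keyword arguments and, when they are set, routes the call to the pasted-in decoding function. The snippet below is only a schematic sketch of that pattern under assumed names (`opera_decoding_kwargs`, `_stock_decoding`), not the actual patch shipped in `transformers-4.29.2/src/transformers/generation/utils.py`.

```python
# Schematic sketch only, not the real patch: the general shape of the edits
# described above to the generate function.
class GenerationSketch:
    def generate(self, input_ids, opera_decoding_kwargs=None, **kwargs):
        # (1) extra keyword arguments added to the signature
        if opera_decoding_kwargs is not None:
            # (2) new branch added to the body: route to the custom decoding path
            return self.opera_decoding(input_ids, **opera_decoding_kwargs)
        # otherwise fall through to the unmodified decoding paths
        return self._stock_decoding(input_ids, **kwargs)

    def opera_decoding(self, input_ids, **decoding_kwargs):
        # (3) the copied-in decoding function; its real body lives in the
        # modified utils.py of this repo
        raise NotImplementedError("placeholder for the copied decoding function")

    def _stock_decoding(self, input_ids, **kwargs):
        # stand-in for the original greedy/beam/sample decoding loops
        raise NotImplementedError("placeholder for the stock decoding loops")
```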
The following evaluation requires the MSCOCO 2014 dataset. Please download it here and extract it to your data path.
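A quick way to sanity-check the extraction before running the evaluation is shown below; the `val2014/` layout is an assumption, so adjust it to whatever split and path your eval scripts expect.

```python
# Sanity-check sketch: make sure the extracted MSCOCO 2014 images are where
# you plan to point the evaluation at. The folder name "val2014" is an assumption.
import os

data_root = "/path/to/your/coco2014"            # your data path
image_dir = os.path.join(data_root, "val2014")

if os.path.isdir(image_dir):
    print(f"Found {len(os.listdir(image_dir))} files in {image_dir}")
else:
    print(f"Missing {image_dir}; check the extraction path")
```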
In addition, you need to prepare the following checkpoints of the 7B base models:
- Download the LLaVA-1.5 merged 7B model and specify its path at Line 14 of `eval_configs/llava-1.5_eval.yaml`.
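If you prefer to set the checkpoint path programmatically rather than by hand, a small helper like the one below works. The `"model"`/`"ckpt"` key path is an assumption; inspect Line 14 of the yaml to see the actual field used by this repo.

```python
# Helper sketch: load the eval config and point it at a local checkpoint.
# Requires PyYAML (pip install pyyaml).
import yaml

cfg_path = "eval_configs/llava-1.5_eval.yaml"
with open(cfg_path) as f:
    cfg = yaml.safe_load(f)

print(cfg)  # inspect the structure to locate the checkpoint field

# Example override (uncomment and adapt the key path to your config):
# cfg["model"]["ckpt"] = "/path/to/llava-v1.5-7b"
# with open(cfg_path, "w") as f:
#     yaml.safe_dump(cfg, f)
```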
```bibtex
@article{zhang2024redundancy,
  title={From Redundancy to Relevance: Enhancing Explainability in Multimodal Large Language Models},
  author={Zhang, Xiaofeng and Shen, Chen and Yuan, Xiaosong and Yan, Shaotian and Xie, Liang and Wang, Wenxiao and Gu, Chaochen and Tang, Hao and Ye, Jieping},
  journal={arXiv preprint arXiv:2406.06579},
  year={2024}
}
```