[AAAI 2025] Code for the paper: Enhancing Multimodal Large Language Models Complex Reasoning via Similarity Computation

FanshuoZeng/Simignore

Enhancing Multimodal Large Language Models Complex Reasoning via Similarity Computation


News

  • 2024.12.10 The paper has been accepted to AAAI 2025.

Setup

conda create -n simignore python=3.10
conda activate simignore
cd src
bash setup.sh

Checkpoint

You can download LLaVA1.5-7b from Hugging Face and save it under checkpoint/llava.
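
Before running inference, it can help to confirm that the checkpoint landed where the scripts expect it. The following is a minimal sketch, not part of this repository: the helper name `check_checkpoint` is hypothetical, and it assumes the Hugging Face download places a `config.json` file directly inside `checkpoint/llava`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with this repository).
# Assumption: a downloaded Hugging Face checkpoint has config.json at its top level.
check_checkpoint() {
  local dir="${1:-checkpoint/llava}"
  if [ ! -d "$dir" ]; then
    echo "missing directory: $dir"
    return 1
  fi
  if [ ! -f "$dir/config.json" ]; then
    echo "no config.json in: $dir"
    return 1
  fi
  echo "checkpoint looks present: $dir"
}

# Usage (from the repository root): check_checkpoint checkpoint/llava
```

The function returns a non-zero status when the directory or file is missing, so it can be chained in front of an inference script with `&&`.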

ScienceQA dataset

You can download ScienceQA from Google Drive and unzip the images under data/scienceqa/images.
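
A quick way to confirm the unzip step worked is to count the entries under the image directory. This is a sketch under stated assumptions: the helper name `count_entries` is hypothetical, the path is taken relative to the repository root, and no particular number of entries is assumed here.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with this repository).
# Prints the number of top-level entries in the given directory.
count_entries() {
  local dir="${1:-data/scienceqa/images}"
  if [ ! -d "$dir" ]; then
    echo "missing directory: $dir" >&2
    return 1
  fi
  # -mindepth 1 excludes the directory itself; tr strips padding that some wc builds add.
  find "$dir" -mindepth 1 -maxdepth 1 | wc -l | tr -d ' '
}

# Usage (from the repository root): count_entries data/scienceqa/images
```

A result of `0`, or a "missing directory" message, suggests the archive was extracted to a different location.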

Simignore Zero-shot Inference

We provide zero-shot inference scripts for the LLaVA1.5-7b and LLaVA1.5-13b models on the ScienceQA (Image) dataset. The following experiments were run on a single 4090D GPU (24 GB).

Inference using the LLaVA-v1.5-7b model:

bash ./src/Simignore/inference/eval/eval_sqa_latency_inplace.sh

Inference using the LLaVA-v1.5-13b model:

bash ./src/Simignore/inference/eval/eval_sqa_latency_inplace_13b.sh 

Simignore Evaluation

Evaluation using the LLaVA-v1.5-7b model:

bash ./src/Simignore/inference/eval/generate_sqa_results.sh

Evaluation using the LLaVA-v1.5-13b model:

bash ./src/Simignore/inference/eval/generate_sqa_results_13b.sh 

Main result


Citation

@article{zhang2024enhancing,
  title={Enhancing Multimodal Large Language Models Complex Reasoning via Similarity Computation},
  author={Zhang, Xiaofeng and Zeng, Fanshuo and Quan, Yihao and Hui, Zheng and Yao, Jiawei},
  journal={The 39th Annual AAAI Conference on Artificial Intelligence},
  year={2024}
}

@article{zhang2024simignore,
  title={Simignore: Exploring and enhancing multimodal large model complex reasoning via similarity computation},
  author={Zhang, Xiaofeng and Zeng, Fanshuo and Gu, Chaochen},
  journal={Neural Networks},
  pages={107059},
  year={2024},
  publisher={Elsevier}
}
