Large Multimodal Models (LMMs) are prone to hallucinations, where their outputs are not grounded in the provided multimodal context, leading to unreliable or incorrect answers. This project aimed to reduce hallucinations in vision-language models by implementing two recent techniques: Fact-RLHF and feedback-guided self-revision. Following Fact-RLHF, we fine-tuned the vision-language model BLIP2 using supervised instruction tuning and DPO training. Additionally, we enhanced the revision process in feedback-guided self-revision by incorporating factual information, inspired by Fact-RLHF. The effectiveness of our methods was demonstrated through evaluations on the MMHal-Bench and POPE benchmarks.
- DPO training: Generated BLIP2 response pairs at different sampling temperatures, created preference labels with Qwen2-VL, and trained with `DPOTrainer` from the TRL library (sketched below).
- SFT training: Instruction-tuned BLIP2 for 1 epoch on samples from the LLaVA-Instruct dataset using `SFTTrainer` from TRL (sketched below).
- Self-revision: Performed iterative refinement of BLIP2 and LLaVA-OneVision responses for up to 3 iterations, conditioning each revision on the previous responses and factual information (sketched below).
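A rough sketch of the DPO stage is below. The `qwen_judge` call and the `image_question_pairs` iterable are placeholders (not part of this repo's code), the model checkpoint and TRL arguments are assumptions, and the BLIP2-specific multimodal handling inside `DPOTrainer` is simplified.

```python
from datasets import Dataset
from transformers import Blip2ForConditionalGeneration, Blip2Processor
from trl import DPOConfig, DPOTrainer

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b")

def sample_answer(image, question, temperature):
    # One BLIP2 answer sampled at the given temperature.
    inputs = processor(images=image, text=question, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, do_sample=True, temperature=temperature, max_new_tokens=64)
    return processor.decode(out[0], skip_special_tokens=True)

pairs = []
for image, question in image_question_pairs:  # assumed iterable of (PIL image, question string)
    answer_a = sample_answer(image, question, temperature=0.2)
    answer_b = sample_answer(image, question, temperature=1.0)
    # qwen_judge is a placeholder for the Qwen2-VL preference call; assume it returns "A" or "B".
    if qwen_judge(image, question, answer_a, answer_b) == "A":
        chosen, rejected = answer_a, answer_b
    else:
        chosen, rejected = answer_b, answer_a
    pairs.append({"prompt": question, "chosen": chosen, "rejected": rejected})

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="blip2-dpo", per_device_train_batch_size=2),
    train_dataset=Dataset.from_list(pairs),
    processing_class=processor.tokenizer,  # older TRL releases take tokenizer= instead
)
trainer.train()
```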
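The SFT stage, roughly. The dataset id and record layout are assumptions based on the public LLaVA-Instruct release, and image conditioning is omitted from this text-only sketch.

```python
from datasets import load_dataset
from transformers import Blip2ForConditionalGeneration, Blip2Processor
from trl import SFTConfig, SFTTrainer

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b")

# Assumed record layout: LLaVA-Instruct stores turns as
# [{"from": "human", "value": ...}, {"from": "gpt", "value": ...}, ...].
raw = load_dataset("liuhaotian/LLaVA-Instruct-150K", split="train")

def to_text(example):
    turns = example["conversations"]
    return {"text": f"Question: {turns[0]['value']}\nAnswer: {turns[1]['value']}"}

trainer = SFTTrainer(
    model=model,
    args=SFTConfig(output_dir="blip2-sft", num_train_epochs=1),
    train_dataset=raw.map(to_text),
    processing_class=processor.tokenizer,  # older TRL releases take tokenizer= instead
)
trainer.train()
```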
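The self-revision step is conceptually an iterated generate-critique-rewrite loop. In the sketch below, `generate` and `critique` are placeholder callables standing in for BLIP2 / LLaVA-OneVision inference and the feedback model, and `facts` stands for whatever factual information about the image is available.

```python
MAX_ITERATIONS = 3

def self_revise(image, question, facts, generate, critique):
    """Iteratively rewrite an answer using critic feedback and factual hints.

    generate(image, prompt) -> answer string (BLIP2 or LLaVA-OneVision inference).
    critique(image, question, answer) -> feedback string, or None if the answer looks grounded.
    """
    answer = generate(image, question)
    for _ in range(MAX_ITERATIONS):
        feedback = critique(image, question, answer)
        if feedback is None:  # critic found no ungrounded claims, stop early
            break
        revision_prompt = (
            f"Question: {question}\n"
            f"Previous answer: {answer}\n"
            f"Feedback: {feedback}\n"
            f"Facts about the image: {facts}\n"
            "Rewrite the answer so that every claim is supported by the image and the facts."
        )
        answer = generate(image, revision_prompt)
    return answer
```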
Evaluation performed on POPE and MMHal-Bench.
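Since POPE consists of yes/no object-presence questions, scoring reduces to binary classification metrics. The record format below (a gold `label` field plus the model's `answer`) is an assumption about how predictions are stored, not the project's actual evaluation script.

```python
def pope_scores(records):
    """Compute accuracy, precision, recall, and F1 for yes/no POPE predictions."""
    tp = fp = tn = fn = 0
    for r in records:
        pred_yes = r["answer"].strip().lower().startswith("yes")
        gold_yes = r["label"].strip().lower() == "yes"
        if pred_yes and gold_yes:
            tp += 1
        elif pred_yes and not gold_yes:
            fp += 1
        elif not pred_yes and gold_yes:
            fn += 1
        else:
            tn += 1
    total = tp + tn + fp + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / total, "precision": precision, "recall": recall, "f1": f1}
```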
See report.
- Sadman Sakib
- Danyal Maqbool
- Rishika Ahuja
- Muhammad Musa
- Apoorva Mittal