We provide scripts for projector pretraining and video fine-tuning of VideoGPT+. Please follow the instructions below.
You can download all the pretraining and fine-tuning datasets from HuggingFace with the following commands:
mkdir playground
mkdir playground/data
cd playground/data
git lfs install
git clone https://huggingface.co/datasets/MBZUAI/VideoGPT-plus_Training_Dataset
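The setup steps above can be wrapped so they are safe to re-run. The following is a minimal sketch, not part of the repository: `prepare_data_dir` and `require_git_lfs` are hypothetical helper names, and the default path `playground/data` is taken from the commands above.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not shipped with the repository): create the data
# directory and print its path. `mkdir -p` makes this safe to run twice.
prepare_data_dir() {
    local data_dir="${1:-playground/data}"
    mkdir -p "$data_dir" || return 1
    printf '%s\n' "$data_dir"
}

# Hypothetical helper: fail early with a readable message if git-lfs is
# not installed, since the clone depends on it for the large files.
require_git_lfs() {
    if ! git lfs version >/dev/null 2>&1; then
        echo "git-lfs not found; install it before cloning the dataset" >&2
        return 1
    fi
}
```

A typical use would be `require_git_lfs && cd "$(prepare_data_dir)"`, followed by the `git lfs install` and `git clone` commands listed above.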
Use the script scripts/pretrain_projector_image_encoder.sh to run MLP projector pretraining with the CLIP image encoder.
Use the script scripts/pretrain_projector_video_encoder.sh to run MLP projector pretraining with the InternVideo2 video encoder.
Alternatively, you can download our pretrained projector weights from HuggingFace:
git lfs install
git clone https://huggingface.co/MBZUAI/VideoGPT-plus_Phi3-mini-4k_Pretrain
Please use the script scripts/finetune_dual_encoder.sh for video instruction fine-tuning.
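The scripts named in this section can be launched through a small wrapper that checks each file exists before running it. This is only a sketch under assumptions: `run_stage` is a hypothetical helper, and it assumes the scripts are invoked with `bash` from the repository root. Any paths or settings inside the scripts themselves may still need to be adjusted for your environment.

```shell
#!/usr/bin/env bash

# Hypothetical wrapper (not part of the repository): run one training
# script, refusing to start if the file is missing.
run_stage() {
    local script="$1"
    if [ ! -f "$script" ]; then
        echo "missing script: $script" >&2
        return 1
    fi
    bash "$script"
}

# Example sequence, assuming the current directory is the repository root.
# The two pretraining stages can be skipped if you downloaded the
# pretrained projector weights instead.
#   run_stage scripts/pretrain_projector_image_encoder.sh
#   run_stage scripts/pretrain_projector_video_encoder.sh
#   run_stage scripts/finetune_dual_encoder.sh
```

The example calls are left commented out so that sourcing the file does not start a training run by accident.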