The project aims to develop a model that learns a separate query for each vehicle in bird's-eye view (BEV) from multi-camera images, yielding compact, interpretable vehicle representations for downstream tasks such as 3D object detection, tracking, and motion forecasting.
Different queries successfully learn to identify different vehicles in the scene.
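To make the query idea concrete, here is a minimal sketch of how a Mask2Former-style mask head can turn a set of learned queries into one BEV mask per query: each query embedding is dotted with a per-cell embedding of the BEV feature map, and a sigmoid gives a mask probability per cell. This is an illustration under stated assumptions, not the repository's implementation: it uses NumPy and toy shapes, and the names `query_masks` and `w_embed` are hypothetical.

```python
import numpy as np


def query_masks(queries, bev_features, w_embed):
    """Hypothetical Mask2Former-style mask head on a BEV feature map.

    queries:      (Q, D) one embedding per object query (decoder output)
    bev_features: (C, H, W) features for every BEV grid cell
    w_embed:      (C, D) hypothetical projection of BEV features into query space
    Returns:      (Q, H, W) per-query mask probabilities
    """
    # Project every BEV cell into the query embedding space: (D, H, W)
    pixel_embed = np.einsum("cd,chw->dhw", w_embed, bev_features)
    # Dot product between each query and each BEV cell: (Q, H, W)
    logits = np.einsum("qd,dhw->qhw", queries, pixel_embed)
    # Sigmoid gives an independent mask probability per query and cell
    return 1.0 / (1.0 + np.exp(-logits))


if __name__ == "__main__":
    # Toy scene: a 2x2 BEV grid with one "vehicle" at cell (0, 0) and one at (1, 1)
    feats = np.zeros((2, 2, 2))
    feats[0, 0, 0] = 1.0
    feats[1, 1, 1] = 1.0
    queries = np.array([[5.0, 0.0], [0.0, 5.0]])  # two queries, one per vehicle
    masks = query_masks(queries, feats, np.eye(2))
    print(masks.round(3))
```

In the toy scene, query 0 lights up only cell (0, 0) and query 1 only cell (1, 1), which is the "one query per vehicle" behaviour described above; a real model would learn the queries and projection instead of setting them by hand.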
Clone the repository:
git clone https://github.com/mrabiabrn/mask2former4bev.git
cd mask2former4bev
Create a Conda environment and install the required dependencies:
conda create -n mask2former4bev
conda activate mask2former4bev
conda install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install -r requirements.txt
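After installation, a quick way to confirm that the environment matches the pinned versions is to read the installed package metadata. This is a minimal sketch using only the standard library; `check_versions` and `PINNED` are hypothetical names, and it assumes the PyTorch packages expose distribution metadata (conda and pip builds normally do). Local build suffixes such as `+cu121` are ignored.

```python
from importlib import metadata

# Versions pinned by the conda install command above
PINNED = {"torch": "2.1.2", "torchvision": "0.16.2", "torchaudio": "2.1.2"}


def check_versions(pinned=PINNED, get_version=metadata.version):
    """Return {package: problem} for packages that are missing or at the wrong version."""
    problems = {}
    for pkg, want in pinned.items():
        try:
            have = get_version(pkg)
        except metadata.PackageNotFoundError:
            problems[pkg] = "not installed"
            continue
        # Strip a local build suffix such as "2.1.2+cu121" before comparing
        if have.split("+")[0] != want:
            problems[pkg] = f"found {have}, expected {want}"
    return problems


if __name__ == "__main__":
    issues = check_versions()
    print("OK" if not issues else issues)
```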
Download the nuScenes dataset from the official nuScenes website to root/to/nuscenes.
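Before launching training, it can help to check that the download was extracted completely. The sketch below assumes the standard layout of an extracted nuScenes trainval release (`maps`, `samples`, `sweeps`, `v1.0-trainval` under the dataset root); this layout, and the helper name `missing_nuscenes_dirs`, are assumptions rather than requirements stated by the repository.

```python
import sys
from pathlib import Path

# Assumed top-level layout of an extracted nuScenes trainval release
EXPECTED = ("maps", "samples", "sweeps", "v1.0-trainval")


def missing_nuscenes_dirs(root, expected=EXPECTED):
    """Return the expected sub-directories that are absent under root."""
    root = Path(root)
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    dataroot = sys.argv[1] if len(sys.argv) > 1 else "root/to/nuscenes"
    missing = missing_nuscenes_dirs(dataroot)
    print("nuScenes layout looks complete" if not missing else f"missing: {missing}")
```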
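Train the model with torchrun, replacing <gpus> with the number of GPUs and "root/to/dataset" with the directory where nuScenes was downloaded:
torchrun --master_port 2245 --nproc_per_node=<gpus> train.py --dataset_path "root/to/dataset"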
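torchrun starts one worker process per GPU and passes each worker its position through environment variables such as `RANK`, `LOCAL_RANK`, `WORLD_SIZE`, and `MASTER_PORT`. The sketch below shows how a training script can read them, with defaults that let the same script also run as a single plain process; `dist_info` is a hypothetical helper, and this is not necessarily how `train.py` handles it.

```python
import os


def dist_info(env=None):
    """Read the distributed-training variables that torchrun exports to each worker."""
    env = os.environ if env is None else env
    return {
        "rank": int(env.get("RANK", 0)),              # global rank across all workers
        "local_rank": int(env.get("LOCAL_RANK", 0)),  # GPU index on this node
        "world_size": int(env.get("WORLD_SIZE", 1)),  # total workers (= <gpus> on one node)
        "master_port": env.get("MASTER_PORT"),        # "2245" when launched as above
    }


if __name__ == "__main__":
    print(dist_info())
```

With `--nproc_per_node=4` on a single machine, the four workers would see `world_size` 4 and `local_rank` 0 through 3, which a script typically uses to pick its CUDA device.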
This repository incorporates code from several public works, including SimpleBEV, Mask2Former, and SOLV. Special thanks to the authors of these projects for making their code available.