Skip to content
/ TED Public

Transformation-Equivariant 3D Object Detection for Autonomous Driving

License

Notifications You must be signed in to change notification settings

hailanyi/TED

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

33 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Transformation-Equivariant 3D Object Detection for Autonomous Driving

This is a improved version of TED by a multiple refinement design. This code is mainly based on OpenPCDet and CasA, some codes are from PENet and SFD.

Detection Framework

The overall detection framework is shown below. (1) Transformation-equivariant Sparse Convolution (TeSpConv) backbone; (2) Transformation-equivariant Bird Eye View (TeBEV) pooling; (3) Multi-grid pooling and multi-refinement. TeSpConv applies shared weights on multiple transformed point clouds to record the transformation-equivariant voxel features. TeBEV pooling aligns and aggregates the scene-level equivariant features into lightweight representations for proposal generation. Multi-grid pooling and multi-refinement align and aggregate the instance-level invariant features for proposal refinement.

Model Zoo

We release two models, which are based on LiDAR-only and multi-modal data respectively. We denoted the two models as TED-S and TED-M respectively.

  • All models are trained with 8 V100 GPUs and are available for download.

  • The models are trained with train split (3712 samples) of KITTI dataset

  • The results are the 3D AP(R40) of Car on the val set of KITTI dataset.

  • These models are not suitable to directly report results on KITTI test set, please use slightly lower score threshold and train the models on all or 80% training data to achieve a desirable performance on KITTI test set.

Modality GPU memory of training Easy Mod. Hard download
TED-S LiDAR only ~12 GB 93.25 87.99 86.28 google / baidu(p91t) / 36M
TED-M LiDAR+RGB ~15 GB 95.62 89.24 86.77 google / baidu(nkr5) / 65M

Getting Started

conda create -n spconv2 python=3.9
conda activate spconv2
pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
pip install numpy==1.19.5 protobuf==3.19.4 scikit-image==0.19.2 waymo-open-dataset-tf-2-5-0 nuscenes-devkit==1.0.5 spconv-cu111 numba scipy pyyaml easydict fire tqdm shapely matplotlib opencv-python addict pyquaternion awscli open3d pandas future pybind11 tensorboardX tensorboard Cython prefetch-generator

Dependency

Our released implementation is tested on.

  • Ubuntu 18.04
  • Python 3.6.9
  • PyTorch 1.8.1
  • Spconv 1.2.1
  • NVIDIA CUDA 11.1
  • 8x Tesla V100 GPUs

We also tested on.

  • Ubuntu 18.04
  • Python 3.9.13
  • PyTorch 1.8.1
  • Spconv 2.1.22 # pip install spconv-cu111
  • NVIDIA CUDA 11.1
  • 2x 3090 GPUs

Prepare dataset

Please download the official KITTI 3D object detection dataset and organize the downloaded files as follows (the road planes could be downloaded from [road plane], which are optional for data augmentation in the training):

TED
├── data
│   ├── kitti
│   │   │── ImageSets
│   │   │── training
│   │   │   ├──calib & velodyne & label_2 & image_2 & (optional: planes)
│   │   │── testing
│   │   │   ├──calib & velodyne & image_2
├── pcdet
├── tools

You need creat a 'velodyne_depth' dataset to run our multimodal detector: You can download our preprocessed data from google (13GB), baidu (a20o), or generate the data by yourself:

  • Install this project.
  • Download the PENet depth completion model here (500M) and put it into tools/PENet.
  • Then run the following code to generate RGB pseudo points.
cd tools/PENet
python3 main.py --detpath [your path like: ../../data/kitti/training]

After 'velodyne_depth' generation, run following command to creat dataset infos:

cd ../..
python3 -m pcdet.datasets.kitti.kitti_dataset create_kitti_infos tools/cfgs/dataset_configs/kitti_dataset.yaml
python3 -m pcdet.datasets.kitti.kitti_dataset_mm create_kitti_infos tools/cfgs/dataset_configs/kitti_dataset.yaml

Anyway, the data structure should be:

TED
├── data
│   ├── kitti
│   │   │── ImageSets
│   │   │── training
│   │   │   ├──calib & velodyne & label_2 & image_2 & (optional: planes) & velodyne_depth
│   │   │── testing
│   │   │   ├──calib & velodyne & image_2 & velodyne_depth
│   │   │── gt_database
│   │   │── gt_database_mm
│   │   │── kitti_dbinfos_train_mm.pkl
│   │   │── kitti_dbinfos_train.pkl
│   │   │── kitti_infos_test.pkl
│   │   │── kitti_infos_train.pkl
│   │   │── kitti_infos_trainval.pkl
│   │   │── kitti_infos_val.pkl
├── pcdet
├── tools

Installation

git clone https://github.com/hailanyi/TED.git
cd TED
python3 setup.py develop

Training

Single GPU train:

cd tools
python3 train.py --cfg_file ${CONFIG_FILE}

For example, if you train the TED-S model:

cd tools
python3 train.py --cfg_file cfgs/models/kitti/TED-S.yaml

Multiple GPU train:

You can modify the gpu number in the dist_train.sh and run

cd tools
sh dist_train.sh

The log infos are saved into log.txt You can run cat log.txt to view the training process.

Evaluation

cd tools
python3 test.py --cfg_file ${CONFIG_FILE} --batch_size ${BATCH_SIZE} --ckpt ${CKPT}

For example, if you test the TED-S model:

cd tools
python3 test.py --cfg_file cfgs/models/kitti/TED-S.yaml --ckpt TED-S.pth

Multiple GPU test: you need modify the gpu number in the dist_test.sh and run

sh dist_test.sh 

The log infos are saved into log-test.txt You can run cat log-test.txt to view the test results.

License

This code is released under the Apache 2.0 license.

Acknowledgement

CasA

OpenPCDet

PENet

SFD

Citation

@inproceedings{TED,
    title={Transformation-Equivariant 3D Object Detection for Autonomous Driving},
    author={Wu, Hai and Wen, Chenglu and Li, Wei and Yang, Ruigang and Wang, Cheng},
    year={2023},
    booktitle={AAAI}
    
}

About

Transformation-Equivariant 3D Object Detection for Autonomous Driving

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published