We provide the config files for MVPose (Temporal tracking and filtering): Fast and robust multi-person 3d pose estimation and tracking from multiple views.
@article{dong2021fast,
title={Fast and robust multi-person 3d pose estimation and tracking from multiple views},
author={Dong, Junting and Fang, Qi and Jiang, Wen and Yang, Yurou and Huang, Qixing and Bao, Hujun and Zhou, Xiaowei},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
year={2021},
publisher={IEEE}
}
- Prepare models:
sh scripts/download_weight.sh
You can find the perception models in the weight directory.
- Download body model
Please refer to Body Model Preparation.
- Prepare the datasets:
You can download the Shelf, Campus or CMU-Panoptic datasets and convert the original data to our unified meta-data format. Because running a converter takes a long time, we have already done the conversion for you. Please download the compressed zip file of converted meta-data from here, and place the meta-data under ROOT/xrmocap_data/DATASET.
The final file structure should look like this:
xrmocap
├── xrmocap
├── docs
├── tools
├── configs
├── weight
|   ├── mvpose
|   |   └── resnet50_reid_camstyle-98d61e41_20220921.pth
|   ├── ...
|   └── tracktor_reid_r50_iter25245-a452f51f.pth
└── xrmocap_data
    ├── body_models
    |   ├── gmm_08.pkl
    |   ├── smpl_mean_params.npz
    |   └── smpl
    |       ├── SMPL_FEMALE.pkl
    |       ├── SMPL_MALE.pkl
    |       └── SMPL_NEUTRAL.pkl
    |
    ├── CampusSeq1
    ├── Shelf
    |   ├── Camera0
    |   ├── ...
    |   ├── Camera4
    |   ├── xrmocap_meta_testset_fasterrcnn
    |   └── xrmocap_meta_testset
    └── Panoptic
        ├── xrmocap_meta_ian5
        |   ├── hd_00_03
        |   ├── ...
        |   ├── hd_00_23
        |   ├── camera_parameters
        |   ├── keypoints3d_GT.npz
        |   └── perception_2d.npz
        ├── xrmocap_meta_pizza1
        ├── xrmocap_meta_band4
        └── xrmocap_meta_haggling1
You do not need all three datasets; downloading just one of Shelf, Campus and CMU-Panoptic is enough.
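Before running an evaluation, it can help to confirm that the downloaded files landed where the layout above expects them. The following is a minimal sketch, not part of the repository: the helper name `missing_paths` and the `EXPECTED` list are hypothetical, and the list only covers a few of the weight and body-model files shown in the tree.

```python
from pathlib import Path

# A few paths copied from the file structure above, relative to the
# repository root. This list is illustrative, not exhaustive.
EXPECTED = [
    "weight/mvpose/resnet50_reid_camstyle-98d61e41_20220921.pth",
    "weight/tracktor_reid_r50_iter25245-a452f51f.pth",
    "xrmocap_data/body_models/gmm_08.pkl",
    "xrmocap_data/body_models/smpl_mean_params.npz",
    "xrmocap_data/body_models/smpl/SMPL_NEUTRAL.pkl",
]


def missing_paths(root, expected=EXPECTED):
    """Return the entries of `expected` that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    # Run from the repository root; prints nothing when all files exist.
    for rel in missing_paths("."):
        print(f"missing: {rel}")
```

Dataset folders are left out of the list on purpose, since only the datasets you plan to evaluate need to be present.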
We evaluate MVPose (Temporal tracking and filtering) on three popular benchmarks: Campus, Shelf and CMU Panoptic. We report the Percentage of Correct Parts (PCP), Mean Per Joint Position Error (MPJPE), MPJPE with Procrustes Analysis (PA-MPJPE) and Percentage of Correct Keypoints (PCK).
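To make two of these metrics concrete, here is a minimal sketch of MPJPE and PCK on a single pose. It is not the evaluation code used for the tables below. It assumes joints are given as (x, y, z) tuples in millimetres, that the PCK threshold is a distance in the same unit, and that a joint exactly at the threshold counts as correct. PA-MPJPE is omitted because it needs a rigid alignment step first.

```python
import math


def joint_errors(pred, gt):
    """Euclidean distance between each predicted joint and its ground truth."""
    return [math.dist(p, g) for p, g in zip(pred, gt)]


def mpjpe(pred, gt):
    """Mean per-joint position error, in the unit of the inputs."""
    errors = joint_errors(pred, gt)
    return sum(errors) / len(errors)


def pck(pred, gt, threshold):
    """Percentage of joints whose error is at most `threshold`."""
    errors = joint_errors(pred, gt)
    hits = sum(1 for e in errors if e <= threshold)
    return 100.0 * hits / len(errors)


# Toy pose with three joints; per-joint errors are 30, 60 and 120 mm.
gt = [(0.0, 0.0, 0.0), (0.0, 100.0, 0.0), (0.0, 200.0, 0.0)]
pred = [(30.0, 0.0, 0.0), (0.0, 160.0, 0.0), (0.0, 200.0, 120.0)]

print(mpjpe(pred, gt))     # mean of 30, 60 and 120
print(pck(pred, gt, 50))   # one of three joints within 50 mm
print(pck(pred, gt, 100))  # two of three joints within 100 mm
```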
To make the evaluation fairer, we made some modifications relative to the evaluation in the original work. For PCP, instead of evaluating by body parts, we evaluate by the limbs defined in selected_limbs_names and additional_limbs_names. We remove the root alignment in MPJPE and provide PA-MPJPE instead. Thresholds for outliers are removed as well.
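As an illustration of limb-based PCP, the sketch below uses the commonly used criterion that a limb counts as correct when the mean error of its two endpoints is at most half the ground-truth limb length. The joint names, the limb list and the `alpha=0.5` factor are assumptions for this example; the actual limb definitions come from selected_limbs_names and additional_limbs_names, which are not reproduced here.

```python
import math


def pcp(pred, gt, limbs, alpha=0.5):
    """Percentage of limbs whose mean endpoint error is within
    `alpha` times the ground-truth limb length.

    `pred` and `gt` map joint names to (x, y, z) tuples; `limbs` is a
    list of (start_joint, end_joint) name pairs.
    """
    correct = 0
    for start, end in limbs:
        endpoint_error = (
            math.dist(pred[start], gt[start]) + math.dist(pred[end], gt[end])
        ) / 2
        limb_length = math.dist(gt[start], gt[end])
        if endpoint_error <= alpha * limb_length:
            correct += 1
    return 100.0 * correct / len(limbs)


# Hypothetical joints and limbs, in millimetres.
gt = {"hip": (0, 0, 0), "knee": (0, -400, 0), "ankle": (0, -800, 0)}
pred = {"hip": (0, 0, 0), "knee": (100, -400, 0), "ankle": (500, -800, 0)}
limbs = [("hip", "knee"), ("knee", "ankle")]

# Upper leg: mean endpoint error 50 mm, allowed 200 mm, so it is correct.
# Lower leg: mean endpoint error 300 mm, allowed 200 mm, so it is wrong.
print(pcp(pred, gt, limbs))
```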
You can find the recommended configs in configs/mvpose_tracking/*/eval_keypoints3d.py, where interval is the global matching interval, that is, the maximum number of frames for Kalman filtering. If the interval is set too large, the accuracy of the estimation degrades, so we recommend keeping it within 50 frames. __bbox_thr__ is the threshold of bbox2d; you can set a high threshold to ignore incorrect 2D perception data, and we recommend setting it to 0.8~0.9. best_distance is the threshold used to match the current-frame keypoints2d to the last-frame keypoints2d, and it needs to be adjusted for each dataset. n_cam_min is the minimum number of views required for triangulation, which defaults to 2.
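To show how the bbox threshold and n_cam_min interact, the sketch below filters views by their 2D bounding-box score and then checks whether enough views remain for triangulation. This is only an illustration of the idea described above, not xrmocap's implementation; the function names and the score list are hypothetical.

```python
def usable_views(bbox_scores, bbox_thr=0.9):
    """Indices of views whose bbox2d score reaches the threshold."""
    return [i for i, score in enumerate(bbox_scores) if score >= bbox_thr]


def can_triangulate(bbox_scores, bbox_thr=0.9, n_cam_min=2):
    """True when at least `n_cam_min` views pass the bbox threshold."""
    return len(usable_views(bbox_scores, bbox_thr)) >= n_cam_min


# Hypothetical detection scores for one person seen from five cameras.
scores = [0.95, 0.60, 0.92, 0.88, 0.30]

print(usable_views(scores, bbox_thr=0.9))                  # views 0 and 2 pass
print(can_triangulate(scores, bbox_thr=0.9, n_cam_min=2))  # two views suffice
print(can_triangulate(scores, bbox_thr=0.9, n_cam_min=3))  # not enough views
```

Raising the threshold discards more unreliable detections but also leaves fewer views, so a strict threshold combined with a large n_cam_min can drop people who are visible in only a few cameras.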
The 2D perception data we use is generated by fasterrcnn, and you can download it from here. We set __bbox_thr__=0.9, n_cam_min=2 and interval=10.
Config | PCP | MPJPE(mm) | PA-MPJPE(mm) | PCK@50 | PCK@100 | Download |
---|---|---|---|---|---|---|
eval_keypoints3d.py | 93.20 | 84.49 | 62.68 | 42.67 | 86.92 | log |
The PCP for each actor is as follows:
Actor 0 | Actor 1 | Actor 2 | Average |
---|---|---|---|
94.21 | 87.17 | 98.20 | 93.20 |
The 2D perception data we use is generated by fasterrcnn, and you can download it from here. We set __bbox_thr__=0.9, n_cam_min=3 and interval=5.
Config | PCP | MPJPE(mm) | PA-MPJPE(mm) | PCK@50 | PCK@100 | Download |
---|---|---|---|---|---|---|
eval_keypoints3d.py | 96.86 | 54.20 | 43.15 | 68.85 | 97.99 | log |
The PCP for each actor is as follows:
Actor 0 | Actor 1 | Actor 2 | Average |
---|---|---|---|
98.09 | 94.89 | 97.58 | 96.86 |
The 2D perception data we use is generated by mmpose, and you can download it from here. Cameras are chosen to cover as much of the human body as possible, so we selected cameras 3, 6, 12, 13 and 23.
The CMU Panoptic dataset contains four sequences that share the same config file. For each sequence, you need to change __meta_path__. We set __bbox_thr__=0.85, n_cam_min=2 and interval=10.
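Since only the meta path changes between sequences, the mapping from a sequence name to its meta-data folder can be sketched as below. The helper `meta_path_for` and the default `data_root` are hypothetical; the folder naming follows the xrmocap_meta_* directories in the file structure shown earlier, and the exact string that __meta_path__ expects in the config is an assumption.

```python
def meta_path_for(sequence, data_root="xrmocap_data/Panoptic"):
    """Build the meta-data folder for a sequence such as '160906_band4'.

    The date prefix is dropped, so '160906_band4' maps to
    '<data_root>/xrmocap_meta_band4'.
    """
    short_name = sequence.split("_", 1)[1]
    return f"{data_root}/xrmocap_meta_{short_name}"


sequences = ["160906_band4", "160906_ian5", "160906_pizza1", "160422_haggling1"]
for seq in sequences:
    print(seq, "->", meta_path_for(seq))
```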
- 160906_band4
Config | PCP | MPJPE(mm) | PA-MPJPE(mm) | PCK@50 | PCK@100 | Download |
---|---|---|---|---|---|---|
eval_keypoints3d.py | 97.19 | 46.39 | 42.81 | 74.46 | 94.80 | log |
- 160906_ian5
Config | PCP | MPJPE(mm) | PA-MPJPE(mm) | PCK@50 | PCK@100 | Download |
---|---|---|---|---|---|---|
eval_keypoints3d.py | 83.84 | 101.48 | 83.20 | 77.65 | 88.04 | log |
- 160906_pizza1
Config | PCP | MPJPE(mm) | PA-MPJPE(mm) | PCK@50 | PCK@100 | Download |
---|---|---|---|---|---|---|
eval_keypoints3d.py | 93.80 | 58.30 | 43.85 | 74.71 | 93.41 | log |
- 160422_haggling1
Config | PCP | MPJPE(mm) | PA-MPJPE(mm) | PCK@50 | PCK@100 | Download |
---|---|---|---|---|---|---|
eval_keypoints3d.py | 94.88 | 64.38 | 49.59 | 81.45 | 94.47 | log |