lourdes_agapito_06_07_2018

Lourdes Agapito - 3D pose estimation

Main problem in 3D: annotated data is even harder to obtain than in the 2D case

Solutions:

leverage 2D annotations
self-supervised learning

Precursors

Early works

Consecutive shots were taken to check whether all four feet of a horse are off the ground at the same time (a precursor of video).
Pictures were also taken in multiview settings to capture people in different poses.

Johansson et al.: experiment in which lights were attached to people, to test whether we (as humans) can recognize people's activities from keypoint positions alone.

Today, the most reliable way to capture motion is to use physical markers. But these markers are not present 'in the wild'.

Reconstruction from monocular video is an ill-posed problem. The usual way to reconstruct a deformable object is to use a deformable model specific to that object (but this does not scale well to new object classes).
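A minimal sketch of why the monocular case is ill-posed, assuming a simple pinhole camera (the `project` helper and the focal length are illustrative, not from the talk): every 3D point on the same camera ray lands on the same pixel, so depth cannot be recovered from a single projection.

```python
def project(point, focal=1.0):
    """Pinhole projection of a 3D point (X, Y, Z) onto the image plane."""
    x, y, z = point
    return (focal * x / z, focal * y / z)

# Two different 3D points lying on the same camera ray (assumed example values).
near = (0.5, 1.0, 2.0)
far = tuple(2.0 * c for c in near)  # (1.0, 2.0, 4.0)

print(project(near))  # (0.25, 0.5)
print(project(far))   # (0.25, 0.5)
```

Priors on the object (for example a deformable model, or a learned pose model) are what remove this ambiguity.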

2D pose detection

First deep learning work on 2D pose estimation: DeepPose (Toshev and Szegedy, CVPR 2014)
Convolutional Pose Machines (Wei et al., CVPR 2016)
- Iterative process: first estimate a heatmap for each joint
- then iterate, combining these heatmaps with the image information to refine the predictions
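To make the heatmap representation concrete, here is a small NumPy sketch (not the Convolutional Pose Machines network itself, which is learned): one heatmap per joint, with the joint location read off as the peak. The function names `gaussian_heatmap` and `decode_heatmaps` and the heatmap size are assumptions made for illustration.

```python
import numpy as np

def gaussian_heatmap(center_xy, height, width, sigma=2.0):
    """Render one joint as a 2D Gaussian bump centered at (x, y)."""
    ys, xs = np.mgrid[0:height, 0:width]
    cx, cy = center_xy
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))

def decode_heatmaps(heatmaps):
    """Map (J, H, W) heatmaps to a (J, 2) array of (x, y) peak locations."""
    num_joints, height, width = heatmaps.shape
    flat_idx = heatmaps.reshape(num_joints, -1).argmax(axis=1)
    ys, xs = np.unravel_index(flat_idx, (height, width))
    return np.stack([xs, ys], axis=1)

# Two hypothetical joints in a 48 x 64 image.
joints = [(10, 20), (30, 5)]
heatmaps = np.stack([gaussian_heatmap(j, 48, 64) for j in joints])
decoded = decode_heatmaps(heatmaps)
print(decoded)
```

In an iterative scheme, each stage would take heatmaps like these plus image features as input and output refined heatmaps; only the final ones are decoded.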

Side note: why do people estimate keypoints and not limbs? Limbs are maybe more difficult to annotate, and keypoints are probably less ambiguous.

Tasks

Capturing 3D dynamic scenes

Baselines

Coordinate regression (bypasses 2D coordinate detection)
- Needs 3D annotations
Pipeline: detect 2D, then lift to 3D
- Attractive because 2D detections are very reliable
- Unrecoverable errors (if the 2D detection is wrong, no recovery is possible)
Alternatives
- volumetric heatmaps (Pavlakos et al., CVPR 2017)
- combine 2D heatmaps and image input
- add synthetic data (Varol et al., CVPR 2017)
- example-based retrieval (Chen and Ramanan, CVPR 2017)
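As a rough illustration of the "detect 2D, then lift to 3D" idea in its example-based form, the sketch below retrieves the 3D pose from a library whose projection best matches a detected 2D pose. This is a strongly simplified stand-in, not the published method: it assumes an orthographic camera looking down the z axis, a random library in place of mocap data, and hypothetical helpers `normalize_2d` and `retrieve_3d`.

```python
import numpy as np

def normalize_2d(pose_2d):
    """Center a (J, 2) pose and scale it to unit norm (removes translation and scale)."""
    centered = pose_2d - pose_2d.mean(axis=0)
    return centered / np.linalg.norm(centered)

def retrieve_3d(query_2d, library_3d):
    """Return the index and 3D pose whose orthographic projection is closest to query_2d.

    library_3d has shape (N, J, 3); the projection simply drops the z coordinate.
    """
    query = normalize_2d(query_2d)
    dists = [np.linalg.norm(query - normalize_2d(p[:, :2])) for p in library_3d]
    best = int(np.argmin(dists))
    return best, library_3d[best]

# Synthetic library standing in for mocap poses (assumption: 50 poses, 14 joints).
rng = np.random.default_rng(0)
library = rng.normal(size=(50, 14, 3))

# A 2D "detection": library pose 7 seen at another scale and image position.
query = 3.0 * library[7][:, :2] + np.array([100.0, 50.0])
idx, lifted = retrieve_3d(query, library)
print(idx, lifted.shape)
```

The sketch also shows the weakness noted above: if the 2D input is wrong, the retrieved 3D pose is wrong too, since nothing in the lifting step looks at the image.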

Lifting from the Deep

2 sources of unpaired annotations:
- images with 2D annotations
- mocap data
Approach
- extract the pose in 2D and predict 3D from 2D
- project the 3D prediction back to 2D heatmaps
- fuse the original and reprojected 2D heatmaps and put the loss on 2D
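The reproject-and-fuse step can be sketched as follows, under loud assumptions: an orthographic projection, a fixed fusion weight (in a trained network this would be learned), Gaussian heatmaps, and a lifted 3D pose that happens to be exact. All function names are hypothetical.

```python
import numpy as np

def render_heatmaps(joints_2d, height, width, sigma=2.0):
    """Render (J, 2) joint locations as (J, H, W) Gaussian heatmaps."""
    ys, xs = np.mgrid[0:height, 0:width]
    maps = [np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
            for x, y in joints_2d]
    return np.stack(maps)

def project_orthographic(pose_3d):
    """Assumed camera model: drop the depth coordinate."""
    return pose_3d[:, :2]

def fuse(detected, reprojected, weight=0.5):
    """Blend the detector heatmaps with the reprojected ones (fixed weight here)."""
    return weight * detected + (1.0 - weight) * reprojected

def heatmap_l2_loss(pred, target):
    """Mean squared error between heatmaps; only 2D supervision is needed."""
    return float(np.mean((pred - target) ** 2))

# Toy setup: the lifted 3D pose is exact, the 2D detector is slightly off.
pose_3d = np.array([[10.0, 20.0, 3.0], [30.0, 5.0, -1.0]])
gt_2d = pose_3d[:, :2]
detected = render_heatmaps(gt_2d + np.array([2.0, -1.0]), 48, 64)
reprojected = render_heatmaps(project_orthographic(pose_3d), 48, 64)
target = render_heatmaps(gt_2d, 48, 64)

fused = fuse(detected, reprojected)
loss_detected = heatmap_l2_loss(detected, target)
loss_fused = heatmap_l2_loss(fused, target)
print(loss_detected, loss_fused)
```

Because the loss is computed on 2D heatmaps, images with only 2D annotations can supervise the whole chain, including the 3D stage.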
The mocap data is used to build a probabilistic model for lifting 2D poses to 3D
- compute the mean pose
- align all poses to the same canonical orientation
- learn a mixture of PPCA models by clustering the poses and fitting one PPCA model per cluster
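The steps above can be sketched in NumPy as follows. This is a simplified illustration under stated assumptions, not the procedure from the talk: poses are assumed to be already rotated to the canonical orientation, clustering is plain k-means with hard assignments, and each cluster gets the closed-form maximum-likelihood PPCA fit. The names `fit_ppca`, `kmeans` and `fit_pose_mixture` and the synthetic data are hypothetical.

```python
import numpy as np

def fit_ppca(data, num_components):
    """Closed-form maximum-likelihood PPCA for data of shape (N, D)."""
    mean = data.mean(axis=0)
    cov = np.cov(data - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)               # ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # make it descending
    noise_var = eigvals[num_components:].mean()          # average discarded variance
    scales = np.sqrt(np.maximum(eigvals[:num_components] - noise_var, 0.0))
    return {"mean": mean, "W": eigvecs[:, :num_components] * scales,
            "noise_var": noise_var}

def kmeans(data, k, num_iters=20, seed=0):
    """Plain k-means with hard assignments; returns one label per row."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(num_iters):
        dists = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = data[labels == c].mean(axis=0)
    return labels

def fit_pose_mixture(poses_3d, k, num_components):
    """poses_3d: (N, J, 3), assumed already in a canonical orientation."""
    centered = poses_3d - poses_3d.mean(axis=1, keepdims=True)  # remove translation
    flat = centered.reshape(len(poses_3d), -1)
    labels = kmeans(flat, k)
    models = []
    for c in range(k):
        members = flat[labels == c]
        if len(members) < 2:          # too few poses to estimate a covariance
            continue
        model = fit_ppca(members, num_components)
        model["weight"] = len(members) / len(flat)
        models.append(model)
    return models

# Synthetic stand-in for mocap: two pose "types" with small perturbations.
rng = np.random.default_rng(0)
bases = rng.normal(scale=5.0, size=(2, 5, 3))
poses = np.concatenate([bases[0] + rng.normal(scale=0.1, size=(100, 5, 3)),
                        bases[1] + rng.normal(scale=0.1, size=(100, 5, 3))])
models = fit_pose_mixture(poses, k=2, num_components=3)
print([(m["weight"], m["W"].shape) for m in models])
```

Each component then describes one family of poses with a low-dimensional Gaussian, which is what makes it usable as a prior when lifting a 2D pose to 3D.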