
lourdes_agapito_06_07_2018

Yana edited this page Jul 6, 2018 · 2 revisions

Lourdes Agapito - 3D pose estimation

Main problem in 3D: data is even harder to obtain than in the 2D case

Solutions:

  • leverage 2D annotations
  • self-supervised learning

Precursors

Early works

Eadweard Muybridge (19th century)

Took consecutive shots to check whether all four feet of a horse were in the air at the same time (an early precursor of video). Also took pictures in multiview settings to capture people in different poses.

Johansson's point-light experiments: lights were attached to people to test whether we (as humans) are capable of recognizing people's activities from keypoint positions alone.

Today, the most reliable way to do motion capture is to use physical markers. But these markers are not present 'in the wild'.

Reconstruction from monocular video is an ill-posed problem. To reconstruct deformable objects, the standard approach is to use a deformable model specific to that object (but this doesn't scale nicely...)

2D pose detection

  • First deep learning work on 2D pose estimation: DeepPose, Toshev and Szegedy, CVPR 2014

  • Convolutional Pose Machines, Wei et al, CVPR 2016

    • Iterative process: first estimate a heatmap for each joint
    • then iterate, combining these heatmaps with the image information to refine the predictions
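The iterative refinement idea can be sketched with a toy stage function (a minimal numpy sketch; `toy_stage` is a hypothetical stand-in for the convolutional stages in the actual paper):

```python
import numpy as np

def gaussian_heatmap(center, size=32, sigma=1.5):
    """Render a 2D Gaussian belief map around a joint location (x, y)."""
    ys, xs = np.mgrid[0:size, 0:size]
    return np.exp(-((xs - center[0]) ** 2 + (ys - center[1]) ** 2) / (2 * sigma ** 2))

def cpm_refine(image_features, initial_beliefs, stage_fn, n_stages=3):
    """Multi-stage refinement: each stage re-reads the image features
    together with the previous stage's belief maps and outputs refined maps."""
    beliefs = initial_beliefs
    for _ in range(n_stages):
        beliefs = stage_fn(image_features, beliefs)
    return beliefs

# hypothetical toy stage: nudge the belief map toward the image evidence
toy_stage = lambda feats, beliefs: 0.5 * beliefs + 0.5 * feats

target = gaussian_heatmap((10, 20))   # stand-in for image evidence at the true joint
init = gaussian_heatmap((16, 16))     # poor initial estimate
refined = cpm_refine(target, init, toy_stage, n_stages=5)
```

After a few stages the belief peak moves from the initial guess to the location supported by the image evidence, which is the intuition behind the multi-stage design.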

Side note: why do people estimate keypoints rather than limbs? Limbs are perhaps more difficult to annotate, and keypoints are probably less ambiguous.

Tasks

Capturing 3D dynamic scenes

Baselines

  • Coordinate regression (bypasses 2D coordinate detection)

    • Needs 3D annotations
  • Pipeline: detect 2D, then lift to 3D

    • Attractive because 2D detectors are very reliable
    • But errors are unrecoverable: if the 2D detection is wrong, the 3D stage cannot recover
  • Alternatives

    • volumetric heatmaps (Pavlakos, CVPR 2017)
    • combine 2D heatmaps and image input
    • Add synthetic data (Varol et al, CVPR 2017)
    • example-based retrieval (Chen and Ramanan, CVPR 2017)
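The "detect 2D and lift to 3D" pipeline can be illustrated with a toy linear lifter (a minimal numpy sketch, not any of the actual networks; the low-dimensional pose subspace and all names are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# toy setup: 3D poses lie on a low-dimensional linear subspace (a crude
# stand-in for the structure a learned lifter exploits); 2D poses are
# their orthographic projections
N, J, K = 200, 17, 8
basis = rng.normal(size=(K, J * 3))
latent = rng.normal(size=(N, K))
poses_3d = latent @ basis                # (N, 3J) flattened 3D joint coordinates

proj = np.zeros((J * 3, J * 2))          # orthographic camera: drop z
for j in range(J):
    proj[3 * j, 2 * j] = 1.0             # x -> u
    proj[3 * j + 1, 2 * j + 1] = 1.0     # y -> v
poses_2d = poses_3d @ proj               # (N, 2J) observed 2D joints

# "lifting" as linear least squares from 2D joints to 3D joints
W, *_ = np.linalg.lstsq(poses_2d, poses_3d, rcond=None)
lifted = poses_2d @ W
```

Because the 3D poses here live on a subspace the 2D projections determine, the linear lifter recovers them; it also makes the drawback concrete: any error in `poses_2d` propagates directly into the lifted 3D.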

Lifting from the Deep (Tome et al., CVPR 2017)

  • 2 sources of unpaired annotations:

    • images with 2D annotations
    • mocap data
  • Approach

    • extract the pose in 2D and lift it to 3D
    • reproject the 3D heatmaps back to 2D
    • fuse the original and reprojected 2D heatmaps and apply the loss in 2D
  • The mocap data is used to create a probabilistic model from 2D to 3D pose lifting

    • compute the mean pose
    • align all poses to the same canonical orientation
    • learn a mixture of PPCA models: cluster the poses, then fit a PPCA model to each cluster
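The alignment-and-subspace steps above can be sketched in numpy for a single cluster (an illustrative toy, not the paper's code; here the first pose serves as the canonical reference, the mocap data is synthetic, and a plain SVD stands in for the PPCA fit; the full method would first cluster the poses and fit one model per cluster):

```python
import numpy as np

rng = np.random.default_rng(1)

def procrustes_rotation(P, M):
    """Rotation R minimizing ||P @ R - M||_F (Kabsch algorithm)."""
    U, _, Vt = np.linalg.svd(P.T @ M)
    R = U @ Vt
    if np.linalg.det(R) < 0:             # keep a proper rotation, no reflection
        U[:, -1] *= -1
        R = U @ Vt
    return R

# toy mocap data: rotated, lightly perturbed copies of one base pose
J = 15
base = rng.normal(size=(J, 3))
poses = []
for _ in range(100):
    t = rng.uniform(0, 2 * np.pi)
    Rz = np.array([[np.cos(t), -np.sin(t), 0.0],
                   [np.sin(t),  np.cos(t), 0.0],
                   [0.0, 0.0, 1.0]])
    poses.append(base @ Rz + 0.01 * rng.normal(size=(J, 3)))
poses = np.stack(poses)

# align every pose to a canonical orientation (the first pose),
# then compute the mean pose of the aligned set
aligned = np.stack([P @ procrustes_rotation(P, poses[0]) for P in poses])
mean_pose = aligned.mean(axis=0)

# low-dimensional linear model of the aligned, mean-centred poses
X = aligned.reshape(len(poses), -1) - mean_pose.reshape(1, -1)
_, S, Vt = np.linalg.svd(X, full_matrices=False)
components = Vt[:5]                      # principal pose directions
```

Once aligned, the pose variation concentrates in a few principal directions, which is what makes the per-cluster PPCA models a useful 2D-to-3D prior.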

Non-rigid surface reconstruction

Big optimization with lots of energy terms
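A toy version of such an energy-based optimization, reduced to two terms on a 1D curve (an illustrative sketch only; the actual reconstructions combine many more terms and unknowns):

```python
import numpy as np

rng = np.random.default_rng(2)

# toy "sum of energy terms" problem: recover a smooth deforming curve
# from noisy observations by minimizing
#   E(x) = ||x - obs||^2  (data term)  +  lam * ||D x||^2  (smoothness term)
n = 50
truth = np.sin(np.linspace(0, np.pi, n))
obs = truth + 0.1 * rng.normal(size=n)

D = np.eye(n, k=1)[:-1] - np.eye(n)[:-1]   # finite differences x[i+1] - x[i]
lam = 5.0

def energy(x):
    return np.sum((x - obs) ** 2) + lam * np.sum((D @ x) ** 2)

# the energy is quadratic, so the minimizer solves (I + lam * D^T D) x = obs
x = np.linalg.solve(np.eye(n) + lam * D.T @ D, obs)
```

Real non-rigid reconstruction energies are non-convex and far larger, but the structure is the same: a data term tying the model to observations, plus regularizers encoding priors such as smoothness.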

Dynamic and semantic segmentation
