
1712.06584

Yana edited this page Sep 18, 2018 · 2 revisions

CVPR 2018

[arxiv 1712.06584] End-to-end Recovery of Human Shape and Pose [PDF] [notes]

Angjoo Kanazawa, Michael J Black, David W Jacobs, Jitendra Malik

Objective

Perform 3D pose and shape estimation to recover the 3D mesh of a human from a single image. Show that 2D annotations combined with reprojection are enough to estimate 3D information without explicit 3D supervision.

Synthesis

Teaser figure

image

Architecture

  • A ResNet encoder predicts image features
  • From these features, a fully connected network iteratively regresses deltas of the estimated pose and shape. The pose and shape are initialized at their mean values and refined at each step by feeding the current estimate, concatenated with the image features, to the regressor
  • A discriminator is applied to estimate whether the predicted 3D pose and shape parameters come from the distribution of real human shapes and poses
  • Joints are reprojected to 2D so as to take advantage of large in-the-wild datasets with 2D annotations
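
The iterative refinement described above can be sketched as a short loop. This is a minimal illustration, not the authors' implementation: the function name `iterative_regression`, the `regressor` callable, and the toy regressor in the usage part are all assumptions introduced here. In the paper's setting, `regressor` would be the fully connected network and `theta_init` the mean pose and shape.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def iterative_regression(
    features: Sequence[float],
    theta_init: Sequence[float],
    regressor: Callable[[Vector], Vector],
    n_iter: int = 3,
) -> Vector:
    """Refine theta by repeatedly adding a predicted delta.

    At each step the regressor sees the image features concatenated with
    the current estimate and returns a correction of the same size as theta.
    """
    theta = list(theta_init)
    for _ in range(n_iter):
        # Concatenate image features with the current parameter estimate.
        regressor_input = list(features) + theta
        delta = regressor(regressor_input)
        if len(delta) != len(theta):
            raise ValueError("regressor must return one delta per parameter")
        # Residual update: new estimate = old estimate + predicted delta.
        theta = [t + d for t, d in zip(theta, delta)]
    return theta


def toy_regressor(x: Vector) -> Vector:
    """Hypothetical stand-in for the network: moves the last two inputs
    (the current estimate) halfway toward a target value of 1.0."""
    current = x[-2:]
    return [0.5 * (1.0 - t) for t in current]


refined = iterative_regression(
    features=[0.1, 0.2, 0.3],   # placeholder image features
    theta_init=[0.0, 0.0],      # placeholder "mean" parameters
    regressor=toy_regressor,
    n_iter=3,
)
print(refined)  # [0.875, 0.875]
```

With the toy regressor, each step halves the distance to the target, so three steps move the estimate from 0 to 0.875. The point of the sketch is only the update rule: the network predicts a correction to the current estimate rather than the parameters directly.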

Technical details

  • encoder is resnet50 pretrained on ImageNet
  • image feature vector is of size 2048
  • for pose, the axis-angle representation is used
  • iterative regressor consists of two fully connected layers of size 1024 and one of size 85 (10 shape + 23*3 pose + 2 translation, 1 scale and 3 global rotation in axis-angle parameters)
  • 3 iterative steps are used for the regressor
  • they use the weak-perspective camera model (3 rotations, 2 translations, 1 scale parameter)
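
To make the 85-dimensional output and the weak-perspective camera concrete, here is a small sketch in plain Python. The ordering of the parameters inside the vector (`split_params`) is an assumption chosen for illustration and may differ from the released code. All function names are hypothetical. The projection applies the global rotation, drops the depth coordinate (orthographic projection), then scales and translates.

```python
import math
from typing import Dict, List, Sequence, Tuple


def split_params(theta: Sequence[float]) -> Dict[str, object]:
    """Split an 85-d vector into its parts (assumed layout, for illustration):
    1 scale + 2 translation + 3 global rotation + 69 pose + 10 shape."""
    if len(theta) != 85:
        raise ValueError("expected 85 parameters")
    return {
        "scale": theta[0],
        "trans": list(theta[1:3]),
        "global_rot": list(theta[3:6]),   # axis-angle
        "pose": list(theta[6:75]),        # 23 joints * 3 axis-angle values
        "shape": list(theta[75:85]),
    }


def axis_angle_to_matrix(r: Sequence[float]) -> List[List[float]]:
    """Rodrigues' formula: axis-angle vector -> 3x3 rotation matrix."""
    angle = math.sqrt(sum(c * c for c in r))
    if angle < 1e-8:
        # Near-zero rotation: return the identity.
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = (c / angle for c in r)
    c, s = math.cos(angle), math.sin(angle)
    v = 1.0 - c
    return [
        [c + kx * kx * v, kx * ky * v - kz * s, kx * kz * v + ky * s],
        [ky * kx * v + kz * s, c + ky * ky * v, ky * kz * v - kx * s],
        [kz * kx * v - ky * s, kz * ky * v + kx * s, c + kz * kz * v],
    ]


def weak_perspective_project(
    joints3d: Sequence[Sequence[float]],
    global_rot: Sequence[float],
    scale: float,
    trans: Sequence[float],
) -> List[Tuple[float, float]]:
    """Project 3D joints to 2D: x = scale * (R X)[:2] + trans."""
    R = axis_angle_to_matrix(global_rot)
    projected = []
    for X in joints3d:
        # Rotate, then keep only the x and y rows (orthographic projection).
        x = sum(R[0][k] * X[k] for k in range(3))
        y = sum(R[1][k] * X[k] for k in range(3))
        projected.append((scale * x + trans[0], scale * y + trans[1]))
    return projected


params = split_params([0.0] * 85)
points2d = weak_perspective_project(
    joints3d=[(1.0, 2.0, 3.0)],
    global_rot=[0.0, 0.0, 0.0],
    scale=2.0,
    trans=[1.0, -1.0],
)
print(points2d)  # [(3.0, 3.0)]
```

Because the projected joints are a differentiable function of the predicted parameters, a 2D keypoint loss on them can supervise the 3D estimate, which is what lets the method train on datasets with only 2D annotations.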

Experiments

  • Removing the discriminator on pose/shape produces implausible "monster" humans
  • Results are comparable to (slightly below) the state of the art for 3D joint estimation