[arxiv 1712.06584] End-to-end Recovery of Human Shape and Pose [PDF] [notes]
Angjoo Kanazawa, Michael J Black, David W Jacobs, Jitendra Malik
Performs 3D pose and shape estimation to recover the 3D mesh of humans. Shows the relevance of 2D annotations + reprojection to estimate 3D information without explicit 3D supervision.
- A ResNet encoder extracts image features
- From these features, a fully connected network iteratively regresses deltas of the estimated pose and shape. The pose and shape parameters are initialized to their mean values and iteratively refined by feeding the current estimate, concatenated with the image features, back into the regressor (see the regressor sketch below)
- A discriminator is applied to estimate whether the predicted 3D pose and shape parameters come from the real distribution of human shapes and poses (see the adversarial loss sketch below)
- Joints are reprojected in 2D so as to take advantage of large in-the-wild 2D-annotated datasets (see the projection sketch below)
- encoder is a ResNet-50 pretrained on ImageNet
- image feature vector is of size 2048
- for pose, the axis-angle representation is used
- the iterative regressor consists of two fully connected layers of size 1024 and one of size 85 (10 shape + 23*3 pose + 2 translation, 1 scale and 3 global rotation in axis-angle parameters)
- 3 iterative steps are used for the regressor
- they use a weak-perspective camera model (3 global rotation, 2 translation and 1 scale parameters)
- removing the discriminator on pose/shape creates monster-like humans with implausible poses and shapes
- results are comparable to (slightly below) the state of the art for 3D joint estimation
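
A minimal sketch of the iterative regression loop described above, assuming a PyTorch setup; the class name `ParamRegressor`, the zero initialisation (the paper starts from mean SMPL parameters), and the layer arrangement are illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn

class ParamRegressor(nn.Module):
    """Iterative error-feedback regressor: predicts deltas on the 85-D
    vector of camera + pose + shape parameters from image features."""

    def __init__(self, feat_dim=2048, param_dim=85, n_iter=3):
        super().__init__()
        self.n_iter = n_iter
        self.fc = nn.Sequential(
            nn.Linear(feat_dim + param_dim, 1024), nn.ReLU(),
            nn.Linear(1024, 1024), nn.ReLU(),
            nn.Linear(1024, param_dim),      # outputs a delta, not the parameters
        )
        # start from mean parameters (zeros used here as a stand-in)
        self.register_buffer("mean_params", torch.zeros(1, param_dim))

    def forward(self, feats):                # feats: (B, 2048) ResNet-50 features
        params = self.mean_params.expand(feats.size(0), -1)
        for _ in range(self.n_iter):         # 3 refinement steps
            delta = self.fc(torch.cat([feats, params], dim=1))
            params = params + delta          # additive update of the current estimate
        return params                        # scale + translation + global rotation + 23*3 pose + 10 shape
```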
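The 2D reprojection supervision with the weak-perspective camera can be sketched as follows, assuming the predicted 3D joints are already in camera coordinates (global rotation applied); the helper names `weak_perspective_project` and `keypoint_loss` are hypothetical:

```python
import torch

def weak_perspective_project(joints3d, scale, trans):
    """Project 3D joints with a weak-perspective camera.
    joints3d: (B, K, 3) joints in camera coordinates
    scale:    (B, 1)   scale factor s
    trans:    (B, 2)   in-plane translation (t_x, t_y)
    returns   (B, K, 2) projected 2D joints
    """
    return scale.unsqueeze(1) * joints3d[..., :2] + trans.unsqueeze(1)

def keypoint_loss(pred2d, gt2d, vis):
    """L1 reprojection loss over visible keypoints only, so in-the-wild
    images with partial 2D annotations still give a training signal."""
    err = (pred2d - gt2d).abs().sum(dim=-1)   # (B, K) per-joint error
    return (vis * err).sum() / vis.sum().clamp(min=1)
```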
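One way the adversarial prior on pose/shape could be wired up, as a hedged sketch: a single MLP discriminator over the 69 pose + 10 shape parameters with least-squares GAN losses (the paper factorises the discriminator into per-joint, all-joint and shape parts, omitted here for brevity):

```python
import torch
import torch.nn as nn

class PoseShapeDiscriminator(nn.Module):
    """Toy discriminator over concatenated pose + shape parameters."""

    def __init__(self, in_dim=69 + 10):      # 23*3 pose + 10 shape
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, params):               # params: (B, 79)
        return self.net(params)

def adversarial_losses(disc, real_params, fake_params):
    """Least-squares GAN objectives: real samples (e.g. from mocap data)
    are pushed towards 1, regressed samples towards 0; the encoder is
    trained so that its outputs score close to 1 ("look real")."""
    d_loss = ((disc(real_params) - 1) ** 2).mean() + (disc(fake_params.detach()) ** 2).mean()
    g_loss = ((disc(fake_params) - 1) ** 2).mean()
    return d_loss, g_loss
```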