[arxiv 1712.06584] End-to-end Recovery of Human Shape and Pose [PDF] [notes]
Angjoo Kanazawa, Michael J Black, David W Jacobs, Jitendra Malik
Performs 3D pose and shape estimation to recover the 3D mesh of humans. Shows the relevance of 2D annotations + reprojection to estimate 3D information without explicit 3D supervision.
- A ResNet encoder extracts image features
- From these features, a fully connected network iteratively regresses deltas of the estimated pose and shape. The pose and shape parameters are initialized to their mean values and iteratively refined by feeding the current estimate, concatenated with the image features, back into the regressor (see the regressor sketch below)
- A discriminator is applied to estimate whether the predicted 3D pose and shape parameters come from the real distribution of human shapes and poses (see the adversarial loss sketch below)
- Joints are reprojected in 2D so as to take advantage of large in-the-wild 2D-annotated datasets (see the projection sketch below)
- encoder is a ResNet-50 pretrained on ImageNet
- image feature vector is of size 2048
- for pose, the axis-angle representation is used
- the iterative regressor consists of two fully connected layers of size 1024 and one of size 85 (10 shape + 23*3 pose + 2 translation, 1 scale and 3 global rotation in axis-angle parameters)
- 3 iterative steps are used for the regressor
- they use a weak-perspective camera model (3 global rotation, 2 translation and 1 scale parameters)
- removing the discriminator on pose/shape creates monster-like humans with implausible poses and shapes
- results are comparable to (slightly below) the state of the art for 3D joint estimation
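
A minimal sketch of the iterative regression loop described above, assuming a PyTorch setup; the class name `ParamRegressor`, the zero initialisation (the paper starts from mean SMPL parameters), and the layer arrangement are illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn

class ParamRegressor(nn.Module):
    """Iterative error-feedback regressor: predicts deltas on the 85-D
    vector of camera + pose + shape parameters from image features."""

    def __init__(self, feat_dim=2048, param_dim=85, n_iter=3):
        super().__init__()
        self.n_iter = n_iter
        self.fc = nn.Sequential(
            nn.Linear(feat_dim + param_dim, 1024), nn.ReLU(),
            nn.Linear(1024, 1024), nn.ReLU(),
            nn.Linear(1024, param_dim),      # outputs a delta, not the parameters
        )
        # start from mean parameters (zeros used here as a stand-in)
        self.register_buffer("mean_params", torch.zeros(1, param_dim))

    def forward(self, feats):                # feats: (B, 2048) ResNet-50 features
        params = self.mean_params.expand(feats.size(0), -1)
        for _ in range(self.n_iter):         # 3 refinement steps
            delta = self.fc(torch.cat([feats, params], dim=1))
            params = params + delta          # additive update of the current estimate
        return params                        # scale + translation + global rotation + 23*3 pose + 10 shape
```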
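The 2D reprojection supervision with the weak-perspective camera can be sketched as follows, assuming the predicted 3D joints are already in camera coordinates (global rotation applied); the helper names `weak_perspective_project` and `keypoint_loss` are hypothetical:

```python
import torch

def weak_perspective_project(joints3d, scale, trans):
    """Project 3D joints with a weak-perspective camera.
    joints3d: (B, K, 3) joints in camera coordinates
    scale:    (B, 1)   scale factor s
    trans:    (B, 2)   in-plane translation (t_x, t_y)
    returns   (B, K, 2) projected 2D joints
    """
    return scale.unsqueeze(1) * joints3d[..., :2] + trans.unsqueeze(1)

def keypoint_loss(pred2d, gt2d, vis):
    """L1 reprojection loss over visible keypoints only, so in-the-wild
    images with partial 2D annotations still give a training signal."""
    err = (pred2d - gt2d).abs().sum(dim=-1)   # (B, K) per-joint error
    return (vis * err).sum() / vis.sum().clamp(min=1)
```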
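One way the adversarial prior on pose/shape could be wired up, as a hedged sketch: a single MLP discriminator over the 69 pose + 10 shape parameters with least-squares GAN losses (the paper factorises the discriminator into per-joint, all-joint and shape parts, omitted here for brevity):

```python
import torch
import torch.nn as nn

class PoseShapeDiscriminator(nn.Module):
    """Toy discriminator over concatenated pose + shape parameters."""

    def __init__(self, in_dim=69 + 10):      # 23*3 pose + 10 shape
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, params):               # params: (B, 79)
        return self.net(params)

def adversarial_losses(disc, real_params, fake_params):
    """Least-squares GAN objectives: real samples (e.g. from mocap data)
    are pushed towards 1, regressed samples towards 0; the encoder is
    trained so that its outputs score close to 1 ("look real")."""
    d_loss = ((disc(real_params) - 1) ** 2).mean() + (disc(fake_params.detach()) ** 2).mean()
    g_loss = ((disc(fake_params) - 1) ** 2).mean()
    return d_loss, g_loss
```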