1705.01389
[arxiv 1705.01389] Learning to Estimate 3D Hand Pose from Single RGB Images [PDF] [synthetic dataset] [notes]
Christian Zimmermann, Thomas Brox
read 04/05/2017
Three networks are used sequentially (see the sketch after this list):
- hand localization through segmentation
- localization of the 21 2D keypoints on the hand
- deduction of the 3D hand pose from the 2D keypoints
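A minimal PyTorch sketch of that three-stage pipeline. The module names, tensor shapes, and the masking shortcut in stage 2 are assumptions for illustration, not the paper's exact architecture (the paper crops around the segmented hand before keypoint prediction):

```python
import torch
import torch.nn as nn


class HandPosePipeline(nn.Module):
    """Sketch of the three sequential stages; the three sub-networks are
    passed in as arguments and their internals are placeholders here."""

    def __init__(self, seg_net: nn.Module, pose_net: nn.Module, lift_net: nn.Module):
        super().__init__()
        self.seg_net = seg_net    # stage 1: hand segmentation / localization
        self.pose_net = pose_net  # stage 2: 2D score maps for the 21 keypoints
        self.lift_net = lift_net  # stage 3: 2D evidence -> 3D hand pose

    def forward(self, image: torch.Tensor):
        # 1) localize the hand: per-pixel hand/background probability
        hand_mask = torch.sigmoid(self.seg_net(image))        # (B, 1, H, W)

        # 2) predict one score map per keypoint; masking the image stands in
        #    for the crop-around-the-hand step to keep this sketch short
        heatmaps = self.pose_net(image * hand_mask)            # (B, 21, h, w)

        # 3) lift the 2D evidence to 3D keypoint coordinates
        pose_3d = self.lift_net(heatmaps.flatten(1))           # (B, 63)
        return hand_mask, heatmaps, pose_3d.view(-1, 21, 3)
```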
Synthesized dataset (freely available Mixamo human models rendered with Blender): 41,258 training and 2,728 testing images at 320x320 resolution, with 21 keypoints and 33 segmentation masks
Interesting analysis of the 2D-to-3D network's predictions when given more or fewer keypoints, which shows what the network predicts depending on the amount of input data
No existing dataset for 3D hand poses with enough variability ==> a synthetic one was created for this article
NYU is not a suitable dataset since only registered images are provided
Evaluates on Dexter
This paper separates the viewpoint estimation from the estimation of the keypoint positions in the canonical frame, which implies that the viewpoint is not used to estimate the coordinates.
Which one is more robust? If the viewpoint, could this knowledge be used to estimate the coordinates?
But both are minimized jointly (as an unweighted sum)
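A minimal sketch of that joint objective and of the viewpoint/canonical decomposition, assuming squared-L2 terms and a rotation-matrix parameterization of the viewpoint (both are assumptions for this sketch, not details taken from the paper):

```python
import torch


def joint_loss(canon_pred, canon_gt, rot_pred, rot_gt):
    """Unweighted sum of the two objectives noted above: canonical-frame
    keypoint coordinates and the viewpoint rotation."""
    loss_canon = ((canon_pred - canon_gt) ** 2).sum(dim=(1, 2)).mean()
    loss_view = ((rot_pred - rot_gt) ** 2).sum(dim=(1, 2)).mean()
    return loss_canon + loss_view  # no weighting between the two terms


def reconstruct_coords(canon_pred, rot_pred):
    """Map canonical-frame keypoints back to the camera frame by applying
    the predicted viewpoint rotation (batched matrix-vector product)."""
    # canon_pred: (B, 21, 3), rot_pred: (B, 3, 3)
    return torch.einsum('bij,bkj->bki', rot_pred, canon_pred)
```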
Also, the synthesized data contains no manipulation actions and therefore few occlusion examples, so the method almost always fails in such cases