1610.04889
[arxiv 1610.04889] Real-time Joint Tracking of a Hand Manipulating an Object from RGB-D Input [PDF] [project page] [dataset]
Srinath Sridhar, Franziska Mueller, Michael Zollhöfer, Dan Casas, Antti Oulasvirta, Christian Theobalt
Real-time hand-object tracking using an RGB-D camera
Uses a 3D articulated Gaussian mixture alignment strategy
Enforces contact points between hand and object using a regularizer, in order to take advantage of the physics of grasps
Uses a multi-layer random forest hand part classifier to guide the optimization: segments hand and object, then classifies hand parts
- classify depth pixels as hand or object, and hand pixels into hand parts, using a two-layer random forest that takes occlusion into account
- training of the forest (three trees trained on random distinct subsets) is done on synthetic images created from a 3D synthetic model that is fit to a real image; sample object positions are generated between the thumb and the other fingers
- viewpoint selection: 4 views (front, back, thumb, little finger), with the view selected as the best match for the previous frame's estimate
- first layer classifies hand and arm pixels
- second layer uses hand pixels to further classify in hand parts (6 classes: fingers and palm)
- input: color and depth frames
- hand-object segmentation to remove object from depth-map based on RGB cues
- output: probability histogram that encodes class likelihood (object class: 1)
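A minimal sketch of how the stages above could be combined into one per-pixel histogram. The function name, the class ordering, and the soft chaining of the two layers (P(part) = P(hand) * P(part | hand)) are assumptions made for illustration; the notes only say that the second layer works on hand pixels and that the object class gets probability 1.

```python
def class_histogram(is_object, p_hand, p_parts):
    """Per-pixel histogram over [object, arm, part_0 .. part_5].

    is_object: bool from the color-based hand-object segmentation
    p_hand:    first-layer probability that the pixel is hand (vs. arm)
    p_parts:   second-layer distribution over the 6 hand parts
    All names and the class ordering are hypothetical.
    """
    if is_object:
        # Object pixels get the object class with probability 1.
        return [1.0] + [0.0] * (1 + len(p_parts))
    # Assumed soft chaining of the layers: P(part) = P(hand) * P(part | hand).
    return [0.0, 1.0 - p_hand] + [p_hand * p for p in p_parts]
```

Each returned histogram sums to 1 as long as `p_parts` does, so it can be used directly as a per-pixel class likelihood.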
Parametrization of the articulated motion of the human hand: 26 DOF (20 joint angles and a 6-DOF rigid transformation of the root joint)
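To make the 26-DOF layout concrete, here is a small sketch of a pose container. The class name, the field names, the ordering inside the flat vector, and the use of three rotation parameters for the root are all assumptions, not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List

N_ANGLES = 20  # articulation angles; plus 6 DOF for the root joint


@dataclass
class HandPose:
    # Hypothetical layout: 20 joint angles, then root translation and rotation.
    joint_angles: List[float] = field(default_factory=lambda: [0.0] * N_ANGLES)
    root_translation: List[float] = field(default_factory=lambda: [0.0] * 3)
    root_rotation: List[float] = field(default_factory=lambda: [0.0] * 3)

    def to_vector(self) -> List[float]:
        """Flatten to the 26-dimensional vector an optimizer would update."""
        return self.joint_angles + self.root_translation + self.root_rotation

    @classmethod
    def from_vector(cls, theta: List[float]) -> "HandPose":
        """Rebuild a pose from a flat 26-dimensional vector."""
        if len(theta) != N_ANGLES + 6:
            raise ValueError("expected a 26-dimensional pose vector")
        return cls(list(theta[:N_ANGLES]),
                   list(theta[N_ANGLES:N_ANGLES + 3]),
                   list(theta[N_ANGLES + 3:]))
```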
Input depth and scene (hand + object) are both expressed as 3D Gaussian Mixture Models (GMMs)
Each Gaussian is rigged to a bone of the hand; 30 Gaussians are manually attached to the kinematic chain to model the volumetric extent (std roughly equal to the distance to the surface)
The object is fitted by a predefined number of Gaussians
Adds a visibility factor in [0, 1] (0: totally occluded, 1: fully visible), computed using an occlusion map
GMM restricted to visible surface based on solution of the previous frame
Initialization of gaussians : quadtree segmentation of depth data looking at depth variance, each leaf represents a Gaussian with
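A rough sketch of the quadtree idea: split a square depth patch while its depth variance is above a threshold, then turn each leaf into one Gaussian. The threshold `max_var`, the power-of-two patch size, and the leaf-to-Gaussian mapping (mean at the leaf center at the mean depth, std equal to half the side length, in pixel units) are assumptions for illustration only.

```python
from statistics import pvariance


def patch_values(depth, x0, y0, size):
    """Collect the depth values of a square patch (depth is a list of rows)."""
    return [depth[y][x]
            for y in range(y0, y0 + size)
            for x in range(x0, x0 + size)]


def quadtree_leaves(depth, x0, y0, size, max_var, min_size=1):
    """Split the patch into four until its depth variance is <= max_var.

    Assumes `size` is a power of two. Returns a list of (x0, y0, size) leaves.
    """
    values = patch_values(depth, x0, y0, size)
    if size <= min_size or pvariance(values) <= max_var:
        return [(x0, y0, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += quadtree_leaves(depth, x0 + dx, y0 + dy,
                                      half, max_var, min_size)
    return leaves


def leaf_to_gaussian(depth, leaf):
    """Assumed mapping: mean at the leaf center, std = half the side length."""
    x0, y0, size = leaf
    values = patch_values(depth, x0, y0, size)
    mean_depth = sum(values) / len(values)
    center = (x0 + size / 2, y0 + size / 2, mean_depth)
    return center, size / 2
```

Flat regions end up as a few large Gaussians, while regions with strong depth variation are covered by many small ones.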
Minimizes two energies, one leveraging the depth observations and the other the hand part classifications, which yields 2 pose proposals
Each is optimized using gradient descent, initialized at the previous frame's pose.
The final pose is selected between the two proposals (the minimizer of each energy) by choosing the one that achieves the lowest value of the weighted sum of the two energy terms
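The selection step can be sketched as follows. The function name, the energy callables, and the default weights are placeholders; the actual energy terms and weights are defined in the paper.

```python
def select_pose(proposals, e_depth, e_parts, w_depth=1.0, w_parts=1.0):
    """Return the proposal with the lowest weighted sum of both energies.

    proposals: candidate poses (here, the minimizer of each energy)
    e_depth, e_parts: hypothetical callables mapping a pose to an energy value
    """
    def combined(pose):
        return w_depth * e_depth(pose) + w_parts * e_parts(pose)

    return min(proposals, key=combined)
```

For example, with toy scalar "poses" and quadratic energies, `select_pose([1.0, 3.0], lambda p: (p - 1) ** 2, lambda p: (p - 3) ** 2, w_parts=2.0)` picks the second proposal, because the classification energy is weighted more heavily.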
All components of the energies are detailed in the article. (starts top of page 9)
Enforces anatomical constraints, speed consistency
Enforces a contact point objective (top of page 10), specific to the hand-object tracking scenario. Touch constraint: a fingertip is closer to the object than the sum of their stds
Enforces occlusion handling by imposing that occluded parts move as the rest of the hand
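The touch constraint mentioned above (fingertip closer to the object than the sum of their stds) can be sketched like this. The squared-gap penalty is an assumed form chosen for illustration; the exact term is given in the paper.

```python
import math


def is_touching(tip_mu, tip_sigma, obj_mu, obj_sigma):
    """True if the fingertip Gaussian is within the sum of stds of the object Gaussian."""
    return math.dist(tip_mu, obj_mu) < tip_sigma + obj_sigma


def touch_penalty(tip_mu, tip_sigma, obj_mu, obj_sigma):
    """Assumed penalty: zero when touching, squared remaining gap otherwise."""
    gap = math.dist(tip_mu, obj_mu) - (tip_sigma + obj_sigma)
    return max(0.0, gap) ** 2
```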
Creates a dataset with fingertip positions and object pose (cuboid)
3k frames, manually annotated
Having 2 proposals (from 2 separate energy terms) allows for better recovery from errors.
Evaluation on :
- IJCV
- Tzionas
- Dexter
Gaussian Mixture Alignment: the problem of finding the transformation that best aligns one Gaussian mixture with another; a generalization of ICP that takes into account the spatial proximity between Gaussians
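As an illustration of the "spatial proximity" idea, the sketch below scores how well two mixtures of isotropic 3D Gaussians overlap, using the closed-form integral of the product of two normalized Gaussian densities. The function names and the (weight, mean, std) tuple layout are assumptions, and the paper's energy may use a different normalization.

```python
import math


def gaussian_overlap(mu_a, sigma_a, mu_b, sigma_b):
    """Integral of the product of two normalized isotropic 3D Gaussians.

    Closed form: (2*pi*s)^(-3/2) * exp(-d^2 / (2*s)), with s = sigma_a^2 + sigma_b^2
    and d the distance between the means.
    """
    s = sigma_a ** 2 + sigma_b ** 2
    d2 = sum((a - b) ** 2 for a, b in zip(mu_a, mu_b))
    return (2 * math.pi * s) ** -1.5 * math.exp(-d2 / (2 * s))


def mixture_similarity(model, data):
    """Sum of pairwise overlaps; each mixture is a list of (weight, mean, std)."""
    return sum(wa * wb * gaussian_overlap(ma, sa, mb, sb)
               for (wa, ma, sa) in model
               for (wb, mb, sb) in data)
```

Unlike ICP, no hard point correspondences are needed: every pair of Gaussians contributes, with a weight that decays smoothly with distance, so an alignment procedure would search for the transformation of `model` that maximizes `mixture_similarity`.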