[arxiv 1801.01615] Total Capture: A 3D Deformation Model for Tracking Faces, Hands, and Bodies [PDF] [notes]
Hanbyul Joo, Tomas Simon, Yaser Sheikh
read 11/04/2018
In a multiview RGB setting, obtain a complete mesh reconstruction with granularity adapted to the size of each part (finer for hands and faces, for instance). Create a low-dimensional parameterized full-body model that jointly parameterizes body, face, and hand shape.
Use OpenPose on body, hands, and faces to obtain 2D keypoint detections; 3D skeletons are then triangulated using the known camera calibration. Some keypoints can be missing under challenging occlusions or motion blur. They add a detector for the tips of the big and little toes.
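A minimal sketch of the triangulation step, assuming standard DLT with known 3x4 projection matrices (function and variable names are mine, not from the paper):

```python
import numpy as np

def triangulate_point(proj_mats, points_2d):
    """Linear (DLT) triangulation of one keypoint from >= 2 calibrated views.

    proj_mats: list of 3x4 camera projection matrices (known calibration).
    points_2d: list of corresponding (x, y) detections, e.g. from OpenPose.
    Views where the keypoint is missing are simply left out of both lists.
    """
    A = []
    for P, (x, y) in zip(proj_mats, points_2d):
        # Each view gives two linear constraints on the homogeneous point X.
        A.append(x * P[2] - P[0])
        A.append(y * P[2] - P[1])
    # The solution is the right singular vector of A with smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(A))
    X = vt[-1]
    return X[:3] / X[3]  # de-homogenize
```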
Obtain point clouds + normals from the multiview images using the commercial RealityCapture software.
- Keypoints are matched in 3D to correspondences in the mesh model; a correspondence matrix determines mesh joints from vertices (mesh joints are a linear combination of vertices, as far as I understand, with weights shared across the x/y/z coordinates). The Euclidean distance to the detections then produces an energy term (sketched below).
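As far as I can tell, that keypoint term reduces to the sketch below: a regressor matrix maps vertices to joints, and squared Euclidean distances to the triangulated detections form the energy (array names are illustrative):

```python
import numpy as np

def keypoint_energy(J_reg, vertices, detections_3d, visible):
    """Squared-distance keypoint term.

    J_reg:         (num_joints, num_vertices) correspondence matrix; each
                   joint is a linear combination of vertices, with the same
                   weights applied to the x, y and z coordinates.
    vertices:      (num_vertices, 3) current posed mesh vertices.
    detections_3d: (num_joints, 3) triangulated 3D keypoints.
    visible:       (num_joints,) boolean mask of detected keypoints.
    """
    joints = J_reg @ vertices                          # (num_joints, 3)
    residuals = joints[visible] - detections_3d[visible]
    return np.sum(residuals ** 2)
```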
- Iterative Closest Point (ICP) puts cloud measurements and model mesh vertices in correspondence; the correspondences are recomputed at each solver iteration, and distances are thresholded during the search.
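A sketch of that correspondence search under these notes' assumptions: nearest-neighbor matching against the current mesh, rerun at every solver iteration, with a distance threshold (the 5 cm value is a placeholder, not the paper's):

```python
import numpy as np
from scipy.spatial import cKDTree

def icp_correspondences(cloud_points, mesh_vertices, max_dist=0.05):
    """One ICP matching step: nearest model vertex for each scan point.

    Pairs farther apart than max_dist are discarded, rejecting outliers
    such as background geometry. Returns index pairs into both arrays.
    """
    tree = cKDTree(mesh_vertices)
    dists, nn_idx = tree.query(cloud_points)
    keep = dists < max_dist
    return np.nonzero(keep)[0], nn_idx[keep]
```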
- They use normal information by computing point-to-plane distances between matched cloud points and the mesh surface (i.e. the distance to the model mesh plane along the normal direction of the cloud point).
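The point-to-plane residual described there could look like this (a sketch, using the cloud point's normal as stated above):

```python
import numpy as np

def point_to_plane_residuals(cloud_points, cloud_normals, matched_points):
    """Signed distance from each matched mesh point to the plane through the
    corresponding cloud point, measured along the cloud point's unit normal.
    All arrays are (N, 3) and row-aligned by the ICP correspondences.
    """
    return np.einsum('ij,ij->i', matched_points - cloud_points, cloud_normals)
```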
- Penalize differences across seams at the part discontinuities (face, hands) so the stitched parts stay consistent.
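A sketch of what such a seam term could be, assuming two index rings that should coincide at a part boundary (e.g. the wrist ring of the body mesh and of the hand mesh; the index arrays are hypothetical):

```python
import numpy as np

def seam_energy(body_vertices, part_vertices, body_ring_idx, part_ring_idx):
    """Penalize mismatch across a part boundary (e.g. body/hand wrist).

    body_ring_idx / part_ring_idx select the two rings of vertices that
    should meet at the seam; the two index arrays are in correspondence.
    """
    diff = body_vertices[body_ring_idx] - part_vertices[part_ring_idx]
    return np.sum(diff ** 2)
```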
- Priors are set on shape and pose: zero-mean standard normal priors on each parameter.
- Initialize the optimization with the strongest measurement cues and priors, then add the remaining measurements and relax the priors (a schematic sketch follows this list):
- First align the torso (shoulders and hips)
- Then add the remaining keypoints; this already provides mocap results without shape information
- Then add the point cloud term, which also allows capturing shape
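A schematic of that staged schedule; the stage structure follows the list above, but the prior weights and the composition into a single scalar energy are illustrative, not the paper's actual values:

```python
# Illustrative coarse-to-fine schedule: each stage enables more data terms
# and relaxes the zero-mean normal priors on pose/shape parameters.
STAGES = [
    # (keypoint subset,           use point cloud, prior weight)
    ("torso (shoulders + hips)",  False,           1.0),  # global alignment
    ("all keypoints",             False,           0.5),  # mocap, no shape
    ("all keypoints",             True,            0.1),  # cloud -> shape
]

def stage_energy(params, stage, e_keypoint, e_cloud, e_prior):
    """Total energy for one stage; e_* are callables standing in for the
    keypoint, point-cloud and prior terms described in these notes."""
    subset, use_cloud, w_prior = stage
    energy = e_keypoint(params, subset)
    if use_cloud:
        energy += e_cloud(params)
    # A 0-mean standard normal prior on a parameter is an L2 penalty on it.
    return energy + w_prior * e_prior(params)
```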
- They regress SMPL 3D joint locations to their annotated 3D keypoint locations by finding a sparse linear combination of vertices that approximates the mapping from one to the other (sketched below).
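One plausible way to get such a sparse combination is an L1-regularized (optionally non-negative) least-squares fit per joint; the solver choice and the data layout below are my assumptions, not the paper's:

```python
import numpy as np
from sklearn.linear_model import Lasso

def fit_sparse_joint_regressor(vertex_rows, keypoint_rows, alpha=1e-3):
    """Fit a sparse regressor J with J @ vertices ~ annotated keypoints.

    vertex_rows:   (3 * num_frames, num_vertices) mesh vertices, with the
                   x/y/z rows of every frame stacked, so the learned weights
                   are shared across coordinates.
    keypoint_rows: (3 * num_frames, num_joints) target keypoints, stacked
                   the same way.
    Returns J with shape (num_joints, num_vertices), mostly zeros.
    """
    model = Lasso(alpha=alpha, positive=True, fit_intercept=False)
    model.fit(vertex_rows, keypoint_rows)
    return model.coef_
```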
- This allows removing the seam constraints, by learning a single joint hierarchy and using a common parametrization for shape.
- For each vertex of the Frankenstein model, they find the displacement along the normal direction that compensates the residual between the surface mesh and the point cloud.
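A sketch of that refinement step, assuming a nearest-neighbor lookup into the cloud and a projection of the residual onto the vertex normal (regularization/smoothing of the resulting displacement field is omitted):

```python
import numpy as np
from scipy.spatial import cKDTree

def normal_displacements(vertices, vertex_normals, cloud_points):
    """Per-vertex scalar offset along the vertex normal that moves the model
    surface toward the nearest scan point."""
    tree = cKDTree(cloud_points)
    _, nn_idx = tree.query(vertices)
    residual = cloud_points[nn_idx] - vertices
    return np.einsum('ij,ij->i', residual, vertex_normals)
```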
- Warp all poses to the rest pose and use PCA to build a linear shape space covering body + face + hands, but also hair and clothing.
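Building the linear shape space from the rest-pose-warped fits is ordinary PCA over stacked vertex coordinates; a minimal sketch (the number of components is an arbitrary choice):

```python
import numpy as np

def build_shape_space(rest_pose_meshes, num_components=40):
    """PCA shape space from meshes all warped to the rest pose.

    rest_pose_meshes: (num_samples, num_vertices, 3).
    Returns the mean mesh (flattened) and the top principal directions, so
    a new shape is mean + coeffs @ basis.
    """
    n = rest_pose_meshes.shape[0]
    X = rest_pose_meshes.reshape(n, -1)            # one row per sample
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:num_components]               # (num_components, 3V)
```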
- Optical flow is used to warp the previous frame's initial fit to the neighboring frame --> smoother solutions.
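As a simplified stand-in for warping the full fit, the sketch below uses dense Farnebäck flow (my choice of flow method, via OpenCV) to push the previous frame's projected keypoints into the next frame as an initialization:

```python
import cv2
import numpy as np

def warp_points_to_next_frame(prev_gray, next_gray, points_2d):
    """Advect 2D points from the previous frame into the next one.

    prev_gray / next_gray: consecutive grayscale frames, (H, W) uint8.
    points_2d:             (K, 2) float pixel coordinates in prev_gray.
    """
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray, next_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    xs = np.clip(points_2d[:, 0].astype(int), 0, flow.shape[1] - 1)
    ys = np.clip(points_2d[:, 1].astype(int), 0, flow.shape[0] - 1)
    return points_2d + flow[ys, xs]   # add each point's flow displacement
```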
- Test the validity of the model by comparing silhouettes across different viewpoints.
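That check can be quantified as an intersection-over-union between the rendered model silhouette and the observed silhouette in each view; a sketch (the rendering and segmentation steps are assumed to exist):

```python
import numpy as np

def silhouette_iou(rendered_mask, observed_mask):
    """IoU of two boolean (H, W) silhouette masks for one viewpoint."""
    inter = np.logical_and(rendered_mask, observed_mask).sum()
    union = np.logical_or(rendered_mask, observed_mask).sum()
    return inter / union if union else 1.0

# e.g. average over all calibrated views:
# score = np.mean([silhouette_iou(render(v), segment(v)) for v in views])
```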