NYU dataset 2014

[nyu-dataset] Real-Time Continuous Pose Recovery of Human Hands Using Convolutional Networks [PDF] [project page] [dataset] [notes]

Jonathan Tompson, Murphy Stein, Yann LeCun, Ken Perlin

read 22/05/2017

Dataset

Hand model with 42 degrees of freedom (DOF)

Ground truth labels:

  • start with an approximate pose and render a depth map from the 3D rigged (boned) mesh model
  • compare it with the real sensor depth
  • use particle swarm optimization with partial randomization to find the best fit to the objective function
  • once converged, refine with Nelder-Mead optimization for fast local convergence (a simplified sketch follows this list)
  • three depth sensors are used to fit the Linear Blend Skinning (LBS) model
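
A minimal sketch of this fitting loop. `render_depth` here is a toy stand-in for the real mesh renderer, the coarse search is a simplified random-perturbation loop rather than the paper's full partially randomized PSO, and SciPy's Nelder-Mead handles the local refinement:

```python
import numpy as np
from scipy.optimize import minimize

def render_depth(pose, shape=(64, 64)):
    # Toy stand-in for the real mesh/LBS depth renderer: any smooth
    # pose -> image map is enough to demonstrate the fitting loop.
    u = np.linspace(0.0, 1.0, shape[1])
    v = np.linspace(0.0, 1.0, shape[0])[:, None]
    return pose[0] + pose[1] * u + pose[2] * v

def objective(pose, observed_depth):
    """Per-pixel discrepancy between rendered and observed depth."""
    return np.abs(render_depth(pose) - observed_depth).sum()

def fit_pose(observed_depth, init_pose, n_particles=64, n_iters=50, sigma=0.05):
    # Coarse search: random perturbations around the best candidate so far,
    # a simplification of the paper's PSO with partial randomization.
    best = init_pose.copy()
    best_cost = objective(best, observed_depth)
    for _ in range(n_iters):
        particles = best + sigma * np.random.randn(n_particles, best.size)
        costs = [objective(p, observed_depth) for p in particles]
        i = int(np.argmin(costs))
        if costs[i] < best_cost:
            best, best_cost = particles[i], costs[i]
    # Local refinement: Nelder-Mead for fast convergence near the optimum.
    res = minimize(objective, best, args=(observed_depth,), method="Nelder-Mead")
    return res.x

# Demo: recover a known toy pose from its own rendering.
observed = render_depth(np.array([0.5, 1.0, -0.5]))
print(fit_pose(observed, init_pose=np.zeros(3)))
```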

CNN predictor

Pre-processing

  • Hand segmentation using a randomized decision forest (RDF)
  • Contrast normalization of the segmented depth image (sketched below)
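
A rough sketch of the pre-processing, assuming the boolean `hand_mask` is already given (e.g. by the RDF classifier); the crop padding and the mean/std normalization scheme are illustrative assumptions, not the paper's exact recipe:

```python
import numpy as np

def preprocess(depth, hand_mask, pad=10):
    """Crop the segmented hand region and contrast-normalize its depth."""
    ys, xs = np.nonzero(hand_mask)
    y0, y1 = max(ys.min() - pad, 0), min(ys.max() + pad, depth.shape[0])
    x0, x1 = max(xs.min() - pad, 0), min(xs.max() + pad, depth.shape[1])
    crop = depth[y0:y1, x0:x1].astype(np.float32)
    m = hand_mask[y0:y1, x0:x1]
    # Normalize depth within the hand to zero mean, unit spread so the CNN
    # sees a consistent input range regardless of hand distance.
    mu, span = crop[m].mean(), crop[m].std() + 1e-6
    crop = (crop - mu) / span
    crop[~m] = 0.0  # zero out background pixels
    return crop

# Example with synthetic inputs:
depth = np.random.rand(240, 320) * 2 + 1
mask = np.zeros((240, 320), dtype=bool)
mask[100:140, 150:190] = True
print(preprocess(depth, mask).shape)
```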

Structure

  • 3 resolutions of the depth image (a multi-resolution pyramid)
  • a 2-stage CNN bank per resolution (each stage: conv, ReLU, maxpool)
  • the banks' outputs are fed into a 2-stage fully connected (fc) network with high-level convolutions, producing one heat-map per joint (see the sketch below)
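
A hedged PyTorch sketch of this structure; the input scales (96/48/24), channel counts, kernel sizes, joint count (14) and 18×18 heat-map resolution are illustrative assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class MultiResHeatmapNet(nn.Module):
    """One small conv bank per input scale; features are concatenated and
    fed to fully connected stages that emit one heat-map per joint."""

    def __init__(self, n_joints=14, hm=18):
        super().__init__()
        self.n_joints, self.hm = n_joints, hm
        # Three banks, one per resolution; each is a 2-stage
        # (conv -> ReLU -> maxpool) pipeline as in the notes.
        self.banks = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(1, 16, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            )
            for _ in range(3)
        ])
        # 2-stage fully connected network producing flattened heat-maps.
        self.fc = nn.Sequential(
            nn.LazyLinear(2048), nn.ReLU(),
            nn.Linear(2048, n_joints * hm * hm),
        )

    def forward(self, x96, x48, x24):
        feats = [bank(x).flatten(1) for bank, x in zip(self.banks, (x96, x48, x24))]
        out = self.fc(torch.cat(feats, dim=1))
        return out.view(-1, self.n_joints, self.hm, self.hm)

net = MultiResHeatmapNet()
out = net(torch.randn(2, 1, 96, 96), torch.randn(2, 1, 48, 48), torch.randn(2, 1, 24, 24))
print(out.shape)  # torch.Size([2, 14, 18, 18])
```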

Training

  • L2 loss, minimized with backprop
  • output heat-maps are trained to match 2D Gaussians centered on the ground-truth joint positions (target construction sketched below)
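
A small sketch of the target construction and loss. The fixed Gaussian width `sigma` and the 18×18 heat-map size are illustrative choices, and `net`, `pyramid` and `joint_uvs` in the commented step are placeholders (e.g. the network sketched above plus a labelled batch):

```python
import torch
import torch.nn.functional as F

def gaussian_heatmap(uv, size=18, sigma=1.0):
    """2D Gaussian target centered on the ground-truth (u, v) joint
    position, in heat-map pixel coordinates."""
    ys = torch.arange(size, dtype=torch.float32).view(-1, 1)
    xs = torch.arange(size, dtype=torch.float32).view(1, -1)
    u, v = uv
    return torch.exp(-((xs - u) ** 2 + (ys - v) ** 2) / (2 * sigma ** 2))

# One training step (placeholders: `net` is the heat-map CNN, `pyramid` a
# batch of 3-scale inputs, `joint_uvs` the labelled joint positions):
#   pred   = net(*pyramid)                                   # (B, J, 18, 18)
#   target = torch.stack([torch.stack([gaussian_heatmap(uv) for uv in uvs])
#                         for uvs in joint_uvs])             # same shape
#   loss   = F.mse_loss(pred, target)   # the L2 loss from the notes
#   loss.backward()                     # backprop
```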

Finally

  • fit a Gaussian to each heat-map to infer the joint location at sub-pixel accuracy
  • read the depth value at the recovered position
  • use the hand model to align the mesh to the heat-map positions via inverse kinematics (a sketch of the peak extraction follows)
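
A minimal sketch of the sub-pixel step. The intensity-weighted centroid used here is a lightweight stand-in for the paper's Gaussian fit, and `crop_origin`/`scale` are assumed parameters mapping heat-map coordinates back to the full 640×480 image:

```python
import numpy as np

def subpixel_peak(heatmap, win=2):
    """(u, v) of the heat-map peak at sub-pixel accuracy: the
    intensity-weighted centroid of a small window around the maximum,
    a lightweight stand-in for the paper's Gaussian fit."""
    v0, u0 = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    v_lo, v_hi = max(v0 - win, 0), min(v0 + win + 1, heatmap.shape[0])
    u_lo, u_hi = max(u0 - win, 0), min(u0 + win + 1, heatmap.shape[1])
    patch = heatmap[v_lo:v_hi, u_lo:u_hi]
    vs, us = np.mgrid[v_lo:v_hi, u_lo:u_hi]
    w = patch / patch.sum()
    return float((us * w).sum()), float((vs * w).sum())

def joint_xyz(heatmap, depth, crop_origin, scale):
    # Map the sub-pixel peak back to full-image coordinates, then read
    # the corresponding depth value to get a (u, v, z) point.
    u, v = subpixel_peak(heatmap)
    u_img = crop_origin[0] + u * scale
    v_img = crop_origin[1] + v * scale
    z = depth[int(round(v_img)), int(round(u_img))]
    return u_img, v_img, z
```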

Results

  • Hand segmentation: 4% error (number of incorrectly labelled pixels / total number of pixels)
  • UV error on the heat-map output: 0.41px (std 0.35px), which corresponds to about 6px error after upsampling to 640×480 resolution
  • only qualitative evaluation for the 3D positions