NYU dataset 2014

[nyu-dataset] Real-Time Continuous Pose Recovery of Human Hands Using Convolutional Networks [PDF] [project page] [dataset] [notes]

Jonathan Tompson, Murphy Stein, Yann LeCun, Ken Perlin

read 22/05/2017

Dataset

Hand model with 42 degrees of freedom (DOF)

Ground truth labels:

  • start with an approximate pose and render a depth map from the 3D rigged (boned) mesh model
  • compare it with the real sensor depth
  • use particle swarm optimization with partial randomization to find the best fit to the objective function
  • once converged, refine with Nelder-Mead optimization for fast local convergence (a simplified sketch follows this list)
  • three depth sensors are used to fit the Linear Blend Skinning (LBS) model
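
A minimal sketch of this fitting loop. `render_depth` here is a toy stand-in for the real mesh renderer, the coarse search is a simplified random-perturbation loop rather than the paper's full partially randomized PSO, and SciPy's Nelder-Mead handles the local refinement:

```python
import numpy as np
from scipy.optimize import minimize

def render_depth(pose, shape=(64, 64)):
    # Toy stand-in for the real mesh/LBS depth renderer: any smooth
    # pose -> image map is enough to demonstrate the fitting loop.
    u = np.linspace(0.0, 1.0, shape[1])
    v = np.linspace(0.0, 1.0, shape[0])[:, None]
    return pose[0] + pose[1] * u + pose[2] * v

def objective(pose, observed_depth):
    """Per-pixel discrepancy between rendered and observed depth."""
    return np.abs(render_depth(pose) - observed_depth).sum()

def fit_pose(observed_depth, init_pose, n_particles=64, n_iters=50, sigma=0.05):
    # Coarse search: random perturbations around the best candidate so far,
    # a simplification of the paper's PSO with partial randomization.
    best = init_pose.copy()
    best_cost = objective(best, observed_depth)
    for _ in range(n_iters):
        particles = best + sigma * np.random.randn(n_particles, best.size)
        costs = [objective(p, observed_depth) for p in particles]
        i = int(np.argmin(costs))
        if costs[i] < best_cost:
            best, best_cost = particles[i], costs[i]
    # Local refinement: Nelder-Mead for fast convergence near the optimum.
    res = minimize(objective, best, args=(observed_depth,), method="Nelder-Mead")
    return res.x

# Demo: recover a known toy pose from its own rendering.
observed = render_depth(np.array([0.5, 1.0, -0.5]))
print(fit_pose(observed, init_pose=np.zeros(3)))
```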

CNN predictor

Pre-processing

  • Hand segmentation using a randomized decision forest (RDF)
  • Contrast normalization of the segmented depth image (sketched below)
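
A rough sketch of the pre-processing, assuming the boolean `hand_mask` is already given (e.g. by the RDF classifier); the crop padding and the mean/std normalization scheme are illustrative assumptions, not the paper's exact recipe:

```python
import numpy as np

def preprocess(depth, hand_mask, pad=10):
    """Crop the segmented hand region and contrast-normalize its depth."""
    ys, xs = np.nonzero(hand_mask)
    y0, y1 = max(ys.min() - pad, 0), min(ys.max() + pad, depth.shape[0])
    x0, x1 = max(xs.min() - pad, 0), min(xs.max() + pad, depth.shape[1])
    crop = depth[y0:y1, x0:x1].astype(np.float32)
    m = hand_mask[y0:y1, x0:x1]
    # Normalize depth within the hand to zero mean, unit spread so the CNN
    # sees a consistent input range regardless of hand distance.
    mu, span = crop[m].mean(), crop[m].std() + 1e-6
    crop = (crop - mu) / span
    crop[~m] = 0.0  # zero out background pixels
    return crop

# Example with synthetic inputs:
depth = np.random.rand(240, 320) * 2 + 1
mask = np.zeros((240, 320), dtype=bool)
mask[100:140, 150:190] = True
print(preprocess(depth, mask).shape)
```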

Structure

  • 3 resolutions of the depth image (a multi-resolution pyramid)
  • a 2-stage CNN bank per resolution (each stage: conv, ReLU, maxpool)
  • the banks' outputs are fed into a 2-stage fully connected (fc) network with high-level convolutions, producing one heat-map per joint (see the sketch below)
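
A hedged PyTorch sketch of this structure; the input scales (96/48/24), channel counts, kernel sizes, joint count (14) and 18×18 heat-map resolution are illustrative assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class MultiResHeatmapNet(nn.Module):
    """One small conv bank per input scale; features are concatenated and
    fed to fully connected stages that emit one heat-map per joint."""

    def __init__(self, n_joints=14, hm=18):
        super().__init__()
        self.n_joints, self.hm = n_joints, hm
        # Three banks, one per resolution; each is a 2-stage
        # (conv -> ReLU -> maxpool) pipeline as in the notes.
        self.banks = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(1, 16, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            )
            for _ in range(3)
        ])
        # 2-stage fully connected network producing flattened heat-maps.
        self.fc = nn.Sequential(
            nn.LazyLinear(2048), nn.ReLU(),
            nn.Linear(2048, n_joints * hm * hm),
        )

    def forward(self, x96, x48, x24):
        feats = [bank(x).flatten(1) for bank, x in zip(self.banks, (x96, x48, x24))]
        out = self.fc(torch.cat(feats, dim=1))
        return out.view(-1, self.n_joints, self.hm, self.hm)

net = MultiResHeatmapNet()
out = net(torch.randn(2, 1, 96, 96), torch.randn(2, 1, 48, 48), torch.randn(2, 1, 24, 24))
print(out.shape)  # torch.Size([2, 14, 18, 18])
```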

Training

  • L2 loss, minimized with backprop
  • output heat-maps are trained to match 2D Gaussians centered on the ground-truth joint positions (target construction sketched below)
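
A small sketch of the target construction and loss. The fixed Gaussian width `sigma` and the 18×18 heat-map size are illustrative choices, and `net`, `pyramid` and `joint_uvs` in the commented step are placeholders (e.g. the network sketched above plus a labelled batch):

```python
import torch
import torch.nn.functional as F

def gaussian_heatmap(uv, size=18, sigma=1.0):
    """2D Gaussian target centered on the ground-truth (u, v) joint
    position, in heat-map pixel coordinates."""
    ys = torch.arange(size, dtype=torch.float32).view(-1, 1)
    xs = torch.arange(size, dtype=torch.float32).view(1, -1)
    u, v = uv
    return torch.exp(-((xs - u) ** 2 + (ys - v) ** 2) / (2 * sigma ** 2))

# One training step (placeholders: `net` is the heat-map CNN, `pyramid` a
# batch of 3-scale inputs, `joint_uvs` the labelled joint positions):
#   pred   = net(*pyramid)                                   # (B, J, 18, 18)
#   target = torch.stack([torch.stack([gaussian_heatmap(uv) for uv in uvs])
#                         for uvs in joint_uvs])             # same shape
#   loss   = F.mse_loss(pred, target)   # the L2 loss from the notes
#   loss.backward()                     # backprop
```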

Finally

  • fit a Gaussian to each heat-map to infer the joint location at sub-pixel accuracy
  • read the depth value at the recovered position
  • use the hand model to align the mesh to the heat-map positions via inverse kinematics (a sketch of the peak extraction follows)
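
A minimal sketch of the sub-pixel step. The intensity-weighted centroid used here is a lightweight stand-in for the paper's Gaussian fit, and `crop_origin`/`scale` are assumed parameters mapping heat-map coordinates back to the full 640×480 image:

```python
import numpy as np

def subpixel_peak(heatmap, win=2):
    """(u, v) of the heat-map peak at sub-pixel accuracy: the
    intensity-weighted centroid of a small window around the maximum,
    a lightweight stand-in for the paper's Gaussian fit."""
    v0, u0 = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    v_lo, v_hi = max(v0 - win, 0), min(v0 + win + 1, heatmap.shape[0])
    u_lo, u_hi = max(u0 - win, 0), min(u0 + win + 1, heatmap.shape[1])
    patch = heatmap[v_lo:v_hi, u_lo:u_hi]
    vs, us = np.mgrid[v_lo:v_hi, u_lo:u_hi]
    w = patch / patch.sum()
    return float((us * w).sum()), float((vs * w).sum())

def joint_xyz(heatmap, depth, crop_origin, scale):
    # Map the sub-pixel peak back to full-image coordinates, then read
    # the corresponding depth value to get a (u, v, z) point.
    u, v = subpixel_peak(heatmap)
    u_img = crop_origin[0] + u * scale
    v_img = crop_origin[1] + v * scale
    z = depth[int(round(v_img)), int(round(u_img))]
    return u_img, v_img, z
```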

Results

  • Hand segmentation: 4% error (number of incorrectly labelled pixels / total number of pixels)
  • UV error on the heat-map output: 0.41px (std 0.35px), which corresponds to about 6px error after upsampling to 640×480 resolution
  • only qualitative evaluation for the 3D positions