
IEEE 7298625


CVPR 2015

[IEEE 7298625] Delving into Egocentric Actions [project page] [PDF] [notes]

Yin Li, Zhefan Ye, James M. Rehg

read 19/07/2017

Objective

Evaluate the performance of hand-crafted features for action recognition in first-person view

Provides baselines on GTEA, GTEA Gaze and GTEA Gaze+ for the individual features (measured as action recognition accuracy) and their combinations

Synthesis

Traditional hand-crafted spatio-temporal features such as STIP perform poorly because of camera motion

  • Removing camera motion allows local features to perform well

  • but camera motion is an action cue

Features

Motion features

  • Trajectory features
  • Histogram of Flow
  • Motion boundary histogram (gradient of optical flow in horizontal and vertical directions); see the sketch below
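
Rough sketch (assuming OpenCV and NumPy, with illustrative parameter choices) of how a motion boundary histogram can be computed from dense optical flow: the horizontal and vertical flow components are differentiated spatially and their gradient orientations are binned into a histogram.

```python
# Minimal sketch: motion boundary histogram (MBH) from dense optical flow.
# Assumes OpenCV and NumPy; bin count and normalization are illustrative choices.
import cv2
import numpy as np

def motion_boundary_histogram(prev_gray, curr_gray, n_bins=8):
    # Dense optical flow between two consecutive grayscale frames
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    histograms = []
    for c in range(2):  # 0: horizontal flow, 1: vertical flow
        # Spatial gradients of this flow component (the "motion boundaries")
        gx = cv2.Sobel(flow[..., c], cv2.CV_32F, 1, 0)
        gy = cv2.Sobel(flow[..., c], cv2.CV_32F, 0, 1)
        mag, ang = cv2.cartToPolar(gx, gy)
        # Orientation histogram weighted by gradient magnitude
        hist, _ = np.histogram(ang, bins=n_bins, range=(0, 2 * np.pi),
                               weights=mag)
        histograms.append(hist / (hist.sum() + 1e-8))
    # MBHx and MBHy concatenated
    return np.concatenate(histograms)
```

A histogram of flow (HOF) can be built the same way by binning the flow vectors themselves instead of their spatial gradients.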

Object features

  • Histogram of Oriented Gradients (HOG), which encodes 2D image boundaries
  • Local Binary Patterns (compare the central pixel to its neighbors, encode each comparison as 1 or 0 for above or below, then build a histogram of the codes; see the sketch below)
  • Histogram of LAB color (L for lightness, a and b for the green-red and blue-yellow color opponents)
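
Small sketch (NumPy only, 8-neighbour version, illustrative) of the LBP idea: compare each pixel to its 8 neighbours, pack the comparisons into a byte, and histogram the resulting codes.

```python
# Minimal sketch of 8-neighbour local binary patterns (LBP); NumPy only.
import numpy as np

def lbp_histogram(gray):
    gray = gray.astype(np.float32)
    center = gray[1:-1, 1:-1]
    # 8 neighbours in clockwise order, each contributing one bit
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=int)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = gray[1 + dy:gray.shape[0] - 1 + dy,
                         1 + dx:gray.shape[1] - 1 + dx]
        # Bit set when the neighbour is >= the central pixel
        codes = codes + ((neighbour >= center).astype(int) << bit)
    # Histogram of the 256 possible codes, normalized
    hist = np.bincount(codes.ravel(), minlength=256).astype(np.float32)
    return hist / (hist.sum() + 1e-8)
```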

Egocentric features

  • Hand feature : manipulation point (point where the person is most likely to be manipulating an object), obtained from hand segmentation

  • Head feature : corresponds to camera motion

  • Gaze direction : 2D image point on each frame

Feature Engineering

  • Removing camera motion : subtract the estimated camera motion from the dense optical flow (see the sketch after this list). This produces better motion features and selects trajectories on foreground regions that move differently from the camera motion

  • Trajectory selection : use local descriptors in the vicinity of the manipulation and gaze points
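
Rough sketch of the camera-motion removal idea (illustrative only, not the paper's exact pipeline): fit a homography to keypoint matches between consecutive frames, derive the camera-induced flow from it, and subtract it from the dense optical flow so that only foreground motion remains.

```python
# Rough sketch: subtract camera-induced flow from dense optical flow.
# Assumes OpenCV and NumPy; keypoint/matching choices are illustrative.
import cv2
import numpy as np

def foreground_flow(prev_gray, curr_gray):
    # Dense optical flow (camera motion + object/hand motion)
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)

    # Estimate global camera motion with a homography fitted to ORB
    # keypoint matches (RANSAC downweights moving foreground points)
    orb = cv2.ORB_create()
    kp1, des1 = orb.detectAndCompute(prev_gray, None)
    kp2, des2 = orb.detectAndCompute(curr_gray, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)

    # Camera-induced flow: where the homography moves each pixel, minus the pixel
    h, w = prev_gray.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)
    pts = np.stack([xs.ravel(), ys.ravel()], axis=1).reshape(-1, 1, 2)
    warped = cv2.perspectiveTransform(pts, H).reshape(h, w, 2)
    camera_flow = warped - np.dstack([xs, ys])

    # Residual flow: foreground regions that move differently from the camera
    return flow - camera_flow
```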

Implementation

Extract a set of local descriptors (HOG, LAB, LBP, ...) aggregated along trajectories

Each trajectory's space-time volume is divided into a 2x2x3 grid of cells, and the feature histograms of the cells are concatenated
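
A small pooling sketch (NumPy; the per-pixel histogram volume is a hypothetical input layout) of the 2x2x3 grid aggregation described above.

```python
# Minimal sketch: pool per-pixel feature histograms over a 2x2x3
# space-time grid around a trajectory and concatenate the cells.
# Assumes NumPy; the input layout is a hypothetical choice.
import numpy as np

def grid_pool(volume):
    """volume: (H, W, T, B) array of per-pixel histograms (B bins)
    around one trajectory; returns a 2*2*3*B descriptor."""
    H, W, T, B = volume.shape
    cells = []
    for ys in np.array_split(np.arange(H), 2):          # 2 splits in height
        for xs in np.array_split(np.arange(W), 2):      # 2 splits in width
            for ts in np.array_split(np.arange(T), 3):  # 3 temporal splits
                cell = volume[np.ix_(ys, xs, ts)]
                hist = cell.sum(axis=(0, 1, 2))
                cells.append(hist / (hist.sum() + 1e-8))
    return np.concatenate(cells)
```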

Encode descriptors using the Improved Fisher Vector (deviations from the means and variances of a Gaussian mixture model)
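
Simplified sketch of improved Fisher vector encoding (assuming scikit-learn's GaussianMixture; the paper's implementation may differ): local descriptors are soft-assigned to GMM components, first- and second-order deviations are stacked per component, then power- and L2-normalized.

```python
# Simplified sketch of improved Fisher vector encoding with a diagonal GMM.
# Assumes scikit-learn and NumPy; the number of components is illustrative.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_gmm(train_descriptors, n_components=64):
    # Diagonal-covariance GMM fitted on a training pool of local descriptors
    gmm = GaussianMixture(n_components=n_components, covariance_type='diag')
    gmm.fit(train_descriptors)
    return gmm

def fisher_vector(descriptors, gmm):
    """Encode a (T, D) set of local descriptors into a 2*K*D Fisher vector."""
    T, D = descriptors.shape
    q = gmm.predict_proba(descriptors)          # (T, K) soft assignments
    mu = gmm.means_                             # (K, D)
    sigma = np.sqrt(gmm.covariances_)           # (K, D) diagonal std devs
    w = gmm.weights_                            # (K,)

    # Normalized deviations of descriptors from each component mean
    diff = (descriptors[:, None, :] - mu[None, :, :]) / sigma[None, :, :]

    # First- and second-order statistics (gradients w.r.t. means and variances)
    g_mu = np.einsum('tk,tkd->kd', q, diff) / (T * np.sqrt(w)[:, None])
    g_sigma = np.einsum('tk,tkd->kd', q, diff ** 2 - 1.0) / (T * np.sqrt(2 * w)[:, None])

    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])
    # "Improved" FV: signed square root then L2 normalization
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    return fv / (np.linalg.norm(fv) + 1e-8)
```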

Results

Using object + motion + egocentric + manipulation point trajectory cues produces the best results

Obtained accuracies

  • GTEA Gaze+ : 60%
  • GTEA with 17 or 61 classes : 60%
  • GTEA Gaze with 25 classes : 60%, with 40 : 40%