ieee 7298625
[IEEE 7298625] Delving into Egocentric Actions [project page] [PDF] [notes]
Yin Li, Zhefan Ye, James M. Rehg
read 19/07/2017
Evaluates the performance of hand-crafted features for action recognition in first-person video
Provides baselines on GTEA, GTEA Gaze and GTEA Gaze+ for the individual features and their combinations (measured as action recognition accuracy)
Traditional hand-crafted spatio-temporal features such as STIP perform poorly because of camera (head) motion
- Removing camera motion allows local features to perform well
- However, camera motion is itself an action cue
Features evaluated:
- Trajectory features
- Histogram of Optical Flow (HOF)
- Motion Boundary Histogram (MBH): histograms of the gradients of the horizontal and vertical optical-flow components
- Histogram of Oriented Gradients (HOG), which encodes 2D image boundaries
- Local Binary Patterns (LBP): compares each of several neighbors with the central pixel, encodes each comparison as 1 (above) or 0 (below), then histograms the resulting binary codes
- Histogram of LAB color (L for lightness; a and b for the green-red and blue-yellow color-opponent axes)
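The LBP description above can be made concrete with a minimal plain-Python sketch. The notes do not fix the neighborhood, so a 3x3 neighborhood with 8 neighbors, a clockwise bit order and ties counted as "above" are assumptions; `lbp_code` and `lbp_histogram` are hypothetical names. It computes one 8-bit code per interior pixel and a 256-bin histogram:

```python
from collections import Counter

# 8-neighborhood offsets (dy, dx), clockwise from the top-left pixel.
# The bit order is an arbitrary convention; any fixed order works.
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, y, x):
    """8-bit LBP code of pixel (y, x): bit i is 1 if neighbor i >= center."""
    center = img[y][x]
    code = 0
    for bit, (dy, dx) in enumerate(NEIGHBORS):
        if img[y + dy][x + dx] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img):
    """256-bin histogram of LBP codes over interior pixels (borders skipped)."""
    h, w = len(img), len(img[0])
    counts = Counter(
        lbp_code(img, y, x) for y in range(1, h - 1) for x in range(1, w - 1)
    )
    return [counts.get(c, 0) for c in range(256)]
```

For the 3x3 image `[[1, 2, 3], [4, 5, 6], [7, 8, 9]]`, the only interior pixel (value 5) gets code 120, since only the neighbors 6, 9, 8 and 7 (bits 3 to 6) are above it.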
- Hand feature: manipulation point (the point in the image where the person is most likely to be manipulating an object), obtained from hand segmentation
- Head feature: corresponds to camera motion
- Gaze direction: a 2D image point in each frame
Removing camera motion: subtract the estimated camera motion from the dense optical flow. This produces better motion features and selects trajectories on foreground regions that move differently from the camera
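The notes do not say how camera motion is estimated. A minimal sketch, assuming the camera motion is a pure image translation estimated as the median optical-flow vector (fitting a homography with RANSAC is the more general choice). The estimated translation also serves as a crude stand-in for the head feature. `remove_camera_motion`, `foreground_mask` and the threshold value are hypothetical:

```python
import math
from statistics import median


def remove_camera_motion(flow):
    """Subtract a global translation (the median flow) from a dense flow field.

    flow: 2D list (rows) of (u, v) displacement tuples.
    Assumption: camera motion is a pure translation; the median stays robust
    as long as moving foreground covers less than half of the frame.
    """
    us = [u for row in flow for (u, _) in row]
    vs = [v for row in flow for (_, v) in row]
    cam_u, cam_v = median(us), median(vs)
    residual = [[(u - cam_u, v - cam_v) for (u, v) in row] for row in flow]
    return residual, (cam_u, cam_v)


def foreground_mask(residual, threshold=1.0):
    """True where the compensated motion magnitude exceeds `threshold` (hypothetical)."""
    return [[math.hypot(u, v) > threshold for (u, v) in row] for row in residual]
```

Trajectories would then only be started or kept on pixels where the mask is true, i.e. on regions moving differently from the camera.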
Trajectory selection: use only the local descriptors in the vicinity of the manipulation point and the gaze point
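A sketch of trajectory selection, assuming each trajectory is a list of (frame, x, y) points and that the anchor (manipulation point or gaze point) is known per frame. The mean-distance criterion and the `radius` parameter are illustrative assumptions, not the paper's exact rule:

```python
import math


def select_trajectories(trajectories, anchor_points, radius):
    """Keep trajectories that stay close to a per-frame anchor point.

    trajectories: list of trajectories, each a list of (frame, x, y) points.
    anchor_points: dict mapping frame -> (x, y), e.g. the manipulation or gaze point.
    radius: hypothetical distance threshold in pixels.
    A trajectory is kept if its mean distance to the anchor, over the frames
    where an anchor is available, is at most `radius`.
    """
    selected = []
    for traj in trajectories:
        dists = [
            math.dist((x, y), anchor_points[f])
            for (f, x, y) in traj
            if f in anchor_points
        ]
        if dists and sum(dists) / len(dists) <= radius:
            selected.append(traj)
    return selected
```

Trajectories that never overlap a frame with a known anchor are dropped, which is one possible convention among several.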
Extract a set of local descriptors (HOG, LAB, LBP, ...) aggregated along the trajectories
Each trajectory volume is divided into a 2x2x3 spatio-temporal grid; feature histograms are computed within each cell and concatenated
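A sketch of the 2x2x3 grid aggregation, assuming every sample along a trajectory carries its position normalized to [0, 1] inside the trajectory's space-time volume plus a local histogram. Summing the histograms per cell is an assumption (averaging is another common choice), and `grid_descriptor` is a hypothetical name:

```python
def grid_descriptor(samples, n_bins, grid=(2, 2, 3)):
    """Concatenate per-cell histograms over an (nx, ny, nt) spatio-temporal grid.

    samples: list of ((u, v, s), hist) where u, v, s in [0, 1] are the sample's
    normalized x, y and time position inside the trajectory volume, and hist
    is a list of n_bins values computed around that sample.
    Returns a flat list of nx * ny * nt * n_bins values.
    """
    nx, ny, nt = grid
    cells = [[0.0] * n_bins for _ in range(nx * ny * nt)]
    for (u, v, s), hist in samples:
        # Map normalized coordinates to cell indices; 1.0 falls in the last cell.
        ix = min(int(u * nx), nx - 1)
        iy = min(int(v * ny), ny - 1)
        it = min(int(s * nt), nt - 1)
        cell = cells[(it * ny + iy) * nx + ix]
        for b, value in enumerate(hist):
            cell[b] += value
    # Cell order: x fastest, then y, then time.
    return [value for cell in cells for value in cell]
```

With `n_bins` bins per cell the descriptor has 12 * `n_bins` dimensions for the default 2x2x3 grid.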
Encode the descriptors with the Improved Fisher Vector (gradients of the log-likelihood with respect to the means and variances of a Gaussian mixture model, followed by power and L2 normalization)
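A plain-Python sketch of the Improved Fisher Vector for a diagonal-covariance GMM, following the standard formulation: soft assignments, normalized gradients with respect to the means and standard deviations, then signed square-root and L2 normalization. The GMM parameters are assumed to be already fitted (e.g. with EM on training descriptors); the function names are hypothetical:

```python
import math


def _log_gauss_diag(x, mean, var):
    """Log-density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2 * math.pi * s) + (xi - m) ** 2 / s
        for xi, m, s in zip(x, mean, var)
    )


def improved_fisher_vector(descriptors, weights, means, variances):
    """Improved Fisher Vector of a descriptor set under a fitted diagonal GMM.

    Returns 2 * K * D values laid out per component as [G_mu_k, G_sigma_k].
    """
    n_comp, dim, n = len(weights), len(means[0]), len(descriptors)
    g_mu = [[0.0] * dim for _ in range(n_comp)]
    g_sigma = [[0.0] * dim for _ in range(n_comp)]
    for x in descriptors:
        # Posterior (soft assignment) of x to each component, computed stably.
        logp = [
            math.log(weights[k]) + _log_gauss_diag(x, means[k], variances[k])
            for k in range(n_comp)
        ]
        top = max(logp)
        probs = [math.exp(lp - top) for lp in logp]
        total = sum(probs)
        for k in range(n_comp):
            gamma = probs[k] / total
            for d in range(dim):
                z = (x[d] - means[k][d]) / math.sqrt(variances[k][d])
                g_mu[k][d] += gamma * z                # gradient w.r.t. the mean
                g_sigma[k][d] += gamma * (z * z - 1.0)  # gradient w.r.t. the std dev
    fv = []
    for k in range(n_comp):
        fv += [g / (n * math.sqrt(weights[k])) for g in g_mu[k]]
        fv += [g / (n * math.sqrt(2 * weights[k])) for g in g_sigma[k]]
    # "Improved" FV: signed square-root (power) normalization, then L2 normalization.
    fv = [math.copysign(math.sqrt(abs(v)), v) for v in fv]
    norm = math.sqrt(sum(v * v for v in fv))
    return [v / norm for v in fv] if norm > 0 else fv
```

The resulting vector has 2 * K * D dimensions for K Gaussians and D-dimensional descriptors, which is why FV encodings are large even for small vocabularies.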
Combining object, motion and egocentric cues with trajectories selected around the manipulation point produces the best results
Obtained accuracies:
- GTEA Gaze+: 60%
- GTEA with 17 or 61 classes: 60%
- GTEA Gaze: 60% with 25 classes, 40% with 40 classes