
1805.07694.md

Yana edited this page May 29, 2020 · 1 revision
Two-Stream Adaptive Graph Convolutional Networks for Skeleton-Based Action Recognition, CVPR'19, {paper} {code} {notes}

Lei Shi, Yifan Zhang, Jian Cheng, Hanqing Lu

Objective

Explore data-dependent graphs for pose-based action recognition (in contrast to fixed graphs, for instance mirroring the kinematic structure of the human body).
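
A minimal sketch of the data-dependent part of the graph: an adjacency computed from the similarity of embedded joint features, as in the paper's non-local-style term (the embedding matrices `w_theta` and `w_phi` here are illustrative stand-ins for learned parameters).

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def data_dependent_graph(x, w_theta, w_phi):
    """Build a (V, V) adjacency from per-joint features.

    x:       (V, C) joint features for one sample.
    w_theta, w_phi: (C, Ce) embedding matrices (learned in the real model).
    Each row is softmax-normalized, so it sums to 1.
    """
    theta = x @ w_theta            # (V, Ce)
    phi = x @ w_phi                # (V, Ce)
    return softmax(theta @ phi.T, axis=-1)
```

In the full model this learned, sample-dependent adjacency is added to the fixed kinematic adjacency rather than replacing it.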

Exploit the length and direction of bones as a signal in addition to keypoint positions, by encoding each bone as a vector from its source joint to its target joint.
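
The bone representation can be sketched as follows; the parent list below is a hypothetical toy skeleton, not the NTU-RGBD joint layout.

```python
import numpy as np

# Hypothetical 5-joint skeleton: parents[j] is the source joint of bone j;
# -1 marks the root, which has no incoming bone (zero vector).
PARENTS = [-1, 0, 1, 1, 0]

def bone_vectors(joints, parents=PARENTS):
    """joints: (V, 3) keypoint positions -> (V, 3) bone vectors (target - source)."""
    bones = np.zeros_like(joints)
    for j, p in enumerate(parents):
        if p >= 0:
            bones[j] = joints[j] - joints[p]
    return bones
```

The bone stream feeds these vectors to a second network with the same architecture as the joint stream; the two softmax scores are summed at the end.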

Method

Based on ST-GCN.

ST-GCN:

  • defines a spatiotemporal graph (each node holds a 2D or 3D joint position as its value, and is connected to the same joint in the immediately preceding and following frames, as well as to its parent and children in the kinematic tree)
  • several graph convolutional layers, followed by global average pooling and a softmax to produce classification scores
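
One spatial step of such a layer can be sketched as neighbor aggregation over a normalized adjacency followed by a learned projection (a simplification of the partitioned formulation in ST-GCN):

```python
import numpy as np

def spatial_graph_conv(x, adj, w):
    """One spatial graph-convolution step (sketch).

    x:   (V, C_in) per-joint features at one frame.
    adj: (V, V) adjacency including self-loops.
    w:   (C_in, C_out) learned projection.
    """
    deg = adj.sum(axis=1, keepdims=True)
    a_norm = adj / np.maximum(deg, 1e-6)   # row-normalize the aggregation
    return a_norm @ x @ w
```

Temporal modeling is then a plain convolution over the frame axis for each joint.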

Technical details

Clips are repeated to reach a fixed size of 300 frames.
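
A sketch of the padding-by-repetition, assuming a `(T, V, C)` sequence layout:

```python
import numpy as np

def repeat_to_length(seq, target=300):
    """Tile a (T, V, C) skeleton sequence along time, then crop to `target` frames."""
    reps = -(-target // len(seq))          # ceil division
    return np.tile(seq, (reps, 1, 1))[:target]
```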

Handles multiple people (exactly two persons); if only one person is visible, the absent person's slot is filled with zeros.
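
A sketch of the zero-filling, assuming per-person `(T, V, C)` arrays (the `(300, 25, 3)` shape matches NTU-RGBD's 25 joints and the 300-frame clips above):

```python
import numpy as np

def pad_persons(people, max_persons=2, shape=(300, 25, 3)):
    """Stack per-person (T, V, C) arrays; missing persons become all zeros."""
    out = np.zeros((max_persons,) + shape, dtype=np.float32)
    for m, person in enumerate(people[:max_persons]):
        out[m] = person
    return out
```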

Data-augmentation:

  • randomly choose 150 frames from the input skeleton sequence
  • slightly disturb the joint coordinates with randomly chosen rotations and translations
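
The two augmentation steps above can be sketched as follows; the crop is read here as a contiguous 150-frame window, and the disturbance as a small random z-axis rotation plus translation (both plausible readings, with magnitudes chosen for illustration).

```python
import numpy as np

def augment(seq, crop_len=150, max_angle=0.3, max_shift=0.05, rng=None):
    """Random temporal crop + random rotation/translation of a (T, V, 3) sequence."""
    rng = np.random.default_rng() if rng is None else rng
    start = rng.integers(0, len(seq) - crop_len + 1)
    clip = seq[start:start + crop_len]
    a = rng.uniform(-max_angle, max_angle)
    rot = np.array([[np.cos(a), -np.sin(a), 0.0],
                    [np.sin(a),  np.cos(a), 0.0],
                    [0.0,        0.0,       1.0]])
    shift = rng.uniform(-max_shift, max_shift, size=3)
    return clip @ rot.T + shift
```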

Experimental results

Evaluated on NTU-RGBD and Kinetics-Skeleton

  • ~7% accuracy improvement on both datasets compared to ST-GCN

(+5% on NTU-RGBD even with a single stream on keypoints only)
