Rethinking Zero-shot Action Recognition: Learning from Latent Atomic Actions

Yijun Qian, Lijun Yu, Wenhe Liu, Alexander G. Hauptmann

Published in ECCV, 2022

Springer ECVA

To avoid the time-consuming annotation and retraining cycle of supervised action recognition models, Zero-Shot Action Recognition (ZSAR) has become a thriving research direction. ZSAR requires models to recognize actions that never appear in the training set by bridging visual features and semantic representations. However, due to the complexity of actions, it remains challenging to transfer knowledge learned from source to target action domains. Previous ZSAR methods mainly focus on mitigating the representation variance between source and target actions by integrating or applying new action-level features. However, action-level features are coarse-grained and make the learned one-to-one bridge fragile to similar target actions. Meanwhile, integrating or applying such features usually requires extra computation or annotation. These methods overlook that two actions with different names may still share the same atomic action components. This sharing enables humans to quickly understand an unseen action given a set of atomic actions learned from seen actions. Inspired by this, we propose Jigsaw Network (JigsawNet), which recognizes complex actions by decomposing them, without supervision, into combinations of atomic actions and bridging group-to-group relationships between visual features and semantic representations. To enhance the robustness of the learned group-to-group bridge, we propose a Group Excitation (GE) module to model intra-sample knowledge and a Consistency Loss to force the model to learn from inter-sample knowledge. JigsawNet achieves state-of-the-art performance on three benchmarks and surpasses previous works by noticeable margins.

  title={Rethinking Zero-shot Action Recognition: Learning from Latent Atomic Actions},
  author={Qian, Yijun and Yu, Lijun and Liu, Wenhe and Hauptmann, Alexander G},