CMU Informedia at TRECVID 2021: Activity Detection with Argus++

Lijun Yu, Yijun Qian, Wenhe Liu, Alexander G. Hauptmann

Published in TRECVID, 2021


In the TRECVID 2021 ActEV task, we tackle multi-scale, multi-instance activity detection in extended videos with the CMU Argus++ video understanding framework. The proposed method was the first to generate overlapping spatio-temporal cube proposals, rather than conventional non-overlapping cube or tube proposals, to ensure coverage of activities in untrimmed video streams. The four-stage framework balances computation cost against performance, achieving state-of-the-art results in real time on consumer-level hardware. The proposed system achieved the best performance on the TRECVID 2021 challenge live leaderboard, although that leaderboard lacked efficiency measurement and execution verification, both of which would be desirable. We further evaluate our method on the ActEV SDL benchmark series, which uses fully sequestered data and ready-to-run system submissions. The strong performance of Argus++ there again demonstrates its robustness across a wide range of benchmarks, which it has led for years.
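To illustrate why overlapping proposals help coverage, here is a minimal sketch along the temporal axis only. It is not the Argus++ implementation: the function names, the 64-frame window, the strides, and the example activity are all hypothetical values chosen for illustration, and the spatial extent of a cube is ignored.

```python
from typing import List, Tuple

Window = Tuple[int, int]  # half-open frame interval [start, end)


def temporal_windows(num_frames: int, duration: int, stride: int) -> List[Window]:
    """Return fixed-length windows over a video, stepped by `stride` frames.

    stride == duration gives non-overlapping windows;
    stride < duration gives overlapping windows.
    """
    if duration <= 0 or stride <= 0:
        raise ValueError("duration and stride must be positive")
    windows = []
    start = 0
    while start + duration <= num_frames:
        windows.append((start, start + duration))
        start += stride
    return windows


def covered_by_one_window(activity: Window, windows: List[Window]) -> bool:
    """True if some single window fully contains the activity interval."""
    a_start, a_end = activity
    return any(s <= a_start and a_end <= e for s, e in windows)


# Hypothetical numbers: a 128-frame clip and 64-frame windows.
non_overlapping = temporal_windows(128, duration=64, stride=64)
overlapping = temporal_windows(128, duration=64, stride=32)

# Hypothetical activity that straddles the boundary at frame 64.
activity = (40, 90)
print(covered_by_one_window(activity, non_overlapping))  # False
print(covered_by_one_window(activity, overlapping))      # True
```

With a stride equal to the window length, an activity that crosses a window boundary is split across two proposals. Halving the stride adds a window that contains it entirely, at the cost of more proposals to process.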

@inproceedings{yucmu2021,
  title={CMU Informedia at TRECVID 2021: Activity Detection with Argus++},
  author={Yu, Lijun and Qian, Yijun and Liu, Wenhe and Hauptmann, Alexander G},
  booktitle={TRECVID},
  year={2021}
}