Adaptive Feature Aggregation for Video Object Detection

Yijun Qian, Lijun Yu, Wenhe Liu, Guoliang Kang, Alexander G. Hauptmann

Published in WACVW, 2020


Object detection, a fundamental topic in computer vision, faces new challenges in video-related tasks: objects in videos are more frequently blurred, occluded, or out of focus. Existing works adopt feature aggregation and enhancement to design video-based object detectors. However, most of them consider neither the diversity of object movements nor the quality of the aggregated context features, so they cannot produce comparable results on blurred or crowded videos. In this paper, we propose an adaptive feature aggregation method for video object detection to address these problems. We introduce an adaptive quality-similarity weight, together with a sparse-and-dense temporal aggregation policy, into our model. Compared with both image-based and video-based baselines on the ImageNet and VIRAT datasets, our method consistently performs better. In particular, it improves the average precision of person detection on VIRAT from 85.93 to 87.21.
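To make the core idea concrete, here is a minimal sketch (not the authors' implementation) of aggregating neighboring-frame features with softmax-normalized weights that combine feature similarity and an estimated frame quality, in the spirit of the quality-similarity weighting described above. All function names and the way quality modulates similarity are assumptions for illustration only.

```python
import numpy as np

def aggregate_features(ref_feat, support_feats, quality_scores):
    """Weighted aggregation of support-frame features onto a reference frame.

    ref_feat:       (C,) feature vector of the reference frame
    support_feats:  (T, C) feature vectors of neighboring frames
    quality_scores: (T,) scalar quality estimate per support frame
    """
    # Cosine similarity between the reference frame and each support frame.
    ref_n = ref_feat / (np.linalg.norm(ref_feat) + 1e-8)
    sup_n = support_feats / (np.linalg.norm(support_feats, axis=1,
                                            keepdims=True) + 1e-8)
    sim = sup_n @ ref_n                      # (T,)

    # Modulate similarity by the quality estimate, then normalize
    # with a softmax so the weights sum to one.
    logits = sim * quality_scores
    w = np.exp(logits - logits.max())
    w /= w.sum()

    # The aggregated feature is the weighted sum of support features.
    return w @ support_feats
```

In this sketch, a support frame that is both similar to the reference frame and of high quality receives a larger weight, which is the intuition behind suppressing blurred or occluded frames during aggregation.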

@inproceedings{qian2020adaptive,
  title={Adaptive Feature Aggregation for Video Object Detection},
  author={Qian, Yijun and Yu, Lijun and Liu, Wenhe and Kang, Guoliang and Hauptmann, Alexander G.},
  booktitle={Proceedings of the IEEE Winter Conference on Applications of Computer Vision Workshops (WACVW)},
  year={2020}
}