Spatio-Temporal-based Multi-level Aggregation Network for Physical Action Recognition
Computer Science and Information Systems, Volume 21 (2024) no. 4
This article was harvested from the Computer Science and Information Systems website
This paper introduces a spatio-temporal-based multi-level aggregation network (ST-MANet) for action recognition. ST-MANet uses the correlations between different spatial positions, and between different temporal positions, on the feature map to capture long-range spatial and temporal dependencies, respectively. From these correlations it generates spatial and temporal attention maps that assign different weights to features at different spatial and temporal locations. A multi-scale behavior recognition framework is also proposed, which models various visual rhythms while capturing multi-scale spatiotemporal information. In addition, a spatial diversity constraint encourages the spatial attention maps at different scales to focus on distinct areas, so that each scale emphasizes the spatial information unique to it and the multi-scale features incorporate more diverse spatial information. Finally, ST-MANet is compared with existing approaches and achieves high accuracy on three datasets.
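The two ideas in the abstract, attention derived from correlations between positions and a penalty that pushes attention maps at different scales apart, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names `position_attention` and `diversity_penalty`, the softmax normalization, the use of mean pairwise cosine similarity as the penalty, and the assumption that all per-scale maps have already been resampled to a common length are illustrative choices made here.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def position_attention(feats):
    """Attention from pairwise correlations between positions.

    feats: (N, C) array with N positions (spatial locations or frames)
    and C channels. Returns the (N, N) attention map and the (N, C)
    re-weighted features. Hypothetical helper, not from the paper.
    """
    corr = feats @ feats.T          # correlation of every position with every other
    attn = softmax(corr, axis=-1)   # assumption: row-wise softmax normalization
    return attn, attn @ feats


def diversity_penalty(maps):
    """Mean pairwise cosine similarity between per-scale attention maps.

    maps: list of 1-D arrays of equal length (assumed resampled to a
    common resolution). A lower value means the scales attend to more
    distinct areas. Hypothetical stand-in for the paper's constraint.
    """
    unit = [m / (np.linalg.norm(m) + 1e-8) for m in maps]
    sims = [unit[i] @ unit[j]
            for i in range(len(unit))
            for j in range(i + 1, len(unit))]
    return float(np.mean(sims))


rng = np.random.default_rng(0)
feats = rng.standard_normal((6, 4))   # 6 positions, 4 channels
attn, out = position_attention(feats)
```

Applying `position_attention` to features flattened over space gives a spatial attention map, and applying it to features indexed by frame gives a temporal one. Adding `diversity_penalty` to a training loss would, under these assumptions, discourage different scales from attending to the same regions.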
Keywords:
Action recognition, spatial and temporal attention, multi-level aggregation network
@article{CSIS_2024_21_4_a31,
author = {Yuhang Wang},
title = {Spatio-Temporal-based {Multi-level} {Aggregation} {Network} for {Physical} {Action} {Recognition}},
journal = {Computer Science and Information Systems},
year = {2024},
volume = {21},
number = {4},
url = {http://geodesic.mathdoc.fr/item/CSIS_2024_21_4_a31/}
}
Yuhang Wang. Spatio-Temporal-based Multi-level Aggregation Network for Physical Action Recognition. Computer Science and Information Systems, Volume 21 (2024) no. 4. http://geodesic.mathdoc.fr/item/CSIS_2024_21_4_a31/