CTA-Net: A Gaze Estimation network based on Dual Feature Aggregation and Attention Cross Fusion
Computer Science and Information Systems, Volume 21 (2024) no. 3.

See the article record from the source: Computer Science and Information Systems website

Recent work has demonstrated that the Transformer model is effective for computer vision tasks. However, the global self-attention mechanism used in Transformer models does not adequately consider the local structure and details of images. This may result in the loss of information and local details, reducing accuracy in gaze estimation tasks compared with convolution or sequential stacking methods. To address this issue, we propose a parallel CNNs-Transformer aggregation network (CTA-Net) for gaze estimation, which fully leverages the advantages of the Transformer model in modeling global context and of convolutional neural networks (CNNs) in retaining local details. Specifically, a Transformer and a ResNet are deployed to extract facial and eye information, respectively. Additionally, an attention cross fusion (ACFusion) block is embedded in the CNN branch; it decomposes features in space and channels to supplement lost features, suppress noise, and help extract eye features more effectively. Finally, a dual-feature aggregation (DFA) module is proposed to effectively fuse the output features of both branches with the help of a feature selection mechanism and a residual structure. Experimental results on the MPIIGaze and Gaze360 datasets demonstrate that our CTA-Net achieves state-of-the-art results.
Keywords: Appearance-based gaze estimation, Deep neural networks, Dilated convolution, Fusion, Transformer
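The abstract describes the DFA module only at a high level (a feature selection mechanism plus a residual structure), so the following is a minimal illustrative sketch of that general idea, not the authors' implementation. The function names `sigmoid` and `aggregate`, the sigmoid-of-difference gate, and the choice to add both inputs back as the residual are all assumptions made for illustration; in a trained network the selection weights would be learned.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, used here as a stand-in for a learned gate."""
    return 1.0 / (1.0 + math.exp(-x))


def aggregate(global_feat, local_feat):
    """Hypothetical gated fusion of two equal-length feature vectors.

    global_feat: per-channel features from the Transformer (face) branch.
    local_feat:  per-channel features from the CNN (eye) branch.
    """
    if len(global_feat) != len(local_feat):
        raise ValueError("both branches must produce the same number of channels")

    fused = []
    for g, l in zip(global_feat, local_feat):
        # Assumed selection weight in (0, 1); a real module would learn it.
        w = sigmoid(g - l)
        # Soft selection between the two branches for this channel.
        selected = w * g + (1.0 - w) * l
        # Assumed residual structure: add the unweighted inputs back.
        fused.append(selected + g + l)
    return fused


if __name__ == "__main__":
    print(aggregate([1.0, 0.0], [1.0, 0.0]))
```

When both branches agree on a channel, the gate is 0.5 and the selected value equals the shared input; when they disagree, the gate leans toward the larger response, while the residual term keeps both original signals in the output.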
@article{CSIS_2024_21_3_a8,
     author = {Chenxing Xia and Zhanpeng Tao and Wei Wang and Wenjun Zhao and Bin Ge and Xiuju Gao and Kuan-Ching Li and Yan Zhang},
     title = {CTA-Net: {A} {Gaze} {Estimation} network based on {Dual} {Feature} {Aggregation} and {Attention} {Cross} {Fusion}},
     journal = {Computer Science and Information Systems},
     publisher = {mathdoc},
     volume = {21},
     number = {3},
     year = {2024},
     url = {http://geodesic.mathdoc.fr/item/CSIS_2024_21_3_a8/}
}
TY  - JOUR
AU  - Chenxing Xia
AU  - Zhanpeng Tao
AU  - Wei Wang
AU  - Wenjun Zhao
AU  - Bin Ge
AU  - Xiuju Gao
AU  - Kuan-Ching Li
AU  - Yan Zhang
TI  - CTA-Net: A Gaze Estimation network based on Dual Feature Aggregation and Attention Cross Fusion
JO  - Computer Science and Information Systems
PY  - 2024
VL  - 21
IS  - 3
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/CSIS_2024_21_3_a8/
ID  - CSIS_2024_21_3_a8
ER  - 
%0 Journal Article
%A Chenxing Xia
%A Zhanpeng Tao
%A Wei Wang
%A Wenjun Zhao
%A Bin Ge
%A Xiuju Gao
%A Kuan-Ching Li
%A Yan Zhang
%T CTA-Net: A Gaze Estimation network based on Dual Feature Aggregation and Attention Cross Fusion
%J Computer Science and Information Systems
%D 2024
%V 21
%N 3
%I mathdoc
%U http://geodesic.mathdoc.fr/item/CSIS_2024_21_3_a8/
%F CSIS_2024_21_3_a8
Chenxing Xia; Zhanpeng Tao; Wei Wang; Wenjun Zhao; Bin Ge; Xiuju Gao; Kuan-Ching Li; Yan Zhang. CTA-Net: A Gaze Estimation network based on Dual Feature Aggregation and Attention Cross Fusion. Computer Science and Information Systems, Volume 21 (2024) no. 3. http://geodesic.mathdoc.fr/item/CSIS_2024_21_3_a8/