ASAM: Asynchronous Self-Attention Model for Visual Question Answering
Computer Science and Information Systems, Volume 22 (2025), no. 1.


Visual Question Answering (VQA) is an emerging deep-learning task that fuses image and question features into joint representations for answer classification. To make such models effective, the semantic information in both the text and the visual input must be fully exploited. Some researchers have improved accuracy by adding new features or by strengthening the model's ability to extract fine-grained information, but these approaches make experimentation more complex and expensive. We propose the Asynchronous Self-Attention Model (ASAM), which uses an asynchronous self-attention component and a controller to integrate asynchronous self-attention with collaborative attention and thereby exploit the rich semantic information in the underlying visual features. ASAM realizes an end-to-end training framework that extracts and exploits the rich representational information of the underlying images while attending jointly to the text features; rather than over-emphasizing fine-grained detail, it strikes a balance that allows the model to learn more useful information. Extensive ablation experiments on the VQA v2 dataset verify the effectiveness of the proposed ASAM, and the results show that it outperforms other state-of-the-art models without increasing model complexity or the number of parameters.
Keywords: Visual Question Answering, Asynchronous Self-Attention, Deep Collaborative, Controller
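The abstract describes ASAM only at a high level, so the following is a minimal, hypothetical sketch of the general kind of attention fusion it refers to: self-attention over question tokens followed by question-guided (collaborative) attention over image region features. All names and parameters here (GuidedAttentionFusion, dim, num_heads) are illustrative assumptions using standard multi-head attention; they do not reproduce the authors' asynchronous self-attention component or controller.

```python
# Hypothetical sketch of question self-attention + question-guided co-attention
# for VQA-style fusion. Not the authors' ASAM implementation.
import torch
import torch.nn as nn


class GuidedAttentionFusion(nn.Module):
    def __init__(self, dim: int = 512, num_heads: int = 8):
        super().__init__()
        # Self-attention over question tokens.
        self.q_self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Question-guided attention over image regions (co-attention).
        self.v_guided_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_q = nn.LayerNorm(dim)
        self.norm_v = nn.LayerNorm(dim)

    def forward(self, q_feats: torch.Tensor, v_feats: torch.Tensor) -> torch.Tensor:
        # q_feats: (batch, num_tokens, dim) question word features
        # v_feats: (batch, num_regions, dim) image region features
        q_ctx, _ = self.q_self_attn(q_feats, q_feats, q_feats)
        q_ctx = self.norm_q(q_feats + q_ctx)                   # residual + norm
        v_ctx, _ = self.v_guided_attn(v_feats, q_ctx, q_ctx)   # image queries attend to question
        v_ctx = self.norm_v(v_feats + v_ctx)
        # Pool both modalities and fuse for an answer classifier.
        return q_ctx.mean(dim=1) + v_ctx.mean(dim=1)


if __name__ == "__main__":
    # Random tensors stand in for real question/image features.
    model = GuidedAttentionFusion()
    q = torch.randn(2, 14, 512)   # 14 question tokens
    v = torch.randn(2, 36, 512)   # 36 detected image regions
    print(model(q, v).shape)      # torch.Size([2, 512])
```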
@article{CSIS_2025_22_1_a9,
     author = {Han Liu and Dezhi Han and Shukai Zhang and Jingya Shi and Huafeng Wu and Yachao Zhou and Kuan-Ching Li},
     title = {ASAM: {Asynchronous} {Self-Attention} {Model} for {Visual} {Question} {Answering}},
     journal = {Computer Science and Information Systems},
     publisher = {mathdoc},
     volume = {22},
     number = {1},
     year = {2025},
     url = {http://geodesic.mathdoc.fr/item/CSIS_2025_22_1_a9/}
}
TY  - JOUR
AU  - Han Liu
AU  - Dezhi Han
AU  - Shukai Zhang
AU  - Jingya Shi
AU  - Huafeng Wu
AU  - Yachao Zhou
AU  - Kuan-Ching Li
TI  - ASAM: Asynchronous Self-Attention Model for Visual Question Answering
JO  - Computer Science and Information Systems
PY  - 2025
VL  - 22
IS  - 1
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/CSIS_2025_22_1_a9/
ID  - CSIS_2025_22_1_a9
ER  - 
%0 Journal Article
%A Han Liu
%A Dezhi Han
%A Shukai Zhang
%A Jingya Shi
%A Huafeng Wu
%A Yachao Zhou
%A Kuan-Ching Li
%T ASAM: Asynchronous Self-Attention Model for Visual Question Answering
%J Computer Science and Information Systems
%D 2025
%V 22
%N 1
%I mathdoc
%U http://geodesic.mathdoc.fr/item/CSIS_2025_22_1_a9/
%F CSIS_2025_22_1_a9
Han Liu; Dezhi Han; Shukai Zhang; Jingya Shi; Huafeng Wu; Yachao Zhou; Kuan-Ching Li. ASAM: Asynchronous Self-Attention Model for Visual Question Answering. Computer Science and Information Systems, Volume 22 (2025), no. 1. http://geodesic.mathdoc.fr/item/CSIS_2025_22_1_a9/