ASAM: Asynchronous Self-Attention Model for Visual Question Answering
Computer Science and Information Systems, Volume 22 (2025), no. 1
This article was harvested from the Computer Science and Information Systems website.
Visual Question Answering (VQA) is an emerging field of deep learning that combines image and question features and generates collaborative feature representations for classification by fusing the two modalities. To enhance model effectiveness, it is crucial to fully exploit the semantic information of both text and vision. Some researchers have improved training accuracy by adding new features or by strengthening the model's ability to extract more detailed information, but these methods make experimentation more complex and expensive. We propose the Asynchronous Self-Attention Model (ASAM), which employs an asynchronous self-attention component and a controller, effectively integrating the asynchronous self-attention mechanism with a collaborative attention mechanism to leverage the rich semantic information of the underlying visuals. ASAM realizes an end-to-end training framework that extracts and exploits the rich representational information of the underlying visual images while performing coordinated attention with text features; because it does not over-emphasize fine-grained detail but strikes a balance, the model can learn more valuable information. Extensive ablation experiments on the VQA v2 dataset verify the effectiveness of the proposed ASAM. The experimental results demonstrate that the proposed model outperforms other state-of-the-art models without increasing model complexity or the number of parameters.
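The abstract describes image self-attention combined with question-guided collaborative attention, balanced by a controller before fusion for answer classification. The sketch below illustrates that general scheme in PyTorch; the module names, dimensions, gating design, and pooling choices are illustrative assumptions, not the authors' exact ASAM architecture.

```python
# Minimal, hypothetical sketch of the attention scheme outlined in the abstract:
# image regions attend to themselves (self-attention) and to question tokens
# (co-attention); a learned gate ("controller") balances the two streams before
# fusion for answer classification. Not the paper's actual implementation.
import torch
import torch.nn as nn

class AttentionFusionSketch(nn.Module):
    def __init__(self, dim=512, heads=8, num_answers=3129):
        super().__init__()
        # Intra-modal self-attention over image region features.
        self.img_self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Cross-modal co-attention: image regions attend to question tokens.
        self.co_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # "Controller": a gate balancing the self-attended and co-attended streams.
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.classifier = nn.Linear(2 * dim, num_answers)

    def forward(self, img_feats, ques_feats):
        # img_feats: (B, R, dim) region features; ques_feats: (B, T, dim) token features
        self_out, _ = self.img_self_attn(img_feats, img_feats, img_feats)
        co_out, _ = self.co_attn(img_feats, ques_feats, ques_feats)
        g = self.gate(torch.cat([self_out, co_out], dim=-1))
        fused = g * self_out + (1 - g) * co_out            # gated visual stream
        v = fused.mean(dim=1)                              # pool image regions
        q = ques_feats.mean(dim=1)                         # pool question tokens
        return self.classifier(torch.cat([v, q], dim=-1))  # answer logits

# Example usage with random tensors (batch of 2, 36 regions, 14 tokens):
if __name__ == "__main__":
    model = AttentionFusionSketch()
    logits = model(torch.randn(2, 36, 512), torch.randn(2, 14, 512))
    print(logits.shape)  # torch.Size([2, 3129])
```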
Keywords:
Visual Question Answering, Asynchronous Self-Attention, Deep Collaborative, Controller
@article{CSIS_2025_22_1_a9,
author = {Han Liu and Dezhi Han and Shukai Zhang and Jingya Shi and Huafeng Wu and Yachao Zhou and Kuan-Ching Li},
title = {ASAM: {Asynchronous} {Self-Attention} {Model} for {Visual} {Question} {Answering}},
journal = {Computer Science and Information Systems},
year = {2025},
volume = {22},
number = {1},
url = {http://geodesic.mathdoc.fr/item/CSIS_2025_22_1_a9/}
}
TY  - JOUR
AU  - Han Liu
AU  - Dezhi Han
AU  - Shukai Zhang
AU  - Jingya Shi
AU  - Huafeng Wu
AU  - Yachao Zhou
AU  - Kuan-Ching Li
TI  - ASAM: Asynchronous Self-Attention Model for Visual Question Answering
JO  - Computer Science and Information Systems
PY  - 2025
VL  - 22
IS  - 1
UR  - http://geodesic.mathdoc.fr/item/CSIS_2025_22_1_a9/
ID  - CSIS_2025_22_1_a9
ER  -
%0 Journal Article
%A Han Liu
%A Dezhi Han
%A Shukai Zhang
%A Jingya Shi
%A Huafeng Wu
%A Yachao Zhou
%A Kuan-Ching Li
%T ASAM: Asynchronous Self-Attention Model for Visual Question Answering
%J Computer Science and Information Systems
%D 2025
%V 22
%N 1
%U http://geodesic.mathdoc.fr/item/CSIS_2025_22_1_a9/
%F CSIS_2025_22_1_a9
Han Liu; Dezhi Han; Shukai Zhang; Jingya Shi; Huafeng Wu; Yachao Zhou; Kuan-Ching Li. ASAM: Asynchronous Self-Attention Model for Visual Question Answering. Computer Science and Information Systems, Volume 22 (2025), no. 1. http://geodesic.mathdoc.fr/item/CSIS_2025_22_1_a9/