Double-Layer Affective Visual Question Answering Network
Computer Science and Information Systems, Volume 18 (2021), no. 1.

View the article record on the source Computer Science and Information Systems website

Visual Question Answering (VQA) has attracted much attention recently in both the natural language processing and computer vision communities, as it offers insight into the relationships between two relevant sources of information. The success of deep learning has driven tremendous advances in VQA. Building on these advances, the Affective Visual Question Answering Network (AVQAN) enriches the understanding and analysis of VQA models by making use of the emotional information contained in the images to produce sensitive answers, while maintaining the same level of accuracy as ordinary VQA baseline models. Integrating the emotional information contained in the images into VQA is a fairly new task. However, it is challenging to separate question-guided attention from mood-guided attention because AVQAN concatenates the question words with the mood labels, and this concatenation is believed to harm the performance of the model. To mitigate this effect, we propose the Double-Layer Affective Visual Question Answering Network (DAVQAN), which divides the task of generating emotional answers in VQA into two simpler subtasks, the generation of non-emotional responses and the production of mood labels, and employs two independent layers to tackle these subtasks. Comparative experiments on a preprocessed dataset show that the overall performance of DAVQAN is 7.6% higher than that of AVQAN, demonstrating the effectiveness of the proposed model. We also introduce a more advanced word embedding method and a more fine-grained image feature extractor into AVQAN and DAVQAN to further improve their performance, and both obtain better results than their original versions, which shows that, just as in general VQA, VQA integrated with affective computing can improve the performance of the whole model by improving these two modules.
Keywords: deep learning, natural language processing, computer vision, visual question answering, affective computing
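To make the double-layer idea concrete, below is a minimal PyTorch sketch of a two-branch model in the spirit the abstract describes: shared image and question features feed two independent heads, one producing the non-emotional answer and one producing the mood label. All class names, dimensions, the GRU question encoder, and the element-wise fusion are illustrative assumptions, not the architecture reported in the paper.

```python
import torch
import torch.nn as nn

class DoubleLayerAffectiveVQASketch(nn.Module):
    """Illustrative two-branch affective VQA model (not the paper's exact design):
    one head predicts the non-emotional answer, the other predicts the mood label,
    from the same fused image/question representation."""

    def __init__(self, img_dim=2048, q_vocab=10000, q_dim=300, hidden=512,
                 num_answers=3000, num_moods=7):
        super().__init__()
        self.embed = nn.Embedding(q_vocab, q_dim)          # question word embeddings
        self.q_encoder = nn.GRU(q_dim, hidden, batch_first=True)
        self.img_proj = nn.Linear(img_dim, hidden)          # project image features
        # Answer layer: classifies over the answer vocabulary, no mood input.
        self.answer_head = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, num_answers))
        # Mood layer: an independent classifier over mood labels.
        self.mood_head = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, num_moods))

    def forward(self, image_feats, question_tokens):
        q_emb = self.embed(question_tokens)                 # (B, T, q_dim)
        _, q_state = self.q_encoder(q_emb)                  # (1, B, hidden)
        q_vec = q_state.squeeze(0)                          # (B, hidden)
        v_vec = torch.relu(self.img_proj(image_feats))      # (B, hidden)
        fused = q_vec * v_vec                               # simple element-wise fusion
        return self.answer_head(fused), self.mood_head(fused)

# Minimal usage with random tensors.
model = DoubleLayerAffectiveVQASketch()
imgs = torch.randn(2, 2048)                    # pooled image features
qs = torch.randint(0, 10000, (2, 14))          # token ids for 14-word questions
answer_logits, mood_logits = model(imgs, qs)
print(answer_logits.shape, mood_logits.shape)  # torch.Size([2, 3000]) torch.Size([2, 7])
```

Because the two heads share features but are trained with separate objectives, the answer branch is not conditioned on mood labels, which reflects the paper's motivation of keeping question-guided and mood-guided processing apart; the specific fusion and loss choices here are placeholders.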
@article{CSIS_2021_18_1_a8,
     author = {Zihan Guo and Dezhi Han and Kuan-Ching Li},
     title = {Double-Layer {Affective} {Visual} {Question} {Answering} {Network}},
     journal = {Computer Science and Information Systems},
     publisher = {mathdoc},
     volume = {18},
     number = {1},
     year = {2021},
     url = {http://geodesic.mathdoc.fr/item/CSIS_2021_18_1_a8/}
}