A method for assessing the degree of confidence in the self-explanations of GPT models
News of the Kabardin-Balkar scientific center of RAS, Volume 26 (2024) no. 4, pp. 54-61
See the article record from the source Math-Net.Ru
With the rapid growth in the use of generative neural network models for practical tasks,
the problem of explaining their decisions is becoming increasingly acute. As neural-network-based
solutions are introduced into medical practice, government administration, and defense, the
demands on the interpretability of such systems will undoubtedly increase. In this study, we
propose a method for verifying, post factum, the reliability of the self-explanations produced by
a model, by comparing the model's attention distributions during the generation of the response
and of its explanation. The authors propose and develop methods for the numerical evaluation of
the reliability of answers produced by generative pre-trained transformers. It is proposed to use
the Kullback-Leibler divergence between the model's attention distributions during the generation
of the response and of the subsequent explanation. Additionally, it is proposed to compute the
ratio of the model's attention between the original query and the generated explanation, in order
to understand how strongly the self-explanation was influenced by the model's own response. To
obtain these values, an algorithm for recursively computing the model's attention across the
generation steps is proposed. The study demonstrated the effectiveness of the proposed methods
and identified metric values corresponding to correct and incorrect explanations and responses.
We analyzed the existing methods for assessing the reliability of generative-model responses and
found that the overwhelming majority of them are difficult for an ordinary user to interpret. For
this reason, we proposed our own methods and tested them on the most widely used generative
models available at the time of writing. As a result, we obtained typical values of the proposed
metrics, an algorithm for computing them, and their visualization.
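Below is a minimal numerical sketch of how the two proposed quantities could be computed, assuming per-step attention vectors have already been extracted from a decoder-only model (for example, averaged over layers and heads from the attentions a Hugging Face transformer returns when called with output_attentions=True). The function names, the recursive redistribution scheme in propagate_to_query, and the exact form of the query/answer ratio are illustrative assumptions for this sketch, not the authors' published implementation.

    import numpy as np

    def propagate_to_query(step_attentions, query_len, carried_dists=()):
        """For every generated token, build a distribution over the original query positions.

        step_attentions : list of 1-D arrays; element t is the attention the t-th generated
            token pays to all positions visible to it (query tokens, then any intermediate
            tokens, then tokens generated earlier in this phase).
        query_len       : number of tokens in the original query.
        carried_dists   : query-level distributions already attributed to intermediate tokens
            between the query and this phase's output (empty for the answer phase; the answer
            tokens' distributions for the explanation phase).

        Attention paid to an intermediate or earlier-generated token is redistributed to the
        query positions that token itself attended to; this is one way to read the recursive
        scheme described in the abstract.
        """
        carried = list(carried_dists)
        out = []
        for att in step_attentions:
            att = np.asarray(att, dtype=float)
            dist = att[:query_len].copy()             # attention directly on the query
            for j, d in enumerate(carried + out):     # attention routed through later tokens
                dist += att[query_len + j] * d
            dist /= dist.sum() + 1e-12                # normalise to a probability distribution
            out.append(dist)
        return out

    def kl_divergence(p, q, eps=1e-12):
        """KL(p || q) for two discrete distributions over the same positions."""
        p = np.asarray(p, dtype=float) + eps
        q = np.asarray(q, dtype=float) + eps
        p, q = p / p.sum(), q / q.sum()
        return float(np.sum(p * np.log(p / q)))

    def query_answer_attention_ratio(expl_attentions, query_len, answer_len):
        """Share of attention spent on the original query versus the model's own answer
        while the explanation is being generated (illustrative formulation)."""
        on_query, on_answer = 0.0, 0.0
        for att in expl_attentions:
            att = np.asarray(att, dtype=float)
            on_query += att[:query_len].sum()
            on_answer += att[query_len:query_len + answer_len].sum()
        return on_query / (on_answer + 1e-12)

    # Toy usage with synthetic attention vectors; in practice they would come from the
    # model's attention tensors, averaged over layers and heads.
    rng = np.random.default_rng(0)
    query_len, answer_len = 6, 4
    answer_steps = [rng.random(query_len + t) for t in range(answer_len)]
    expl_steps = [rng.random(query_len + answer_len + t) for t in range(5)]

    answer_dists = propagate_to_query(answer_steps, query_len)
    expl_dists = propagate_to_query(expl_steps, query_len, carried_dists=answer_dists)

    p_answer = np.mean(answer_dists, axis=0)
    p_expl = np.mean(expl_dists, axis=0)
    print("KL(answer || explanation):", kl_divergence(p_answer, p_expl))
    print("query/answer attention ratio:",
          query_answer_attention_ratio(expl_steps, query_len, answer_len))

Under this reading, a small Kullback-Leibler divergence means the explanation relies on roughly the same parts of the query as the answer did, while a low query/answer attention ratio suggests the self-explanation is driven mostly by the model's own response rather than by the original query.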
Keywords:
neural networks, metrics, language models, interpretability, GPT, LLM
Keywords: XAI
@article{IZKAB_2024_26_4_a2,
author = {A. N. Lukyanov and A. M. Tramova},
title = {A method for assessing the degree of confidence in the self-explanations of {GPT} models},
journal = {News of the Kabardin-Balkar scientific center of RAS},
pages = {54--61},
publisher = {mathdoc},
volume = {26},
number = {4},
year = {2024},
language = {ru},
url = {http://geodesic.mathdoc.fr/item/IZKAB_2024_26_4_a2/}
}
TY - JOUR AU - A. N. Lukyanov AU - A. M. Tramova TI - A method for assessing the degree of confidence in the self-explanations of GPT models JO - News of the Kabardin-Balkar scientific center of RAS PY - 2024 SP - 54 EP - 61 VL - 26 IS - 4 PB - mathdoc UR - http://geodesic.mathdoc.fr/item/IZKAB_2024_26_4_a2/ LA - ru ID - IZKAB_2024_26_4_a2 ER -
%0 Journal Article %A A. N. Lukyanov %A A. M. Tramova %T A method for assessing the degree of confidence in the self-explanations of GPT models %J News of the Kabardin-Balkar scientific center of RAS %D 2024 %P 54-61 %V 26 %N 4 %I mathdoc %U http://geodesic.mathdoc.fr/item/IZKAB_2024_26_4_a2/ %G ru %F IZKAB_2024_26_4_a2
A. N. Lukyanov; A. M. Tramova. A method for assessing the degree of confidence in the self-explanations of GPT models. News of the Kabardin-Balkar scientific center of RAS, Volume 26 (2024) no. 4, pp. 54-61. http://geodesic.mathdoc.fr/item/IZKAB_2024_26_4_a2/