Bi-personal stochastic transient Markov games with stopping times and total reward criterion
Kybernetika, Volume 57 (2021) no. 1, pp. 1-14.

See the article record from the source Czech Digital Mathematics Library

The article is devoted to a class of bi-personal (players 1 and 2), zero-sum Markov games evolving in discrete time on transient Markov reward chains. At each decision time, the second player can stop the system by paying a terminal reward to the first player. If the system is not stopped, the first player selects a decision and two things happen: the Markov chain moves to the next state according to the known transition law, and the second player must pay a reward to the first player. The first player (resp. the second player) tries to maximize (resp. minimize) the total expected reward (resp. cost). Observe that if the second player is a dummy, the problem reduces to finding an optimal policy of a transient Markov reward chain. Contraction properties of the transient model make it possible to apply the Banach Fixed Point Theorem and establish a Nash equilibrium. The obtained results are illustrated by two numerical examples.
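The fixed-point argument mentioned in the abstract can be illustrated with a small value-iteration sketch. Everything below is an assumption made for illustration and is not taken from the article: the two-state model, the numbers in `P`, `R` and `G`, and the specific operator form (T v)(x) = min{ G(x), max_a [ R(x,a) + sum_y P(y|x,a) v(y) ] }, which is one natural reading of the verbal description. The transition rows are chosen to sum to less than 1 (the missing mass is absorption), so the operator is a contraction in the plain sup norm; a general transient model may instead require a weighted norm.

```python
# Hypothetical 2-state, 2-action example; all numbers are illustrative only.
# P[x][a] is a substochastic row over next states (row sums < 1): the missing
# probability mass is absorption, which is what makes the chain transient here.
P = {
    0: {0: [0.5, 0.3], 1: [0.2, 0.2]},
    1: {0: [0.1, 0.6], 1: [0.4, 0.1]},
}
# R[x][a]: running reward paid by player 2 to player 1 when play continues.
R = {0: {0: 1.0, 1: 2.0}, 1: {0: 0.5, 1: 3.0}}
# G[x]: terminal reward paid by player 2 to player 1 if player 2 stops in x.
G = [4.0, 2.5]


def bellman(v):
    """Apply the assumed operator: player 2 picks the cheaper of stopping
    (pay G[x]) and continuing (player 1 then maximizes over actions)."""
    out = []
    for x in sorted(P):
        cont = max(
            R[x][a] + sum(p * v[y] for y, p in enumerate(P[x][a]))
            for a in P[x]
        )
        out.append(min(G[x], cont))
    return out


def solve(tol=1e-10, max_iter=10_000):
    """Iterate the operator from v = 0 until successive iterates agree."""
    v = [0.0] * len(G)
    for _ in range(max_iter):
        w = bellman(v)
        if max(abs(a - b) for a, b in zip(v, w)) < tol:
            return w
        v = w
    raise RuntimeError("value iteration did not converge")


if __name__ == "__main__":
    value = solve()
    # Player 2 stops exactly where the value equals the terminal reward.
    stop = [abs(value[x] - G[x]) < 1e-8 for x in range(len(G))]
    print(value, stop)
```

For these illustrative numbers the iteration settles at approximately (3.5, 2.5): player 2 stops in state 1, where the terminal reward 2.5 is cheaper than continuing, and lets play continue in state 0.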
DOI : 10.14736/kyb-2021-1-0001
Classification : 91A05, 91A50
Keywords: two-person Markov games; stopping times; stopping times in transient Markov decision chains; transient and communicating Markov chains
@article{10_14736_kyb_2021_1_0001,
     author = {Mart{\'\i}nez-Cort\'es, Victor Manuel},
     title = {Bi-personal stochastic transient {Markov} games with stopping times and total reward criterion},
     journal = {Kybernetika},
     pages = {1--14},
     publisher = {mathdoc},
     volume = {57},
     number = {1},
     year = {2021},
     doi = {10.14736/kyb-2021-1-0001},
     mrnumber = {4231853},
     zbl = {07396252},
     language = {en},
     url = {http://geodesic.mathdoc.fr/articles/10.14736/kyb-2021-1-0001/}
}
TY  - JOUR
AU  - Martínez-Cortés, Victor Manuel
TI  - Bi-personal stochastic transient Markov games with stopping times and total reward criterion
JO  - Kybernetika
PY  - 2021
SP  - 1
EP  - 14
VL  - 57
IS  - 1
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/articles/10.14736/kyb-2021-1-0001/
DO  - 10.14736/kyb-2021-1-0001
LA  - en
ID  - 10_14736_kyb_2021_1_0001
ER  - 
%0 Journal Article
%A Martínez-Cortés, Victor Manuel
%T Bi-personal stochastic transient Markov games with stopping times and total reward criterion
%J Kybernetika
%D 2021
%P 1-14
%V 57
%N 1
%I mathdoc
%U http://geodesic.mathdoc.fr/articles/10.14736/kyb-2021-1-0001/
%R 10.14736/kyb-2021-1-0001
%G en
%F 10_14736_kyb_2021_1_0001
Martínez-Cortés, Victor Manuel. Bi-personal stochastic transient Markov games with stopping times and total reward criterion. Kybernetika, Volume 57 (2021) no. 1, pp. 1-14. doi: 10.14736/kyb-2021-1-0001. http://geodesic.mathdoc.fr/articles/10.14736/kyb-2021-1-0001/