Uniqueness of optimal policies as a generic property of discounted Markov decision processes: Ekeland's variational principle approach
Kybernetika, Volume 52 (2016), no. 1, pp. 66-75
This article was harvested from the Czech Digital Mathematics Library.
Many optimization problems, ranging from Linear Programming to Markov Decision Processes (MDPs), have more than one optimal solution, and the study of this non-uniqueness is of great mathematical interest. In this paper, the authors use Ekeland's variational principle to show that, in a specific family of discounted MDPs, non-uniqueness is a “fragile” property: for each problem with at least two optimal policies, a perturbed model with a unique optimal policy is produced. This result not only supersedes previous papers on the subject but also renews interest in the corresponding questions of well-posedness, genericity and structural stability of MDPs.
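To make the notions of non-uniqueness and its fragility concrete, here is a minimal sketch on a toy two-state, two-action discounted MDP. It is an illustration under stated assumptions, not the authors' construction: the MDP, the discount factor `beta = 0.9`, the perturbation size and the helper names (`q_values`, `value_iteration`, `optimal_policies`) are all hypothetical, and the tie is broken by a naive reward perturbation rather than via Ekeland's variational principle. Policies are deterministic stationary ones, written as tuples of actions indexed by state.

```python
from itertools import product

def q_values(P, R, V, beta):
    """Q(s, a) = R[s][a] + beta * sum_t P[s][a][t] * V[t]."""
    return [[R[s][a] + beta * sum(p * V[t] for t, p in enumerate(P[s][a]))
             for a in range(len(R[s]))] for s in range(len(R))]

def value_iteration(P, R, beta, tol=1e-12, max_iter=100_000):
    """Iterate the Bellman optimality operator until successive values agree."""
    V = [0.0] * len(R)
    for _ in range(max_iter):
        V_new = [max(row) for row in q_values(P, R, V, beta)]
        if max(abs(x - y) for x, y in zip(V_new, V)) < tol:
            return V_new
        V = V_new
    return V

def optimal_policies(P, R, beta, tie_tol=1e-9):
    """All deterministic stationary policies that are greedy w.r.t. the optimal value."""
    V = value_iteration(P, R, beta)
    Q = q_values(P, R, V, beta)
    # In each state, keep every action whose Q-value attains the maximum (up to tie_tol).
    greedy = [[a for a, q in enumerate(row) if q >= max(row) - tie_tol] for row in Q]
    return list(product(*greedy))

# Hypothetical toy MDP: P[s][a] is the next-state distribution, R[s][a] the reward.
P = [
    [[0.0, 1.0], [0.0, 1.0]],  # state 0: both actions move to state 1
    [[1.0, 0.0], [0.0, 1.0]],  # state 1: action 0 returns to state 0, action 1 stays
]
R = [[1.0, 1.0], [0.0, 0.0]]   # the two actions in state 0 are indistinguishable
beta = 0.9

# Hypothetical small perturbation of one reward, which breaks the tie in state 0.
R_eps = [[1.0 + 1e-3, 1.0], [0.0, 0.0]]

print(optimal_policies(P, R, beta))      # [(0, 0), (1, 0)]: two optimal policies
print(optimal_policies(P, R_eps, beta))  # [(0, 0)]: unique after the perturbation
```

Here the two optimal policies differ only in state 0, where both actions yield the same reward and the same transition; any positive perturbation of one of those rewards (larger than `tie_tol`) leaves a single optimal policy. The paper's result concerns a specific family of MDPs and a perturbation obtained through Ekeland's principle, which this toy example does not reproduce.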
DOI: 10.14736/kyb-2016-1-0066
Classification: 90C40, 93E20
Keywords: discounted Markov decision processes; dynamic programming; unique optimal policy; non-uniqueness of optimal policies; Ekeland's variational principle
@article{10_14736_kyb_2016_1_0066,
author = {Ortega-Guti\'errez, R. Israel and Montes-de-Oca, Ra\'ul and Lemus-Rodr{\'\i}guez, Enrique},
title = {Uniqueness of optimal policies as a generic property of discounted {Markov} decision processes: {Ekeland's} variational principle approach},
journal = {Kybernetika},
pages = {66--75},
year = {2016},
volume = {52},
number = {1},
doi = {10.14736/kyb-2016-1-0066},
mrnumber = {3482611},
zbl = {1374.90407},
language = {en},
url = {http://geodesic.mathdoc.fr/articles/10.14736/kyb-2016-1-0066/}
}
TY  - JOUR
AU  - Ortega-Gutiérrez, R. Israel
AU  - Montes-de-Oca, Raúl
AU  - Lemus-Rodríguez, Enrique
TI  - Uniqueness of optimal policies as a generic property of discounted Markov decision processes: Ekeland's variational principle approach
JO  - Kybernetika
PY  - 2016
SP  - 66
EP  - 75
VL  - 52
IS  - 1
UR  - http://geodesic.mathdoc.fr/articles/10.14736/kyb-2016-1-0066/
DO  - 10.14736/kyb-2016-1-0066
LA  - en
ID  - 10_14736_kyb_2016_1_0066
ER  - 
%0 Journal Article
%A Ortega-Gutiérrez, R. Israel
%A Montes-de-Oca, Raúl
%A Lemus-Rodríguez, Enrique
%T Uniqueness of optimal policies as a generic property of discounted Markov decision processes: Ekeland's variational principle approach
%J Kybernetika
%D 2016
%P 66-75
%V 52
%N 1
%U http://geodesic.mathdoc.fr/articles/10.14736/kyb-2016-1-0066/
%R 10.14736/kyb-2016-1-0066
%G en
%F 10_14736_kyb_2016_1_0066
Ortega-Gutiérrez, R. Israel; Montes-de-Oca, Raúl; Lemus-Rodríguez, Enrique. Uniqueness of optimal policies as a generic property of discounted Markov decision processes: Ekeland's variational principle approach. Kybernetika, Vol. 52 (2016), no. 1, pp. 66-75. doi: 10.14736/kyb-2016-1-0066