Nonrandomized Markov and semi-Markov policies in dynamic programming
    
Teoriâ veroâtnostej i ee primeneniâ, Vol. 27 (1982) no. 1, pp. 109-119
    
See the article record from the source Math-Net.Ru
            
A discrete-time, infinite-horizon, non-stationary Markov decision model with Borel state and action spaces and the expected total reward criterion is considered. For an arbitrary fixed policy $\pi$, the following two statements are proved:
a) for an arbitrary initial measure $\mu$ and an arbitrary constant $K<\infty$ there exists a nonrandomized Markov policy $\varphi$ such that
\begin{gather*}
w(\mu,\varphi)\ge w(\mu,\pi)\ \text{if}\ w(\mu,\pi)<\infty,
\\
w(\mu,\varphi)\ge K\ \text{if}\ w(\mu,\pi)=\infty,
\end{gather*}
b) for an arbitrary measurable function $K(x)<\infty$ on the initial state space $X_0$ there exists a nonrandomized semi-Markov policy $\varphi'$ such that
\begin{gather*}
w(x,\varphi')\ge w(x,\pi)\ \text{if}\ w(x,\pi)<\infty,
\\
w(x,\varphi')\ge K(x)\ \text{if}\ w(x,\pi)=\infty\ \text{for every}\ x\in X_0.
\end{gather*}
For every policy $\sigma$, the numbers $w(\mu,\sigma)$ and $w(x,\sigma)$ are the values of the criterion for the initial measure $\mu$ and the initial state $x$, respectively.
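
Statement a) can be illustrated on a toy model. The sketch below is not the article's construction: it assumes finite state and action sets, a finite horizon, and a randomized Markov policy $\pi$, whereas the article treats Borel spaces, an infinite horizon, and arbitrary policies. All names (`evaluate`, `derandomize`, `as_randomized`, and the sample data) are hypothetical. At each time and state the sketch keeps the best action among those that $\pi$ uses with positive probability, which by backward induction yields a nonrandomized Markov policy $\varphi$ with $w(\mu,\varphi)\ge w(\mu,\pi)$.

```python
# Toy finite analogue of statement a); an illustrative assumption, not the paper's proof.
# r[t][x][a]    : reward at time t in state x under action a
# P[t][x][a][y] : probability of moving from x to y at time t under action a
# pi[t][x][a]   : probability that the randomized Markov policy picks a at (t, x)

def evaluate(r, P, pi, mu):
    """Expected total reward w(mu, pi) over the finite horizon len(r)."""
    n = len(mu)
    V = [0.0] * n  # value after the last step
    for t in reversed(range(len(r))):
        V = [
            sum(
                pi[t][x][a]
                * (r[t][x][a] + sum(P[t][x][a][y] * V[y] for y in range(n)))
                for a in range(len(pi[t][x]))
            )
            for x in range(n)
        ]
    return sum(mu[x] * V[x] for x in range(n))


def derandomize(r, P, pi):
    """Backward induction restricted to the actions pi uses with positive probability."""
    T, n = len(r), len(r[0])
    V = [0.0] * n  # value of the deterministic policy built so far
    phi = [[0] * n for _ in range(T)]
    for t in reversed(range(T)):
        new_V = []
        for x in range(n):
            support = [a for a, p in enumerate(pi[t][x]) if p > 0]
            q = {
                a: r[t][x][a] + sum(P[t][x][a][y] * V[y] for y in range(n))
                for a in support
            }
            best = max(q, key=q.get)
            phi[t][x] = best
            new_V.append(q[best])
        V = new_V
    return phi


def as_randomized(phi, m):
    """Encode a deterministic policy phi[t][x] as degenerate action distributions."""
    return [[[1.0 if a == act else 0.0 for a in range(m)] for act in row] for row in phi]


# Hypothetical data: 2 time steps, 2 states, 2 actions.
r = [[[1.0, 0.0], [0.0, 2.0]], [[0.0, 3.0], [1.0, 0.0]]]
P = [
    [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [1.0, 0.0]]],
    [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]],
]
pi = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]]  # uniform randomization
mu = [0.5, 0.5]  # initial measure

phi = derandomize(r, P, pi)
w_pi = evaluate(r, P, pi, mu)
w_phi = evaluate(r, P, as_randomized(phi, 2), mu)
print(w_pi, w_phi, phi)
```

In this finite setting every value is finite, so only the first inequality of a) is exercised; the case $w(\mu,\pi)=\infty$ with the bound $K$ has no counterpart in the toy model.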
			
@article{TVP_1982_27_1_a9,
     author = {E. A. Faǐnberg},
     title = {Nonrandomized {Markov} and {semi-Markov} policies in dynamic programming},
     journal = {Teori\^a vero\^atnostej i ee primeneni\^a},
     pages = {109--119},
     publisher = {mathdoc},
     volume = {27},
     number = {1},
     year = {1982},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/TVP_1982_27_1_a9/}
}
                      
                      
E. A. Faǐnberg. Nonrandomized Markov and semi-Markov policies in dynamic programming. Teoriâ veroâtnostej i ee primeneniâ, Vol. 27 (1982) no. 1, pp. 109-119. http://geodesic.mathdoc.fr/item/TVP_1982_27_1_a9/
