Second Order Optimality in Markov Decision Chains
Result identifiers
Result code in IS VaVaI
RIV/67985556:_____/17:00485146 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F67985556%3A_____%2F17%3A00485146)
Result on the web
http://dx.doi.org/10.14736/kyb-2017-6-1086
DOI - Digital Object Identifier
10.14736/kyb-2017-6-1086
Alternative languages
Result language
English
Title in the original language
Second Order Optimality in Markov Decision Chains
Description in the original language
The article is devoted to Markov reward chains in a discrete-time setting with finite state spaces. Unfortunately, the usual optimization criteria examined in the literature on Markov decision chains, such as total discounted reward, total reward up to reaching some specific state (the so-called first-passage models), or mean (average) reward optimality, may be quite insufficient to characterize the problem from the point of view of a decision maker. To this end it seems preferable, if not necessary, to select more sophisticated criteria that also reflect the variability-risk features of the problem. Perhaps the best-known approaches stem from the classical work of Markowitz on mean-variance selection rules, i.e. optimizing a weighted sum of the average or total reward and its variance. The article presents explicit formulae for calculating the variances for transient and discounted models (where the value of the discount factor depends on the current state and the action taken) for finite and infinite time horizons. The same result is presented for long-run average nondiscounted models, where finding stationary policies that minimize the average variance in the class of policies with a given long-run average reward is discussed.
Title in English
Second Order Optimality in Markov Decision Chains
Description in English
The article is devoted to Markov reward chains in a discrete-time setting with finite state spaces. Unfortunately, the usual optimization criteria examined in the literature on Markov decision chains, such as total discounted reward, total reward up to reaching some specific state (the so-called first-passage models), or mean (average) reward optimality, may be quite insufficient to characterize the problem from the point of view of a decision maker. To this end it seems preferable, if not necessary, to select more sophisticated criteria that also reflect the variability-risk features of the problem. Perhaps the best-known approaches stem from the classical work of Markowitz on mean-variance selection rules, i.e. optimizing a weighted sum of the average or total reward and its variance. The article presents explicit formulae for calculating the variances for transient and discounted models (where the value of the discount factor depends on the current state and the action taken) for finite and infinite time horizons. The same result is presented for long-run average nondiscounted models, where finding stationary policies that minimize the average variance in the class of policies with a given long-run average reward is discussed.
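A brief sketch of the mean-variance (second order) criterion referred to in the description above; the notation here is illustrative and not taken from the article itself. For a stationary policy \(\pi\) with long-run average reward \(g(\pi)\) and long-run average variance \(V(\pi)\), the Markowitz-type weighted objective is

\[
\max_{\pi}\ \bigl\{\, g(\pi) - \kappa\, V(\pi) \,\bigr\}, \qquad \kappa \ge 0,
\]

or, in the constrained form, \(\min_{\pi} V(\pi)\) over stationary policies satisfying \(g(\pi) = g_{0}\) for a prescribed average reward level \(g_{0}\); for the transient and discounted models the mean and variance of the total (discounted) reward play the analogous roles.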
Classification
Type
Jimp - Article in a journal indexed in the Web of Science database
CEP field
—
OECD FORD field
10103 - Statistics and probability
Result continuities
Project
GA15-10331S: Dynamic models of mortgage portfolio risk
Continuities
P - Research and development project financed from public sources (with a link to CEP)
Others
Year of publication
2017
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific to the result type
Journal title
Kybernetika
ISSN
0023-5954
e-ISSN
—
Journal volume
53
Issue number within the volume
6
Country of the journal publisher
CZ - Czech Republic
Number of pages of the result
14
Pages from-to
1086-1099
UT WoS code of the article
000424732300008
EID of the result in the Scopus database
2-s2.0-85040739483