Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes
Result identifiers
Result code in IS VaVaI
RIV/00216224:14330/20:00114279 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216224%3A14330%2F20%3A00114279)
Result on the web
https://aaai.org/ojs/index.php/AAAI/article/view/6531
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1609/aaai.v34i06.6531" target="_blank" >10.1609/aaai.v34i06.6531</a>
Alternative languages
Result language
English
Title in original language
Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes
Description in original language
Markov decision processes (MDPs) are the de facto framework for sequential decision making in the presence of stochastic uncertainty. A classical optimization criterion for MDPs is to maximize the expected discounted-sum payoff, which ignores low-probability catastrophic events with a highly negative impact on the system. On the other hand, risk-averse policies require the probability of undesirable events to be below a given threshold, but they do not account for optimization of the expected payoff. We consider MDPs with discounted-sum payoff and failure states that represent catastrophic outcomes. The objective of risk-constrained planning is to maximize the expected discounted-sum payoff among risk-averse policies that keep the probability of encountering a failure state below a desired threshold. Our main contribution is an efficient risk-constrained planning algorithm that combines UCT-like search with a predictor learned through interaction with the MDP (in the style of AlphaZero) and with risk-constrained action selection via linear programming. We demonstrate the effectiveness of our approach with experiments on classical MDPs from the literature, including benchmarks with on the order of 10^6 states.
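The risk-constrained objective described in the abstract can be written out as follows; this is a sketch with illustrative symbols, not the paper's exact notation: maximize the expected discounted-sum payoff over policies whose probability of reaching a failure state stays below a threshold Δ.
\[
  \max_{\pi} \; \mathbb{E}^{\pi}\!\Bigl[\textstyle\sum_{t \ge 0} \gamma^{t} r_t\Bigr]
  \quad \text{subject to} \quad
  \mathbb{P}^{\pi}\bigl[\text{reach a failure state}\bigr] \;\le\; \Delta,
\]
where \(\gamma \in (0,1)\) is the discount factor, \(r_t\) the reward at step \(t\), and \(\Delta\) the risk threshold.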
Title in English
Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes
Description in English
Markov decision processes (MDPs) are the de facto framework for sequential decision making in the presence of stochastic uncertainty. A classical optimization criterion for MDPs is to maximize the expected discounted-sum payoff, which ignores low-probability catastrophic events with a highly negative impact on the system. On the other hand, risk-averse policies require the probability of undesirable events to be below a given threshold, but they do not account for optimization of the expected payoff. We consider MDPs with discounted-sum payoff and failure states that represent catastrophic outcomes. The objective of risk-constrained planning is to maximize the expected discounted-sum payoff among risk-averse policies that keep the probability of encountering a failure state below a desired threshold. Our main contribution is an efficient risk-constrained planning algorithm that combines UCT-like search with a predictor learned through interaction with the MDP (in the style of AlphaZero) and with risk-constrained action selection via linear programming. We demonstrate the effectiveness of our approach with experiments on classical MDPs from the literature, including benchmarks with on the order of 10^6 states.
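As a companion to the abstract, the following is a minimal, hypothetical sketch of the "risk-constrained action selection via linear programming" step; the function name, inputs, and the use of scipy.optimize.linprog are illustrative assumptions, not the paper's implementation. Given per-action payoff estimates and failure-probability estimates at a search node, it picks a distribution over actions that maximizes expected payoff while keeping the expected failure probability below the threshold.

```python
# Hypothetical sketch: risk-constrained action selection as a linear program.
import numpy as np
from scipy.optimize import linprog

def risk_constrained_distribution(payoffs, risks, delta):
    """payoffs, risks: per-action estimates; delta: failure-probability bound."""
    n = len(payoffs)
    # linprog minimizes, so negate the payoffs to maximize expected payoff.
    c = -np.asarray(payoffs, dtype=float)
    # Inequality constraint: sum_a p[a] * risks[a] <= delta.
    A_ub = np.asarray(risks, dtype=float).reshape(1, -1)
    b_ub = np.array([delta])
    # Equality constraint: action probabilities sum to 1.
    A_eq = np.ones((1, n))
    b_eq = np.array([1.0])
    bounds = [(0.0, 1.0)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    if not res.success:
        return None  # no action distribution satisfies the risk bound
    return res.x  # distribution over actions

# Example: high-payoff/high-risk action vs. low-payoff/low-risk action,
# with an overall risk budget of 0.1 -> roughly [0.2, 0.8].
print(risk_constrained_distribution(payoffs=[10.0, 2.0], risks=[0.3, 0.05], delta=0.1))
```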
Classification
Type
D - Article in proceedings
CEP field
—
OECD FORD field
10200 - Computer and information sciences
Result continuities
Project
The result was created during the realization of multiple projects. More information is available in the Projects tab.
Continuities
P - Research and development project financed from public funds (with a link to CEP)
S - Specific research at universities
Others
Year of implementation
2020
Data confidentiality code
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific to the result type
Name of the article in the proceedings
The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020
ISBN
9781577358237
ISSN
—
e-ISSN
—
Number of result pages
8
Pages from-to
9794-9801
Publisher name
AAAI Press
Place of publication
Palo Alto, California, USA
Event location
New York
Event date
Feb 7, 2020
Event type by nationality
WRD - Worldwide event
UT WoS article code
—