Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes
Result identifiers
Result code in IS VaVaI
RIV/00216224:14330/20:00114279 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216224%3A14330%2F20%3A00114279)
Result on the web
https://aaai.org/ojs/index.php/AAAI/article/view/6531
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1609/aaai.v34i06.6531" target="_blank" >10.1609/aaai.v34i06.6531</a>
Alternative languages
Result language
English
Title in original language
Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes
Description in original language
Markov decision processes (MDPs) are the de facto framework for sequential decision making in the presence of stochastic uncertainty. A classical optimization criterion for MDPs is to maximize the expected discounted-sum payoff, which ignores low-probability catastrophic events with a highly negative impact on the system. On the other hand, risk-averse policies require the probability of undesirable events to be below a given threshold, but they do not account for optimization of the expected payoff. We consider MDPs with discounted-sum payoff and failure states that represent catastrophic outcomes. The objective of risk-constrained planning is to maximize the expected discounted-sum payoff among risk-averse policies that keep the probability of encountering a failure state below a desired threshold. Our main contribution is an efficient risk-constrained planning algorithm that combines UCT-like search with a predictor learned through interaction with the MDP (in the style of AlphaZero) and with risk-constrained action selection via linear programming. We demonstrate the effectiveness of our approach with experiments on classical MDPs from the literature, including benchmarks with on the order of 10^6 states.
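The risk-constrained objective described in the abstract can be written out as follows; this is a sketch with illustrative symbols, not the paper's exact notation: maximize the expected discounted-sum payoff over policies whose probability of reaching a failure state stays below a threshold Δ.
\[
  \max_{\pi} \; \mathbb{E}^{\pi}\!\Bigl[\textstyle\sum_{t \ge 0} \gamma^{t} r_t\Bigr]
  \quad \text{subject to} \quad
  \mathbb{P}^{\pi}\bigl[\text{reach a failure state}\bigr] \;\le\; \Delta,
\]
where \(\gamma \in (0,1)\) is the discount factor, \(r_t\) the reward at step \(t\), and \(\Delta\) the risk threshold.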
Title in English
Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes
Description in English
Markov decision processes (MDPs) are the de facto framework for sequential decision making in the presence of stochastic uncertainty. A classical optimization criterion for MDPs is to maximize the expected discounted-sum payoff, which ignores low-probability catastrophic events with a highly negative impact on the system. On the other hand, risk-averse policies require the probability of undesirable events to be below a given threshold, but they do not account for optimization of the expected payoff. We consider MDPs with discounted-sum payoff and failure states that represent catastrophic outcomes. The objective of risk-constrained planning is to maximize the expected discounted-sum payoff among risk-averse policies that keep the probability of encountering a failure state below a desired threshold. Our main contribution is an efficient risk-constrained planning algorithm that combines UCT-like search with a predictor learned through interaction with the MDP (in the style of AlphaZero) and with risk-constrained action selection via linear programming. We demonstrate the effectiveness of our approach with experiments on classical MDPs from the literature, including benchmarks with on the order of 10^6 states.
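As a companion to the abstract, the following is a minimal, hypothetical sketch of the "risk-constrained action selection via linear programming" step; the function name, inputs, and the use of scipy.optimize.linprog are illustrative assumptions, not the paper's implementation. Given per-action payoff estimates and failure-probability estimates at a search node, it picks a distribution over actions that maximizes expected payoff while keeping the expected failure probability below the threshold.

```python
# Hypothetical sketch: risk-constrained action selection as a linear program.
import numpy as np
from scipy.optimize import linprog

def risk_constrained_distribution(payoffs, risks, delta):
    """payoffs, risks: per-action estimates; delta: failure-probability bound."""
    n = len(payoffs)
    # linprog minimizes, so negate the payoffs to maximize expected payoff.
    c = -np.asarray(payoffs, dtype=float)
    # Inequality constraint: sum_a p[a] * risks[a] <= delta.
    A_ub = np.asarray(risks, dtype=float).reshape(1, -1)
    b_ub = np.array([delta])
    # Equality constraint: action probabilities sum to 1.
    A_eq = np.ones((1, n))
    b_eq = np.array([1.0])
    bounds = [(0.0, 1.0)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    if not res.success:
        return None  # no action distribution satisfies the risk bound
    return res.x  # distribution over actions

# Example: high-payoff/high-risk action vs. low-payoff/low-risk action,
# with an overall risk budget of 0.1 -> roughly [0.2, 0.8].
print(risk_constrained_distribution(payoffs=[10.0, 2.0], risks=[0.3, 0.05], delta=0.1))
```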
Classification
Type
D - Article in proceedings
CEP field
—
OECD FORD field
10200 - Computer and information sciences
Result continuities
Project
The result was created during the realization of multiple projects. More information is available in the Projects tab.
Continuities
P - Research and development project financed from public funds (with a link to CEP)
S - Specific research at universities
Others
Year of implementation
2020
Data confidentiality code
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific to the result type
Name of the article in the proceedings
The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020
ISBN
9781577358237
ISSN
—
e-ISSN
—
Number of result pages
8
Pages from-to
9794-9801
Publisher name
AAAI Press
Place of publication
Palo Alto, California, USA
Event location
New York
Event date
Feb 7, 2020
Event type by nationality
WRD - Worldwide event
UT WoS article code
—