Approximate Policy Iteration for Markov Decision Processes via Quantitative Adaptive Aggregations

Result identifiers

Result code in IS VaVaI
RIV/00216305:26230/16:PU121655 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216305%3A26230%2F16%3APU121655)
Result on the web
http://link.springer.com/chapter/10.1007%2F978-3-319-46520-3_2
DOI - Digital Object Identifier
https://doi.org/10.1007/978-3-319-46520-3_2

Alternative languages

Language of the result
English
Title in the original language
Approximate Policy Iteration for Markov Decision Processes via Quantitative Adaptive Aggregations
Description in the original language
We consider the problem of finding an optimal policy in a Markov decision process that maximises the expected discounted sum of rewards over an infinite time horizon. Since the explicit iterative dynamic programming scheme does not scale when increasing the dimension of the state space, a number of approximate methods have been developed. These are typically based on value or policy iteration, enabling further speedups through lumped and distributed updates, or by employing succinct representations of the value functions. However, none of the existing approximate techniques provides general, explicit and tunable bounds on the approximation error, a problem particularly relevant when the level of accuracy affects the optimality of the policy. In this paper we propose a new approximate policy iteration scheme that mitigates the state-space explosion problem by adaptive state-space aggregation, at the same time providing rigorous and explicit error bounds that can be used to control the optimality level of the obtained policy. We evaluate the new approach on a case study, providing evidence that the state-space reduction considerably accelerates the policy iteration scheme while meeting the required level of precision.
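For orientation, the exact scheme that the paper sets out to approximate can be sketched as textbook policy iteration for a finite discounted MDP: evaluate the current policy exactly, then improve it greedily, and stop when the policy no longer changes. This is a minimal sketch under stated assumptions, not the authors' method: the dense model layout P[s][a][t] (probability of moving from state s to state t under action a), R[s][a] (expected immediate reward), the function names and the toy model are all hypothetical, and the sketch performs no state-space aggregation and computes none of the error bounds described in the abstract.

```python
from typing import List, Tuple

def solve_linear(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented copy so the caller's matrices are left untouched.
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def evaluate_policy(P, R, policy: List[int], gamma: float) -> List[float]:
    """Exact policy evaluation: solve (I - gamma * P_pi) V = R_pi."""
    n = len(P)
    A = [[(1.0 if i == j else 0.0) - gamma * P[i][policy[i]][j] for j in range(n)]
         for i in range(n)]
    b = [R[i][policy[i]] for i in range(n)]
    return solve_linear(A, b)

def policy_iteration(P, R, gamma: float, max_iters: int = 1000) -> Tuple[List[int], List[float]]:
    """Alternate exact evaluation and greedy improvement until the policy is stable."""
    n = len(P)
    policy = [0] * n
    for _ in range(max_iters):
        V = evaluate_policy(P, R, policy, gamma)
        new_policy = []
        for s in range(n):
            # Q-value of each action under the current value function.
            q = [R[s][a] + gamma * sum(p * V[t] for t, p in enumerate(P[s][a]))
                 for a in range(len(P[s]))]
            best = policy[s]  # keep the current action unless another is strictly better
            for a, qa in enumerate(q):
                if qa > q[best] + 1e-12:
                    best = a
            new_policy.append(best)
        if new_policy == policy:
            return policy, V
        policy = new_policy
    return policy, V

# Hypothetical two-state model: in state 0, action 1 moves to state 1;
# in state 1, action 0 stays there and earns reward 1.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
R = [[0.0, 0.0],
     [1.0, 0.0]]
policy, V = policy_iteration(P, R, gamma=0.9)
print(policy, V)
```

On this toy model the loop settles on moving from state 0 to state 1 and staying there, with values of roughly 9 and 10. Each iteration solves an n-by-n linear system over the full state space, which is the cost that grows with the dimension of the state space and that aggregation-based approaches aim to reduce.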

Classification

Type
D - Article in proceedings
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result links

Project
GA16-17538S: Approximate equivalence for approximate computing
Link type
P - Research and development project financed from public funds (with a link to CEP)

Other

Year of application
2016
Data confidentiality code
S - Complete and accurate project data are not subject to protection under special legal regulations

Data specific to the result type

Proceedings title
Proceedings of the 14th International Symposium on Automated Technology for Verification and Analysis
ISBN
978-3-319-46519-7
ISSN
—
e-ISSN
—
Number of pages
16
Pages (from-to)
13-31
Publisher
Springer Verlag
Place of publication
Heidelberg
Event location
Chiba
Event date
17. 10. 2016
Event type by geographic scope
WRD - Worldwide event
Web of Science UT code of the article
000389808100002

Similar results (10)

Adaptive Aggregation of Markov Chains: Quantitative Analysis of Chemical Reaction Networks
Adaptive formal approximations of Markov chains
Identification of Optimal Policies in Markov Decision Processes
