Goal-HSVI: Heuristic Search Value Iteration for Goal POMDPs
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21230%2F18%3A00322882" target="_blank" >RIV/68407700:21230/18:00322882 - isvavai.cz</a>
Result on the web
<a href="https://www.ijcai.org/proceedings/2018/662" target="_blank" >https://www.ijcai.org/proceedings/2018/662</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.24963/ijcai.2018/662" target="_blank" >10.24963/ijcai.2018/662</a>
Alternative languages
Result language
English
Title in original language
Goal-HSVI: Heuristic Search Value Iteration for Goal POMDPs
Description in original language
Partially observable Markov decision processes (POMDPs) are the standard models for planning under uncertainty over both finite and infinite horizons. Besides the well-known discounted-sum objective, the indefinite-horizon objective (a.k.a. Goal-POMDPs) is another classical objective for POMDPs. In this case, given a set of target states and a positive cost for each transition, the optimization objective is to minimize the expected total cost until a target state is reached. In the literature, RTDP-Bel and heuristic search value iteration (HSVI) have been used for solving Goal-POMDPs. Neither of these algorithms has theoretical convergence guarantees, and HSVI may even fail to terminate its trials. We make the following contributions: (1) We discuss the challenges introduced by Goal-POMDPs and illustrate how they prevent the original HSVI from converging. (2) We present a novel algorithm inspired by HSVI, termed Goal-HSVI, and show that it has convergence guarantees. (3) We show that Goal-HSVI outperforms RTDP-Bel on a set of well-known examples.
Title in English
Goal-HSVI: Heuristic Search Value Iteration for Goal POMDPs
Description in English
Partially observable Markov decision processes (POMDPs) are the standard models for planning under uncertainty over both finite and infinite horizons. Besides the well-known discounted-sum objective, the indefinite-horizon objective (a.k.a. Goal-POMDPs) is another classical objective for POMDPs. In this case, given a set of target states and a positive cost for each transition, the optimization objective is to minimize the expected total cost until a target state is reached. In the literature, RTDP-Bel and heuristic search value iteration (HSVI) have been used for solving Goal-POMDPs. Neither of these algorithms has theoretical convergence guarantees, and HSVI may even fail to terminate its trials. We make the following contributions: (1) We discuss the challenges introduced by Goal-POMDPs and illustrate how they prevent the original HSVI from converging. (2) We present a novel algorithm inspired by HSVI, termed Goal-HSVI, and show that it has convergence guarantees. (3) We show that Goal-HSVI outperforms RTDP-Bel on a set of well-known examples.
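A minimal formal sketch of the objective described above; the notation ($G$, $c$, $b_0$, $T_G$, $\pi$) is introduced here for illustration and is not part of the record:

\[
  V^{*}(b_0) \;=\; \min_{\pi}\; \mathbb{E}^{\pi}\!\left[\sum_{t=0}^{T_G - 1} c(s_t, a_t)\right],
  \qquad
  T_G \;=\; \min\{\, t \ge 0 : s_t \in G \,\},
\]

where $G$ is the set of target states, $c(s, a) > 0$ is the per-transition cost, $b_0$ is the initial belief, and the expectation is over trajectories induced by the policy $\pi$. HSVI-style algorithms maintain upper and lower bounds on $V^{*}$ and tighten them along heuristically chosen belief trajectories; the paper's contribution is a variant of this scheme that provably converges under the indefinite-horizon objective.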
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
V - Research activity supported from other public sources
Others
Publication year
2018
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific to the result type
Article name in the proceedings
Proceedings of the International Joint Conference on Artificial Intelligence
ISBN
978-0-9992411-2-7
ISSN
—
e-ISSN
1045-0823
Number of pages
7
Pages from-to
4764-4770
Publisher name
International Joint Conferences on Artificial Intelligence Organization
Place of publication
—
Event location
Stockholm
Event date
July 13, 2018
Event type by nationality
WRD - Worldwide event
UT WoS article code
—