Symbolic method for deriving policy in reinforcement learning

Result identifiers

Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21730%2F16%3A00305722" target="_blank" >RIV/68407700:21730/16:00305722 - isvavai.cz</a>
Result on the web
<a href="http://ieeexplore.ieee.org/document/7798684/" target="_blank" >http://ieeexplore.ieee.org/document/7798684/</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1109/CDC.2016.7798684" target="_blank" >10.1109/CDC.2016.7798684</a>

Alternative languages

Result language
English
Title in original language
Symbolic method for deriving policy in reinforcement learning
Result description in original language
This paper addresses the problem of deriving a policy from the value function in the context of reinforcement learning in continuous state and input spaces. We propose a novel method based on genetic programming to construct a symbolic function, which serves as a proxy to the value function and from which a continuous policy is derived. The symbolic proxy function is constructed such that it maximizes the number of correct choices of the control input for a set of selected states. Maximization methods can then be used to derive a control policy that performs better than the policy derived from the original approximate value function. The method was experimentally evaluated on two control problems with continuous spaces, pendulum swing-up and magnetic manipulation, and compared to a standard policy derivation method using the value function approximation. The results show that the proposed method and its variants outperform the standard method.
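The selection criterion described above (count how often a candidate proxy function leads to the correct control input on a set of selected states) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the 1-D toy dynamics `step`, the reward `reward`, the discrete action set, the hand-labelled `reference_actions`, and the two hand-written candidate proxies. The genetic programming search itself is not implemented; the sketch only shows how candidates might be scored, using the common greedy rule that picks the input maximizing reward plus the discounted proxy value of the next state.

```python
from typing import Callable, Dict, Sequence

Proxy = Callable[[float], float]


def greedy_action(proxy: Proxy, state: float, actions: Sequence[float],
                  step: Callable[[float, float], float],
                  reward: Callable[[float, float], float],
                  gamma: float = 0.95) -> float:
    """Return the action maximizing reward(x, u) + gamma * proxy(step(x, u))."""
    return max(actions,
               key=lambda u: reward(state, u) + gamma * proxy(step(state, u)))


def count_correct_choices(proxy: Proxy, states: Sequence[float],
                          reference_actions: Sequence[float],
                          actions: Sequence[float],
                          step: Callable[[float, float], float],
                          reward: Callable[[float, float], float],
                          gamma: float = 0.95) -> int:
    """Count the states where the proxy's greedy action matches the reference."""
    return sum(
        1
        for x, u_ref in zip(states, reference_actions)
        if greedy_action(proxy, x, actions, step, reward, gamma) == u_ref
    )


# Hypothetical toy problem: drive a scalar state toward zero.
def step(x: float, u: float) -> float:
    return x + 0.1 * u


def reward(x: float, u: float) -> float:
    return -x * x


actions = [-1.0, 0.0, 1.0]
states = [-1.0, -0.5, 0.5, 1.0]
# Assumed "correct" inputs for the selected states (labelled by hand here).
reference_actions = [1.0, 1.0, -1.0, -1.0]

# Hand-written stand-ins for expressions a symbolic search might propose.
candidates: Dict[str, Proxy] = {
    "neg_square": lambda x: -x * x,
    "identity": lambda x: x,
}

scores = {
    name: count_correct_choices(proxy, states, reference_actions,
                                actions, step, reward)
    for name, proxy in candidates.items()
}
best = max(scores, key=scores.get)
print(scores, best)
```

In this toy setting the score plays the role of a fitness value: a search procedure would keep candidates with higher counts, and the policy would then be derived by applying `greedy_action` with the best proxy.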

Classification

Type
D - Article in proceedings
CEP field
JC - Computer hardware and software
OECD FORD field
—

Result linkages

Project
<a href="/cs/project/GA15-22731S" target="_blank" >GA15-22731S: Symbolic regression for reinforcement learning in continuous spaces</a><br>
Linkages
P - Research and development project financed from public funds (with a link to CEP)

Other

Year of publication
2016
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific to the result type

Title of the article in the proceedings
Proceedings of the IEEE 55th Conference on Decision and Control (CDC)
ISBN
978-1-5090-1837-6
ISSN
—
e-ISSN
—
Number of pages
7
Pages from-to
2789-2795
Publisher name
IEEE
Place of publication
Piscataway, NJ
Event location
Las Vegas
Event date
12 December 2016
Event type by nationality
WRD - Worldwide event
UT WoS code of the article
—

Similar results (10)

Proxy Functions for Approximate Reinforcement Learning
Policy derivation methods for critic-only reinforcement learning in continuous spaces
Optimal Control via Reinforcement Learning with Symbolic Policy Approximation
