Symbolic method for deriving policy in reinforcement learning
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21730%2F16%3A00305722" target="_blank" >RIV/68407700:21730/16:00305722 - isvavai.cz</a>
Result on the web
<a href="http://ieeexplore.ieee.org/document/7798684/" target="_blank" >http://ieeexplore.ieee.org/document/7798684/</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1109/CDC.2016.7798684" target="_blank" >10.1109/CDC.2016.7798684</a>
Alternative languages
Result language
English
Original language name
Symbolic method for deriving policy in reinforcement learning
Original language description
This paper addresses the problem of deriving a policy from the value function in the context of reinforcement learning in continuous state and input spaces. We propose a novel method based on genetic programming to construct a symbolic function, which serves as a proxy to the value function and from which a continuous policy is derived. The symbolic proxy function is constructed such that it maximizes the number of correct choices of the control input for a set of selected states. Maximization methods can then be used to derive a control policy that performs better than the policy derived from the original approximate value function. The method was experimentally evaluated on two control problems with continuous spaces, pendulum swing-up and magnetic manipulation, and compared to a standard policy derivation method using the value function approximation. The results show that the proposed method and its variants outperform the standard method.
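For orientation, the baseline that the description calls the standard policy derivation method can be sketched as a one-step lookahead: for a given state, choose the control input that maximizes the immediate reward plus the discounted value of the next state. The sketch below is only an illustration under stated assumptions, not the paper's symbolic method. The 1-D dynamics `step`, the reward `reward`, the stand-in value function `value`, the discount factor and the candidate input grid are all hypothetical and do not come from the record.

```python
GAMMA = 0.95  # discount factor (assumed for illustration)


def step(x, u):
    """Hypothetical 1-D dynamics: the input shifts the state directly."""
    return x + 0.1 * u


def reward(x, u):
    """Hypothetical reward: penalize the squared distance of the next state from the origin."""
    return -step(x, u) ** 2


def value(x):
    """Stand-in for an approximate value function V(x); not taken from the paper."""
    return -10.0 * x ** 2


def greedy_policy(x, candidates):
    """Return the candidate input maximizing r(x, u) + GAMMA * V(f(x, u))."""
    return max(candidates, key=lambda u: reward(x, u) + GAMMA * value(step(x, u)))


# Discretized set of candidate inputs in [-1, 1] (assumed grid).
candidates = [i / 10 for i in range(-10, 11)]

u_pos = greedy_policy(0.5, candidates)   # state to the right of the origin
u_neg = greedy_policy(-0.5, candidates)  # state to the left of the origin
```

In this toy setup the selected input always pushes the state toward the origin. According to the description, the proposed method replaces the value function in this maximization with a symbolic proxy function built by genetic programming.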
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
JC - Computer hardware and software
OECD FORD branch
—
Result continuities
Project
<a href="/en/project/GA15-22731S" target="_blank" >GA15-22731S: Symbolic Regression for Reinforcement Learning in Continuous Spaces</a>
Continuities
P - Research and development project financed from public funds (with a link to CEP)
Others
Publication year
2016
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Proceedings of the IEEE 55th Conference on Decision and Control (CDC)
ISBN
978-1-5090-1837-6
ISSN
—
e-ISSN
—
Number of pages
7
Pages from-to
2789-2795
Publisher name
IEEE
Place of publication
Piscataway, NJ
Event location
Las Vegas
Event date
Dec 12, 2016
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—