Policy derivation methods for critic-only reinforcement learning in continuous spaces
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21730%2F18%3A00316441" target="_blank" >RIV/68407700:21730/18:00316441 - isvavai.cz</a>
Result on the web
<a href="https://www.sciencedirect.com/science/article/pii/S0952197617302993" target="_blank" >https://www.sciencedirect.com/science/article/pii/S0952197617302993</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1016/j.engappai.2017.12.004" target="_blank" >10.1016/j.engappai.2017.12.004</a>
Alternative languages
Result language
English
Title in the original language
Policy derivation methods for critic-only reinforcement learning in continuous spaces
Result description in the original language
This paper addresses the problem of deriving a policy from the value function in the context of critic-only reinforcement learning (RL) in continuous state and action spaces. With continuous-valued states, RL algorithms have to rely on a numerical approximator to represent the value function. By its nature, numerical approximation virtually always exhibits artifacts that degrade the overall performance of the controlled system. In addition, when continuous-valued actions are used, the most common approach is to discretize the action space and exhaustively search for the action that maximizes the right-hand side of the Bellman equation. Such a policy derivation procedure is computationally expensive and results in steady-state errors due to the lack of continuity. In this work, we propose policy derivation methods that alleviate the above problems by means of action-space refinement, continuous approximation, and post-processing of the V-function using symbolic regression. The proposed methods are tested on nonlinear control problems: the 1-DOF and 2-DOF pendulum swing-up, and magnetic manipulation. The results show significantly improved performance in terms of both cumulative return and computational complexity.
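For illustration, the standard greedy policy derivation that the description refers to maximizes the right-hand side of the Bellman equation over a finite set of candidate actions. The notation below (state x, action u, reward function ρ, system dynamics f, discount factor γ, approximate value function V̂, discretized action set U_d) is assumed here and is not quoted from the paper:

\hat{\pi}(x) = \arg\max_{u \in \mathcal{U}_d} \left[ \rho(x, u) + \gamma \, \hat{V}\!\left(f(x, u)\right) \right]

The steady-state error mentioned in the description arises because the selected action can only take values from the finite grid \mathcal{U}_d, so the policy cannot settle on intermediate control values.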
Title in English
Policy derivation methods for critic-only reinforcement learning in continuous spaces
Result description in English
This paper addresses the problem of deriving a policy from the value function in the context of critic-only reinforcement learning (RL) in continuous state and action spaces. With continuous-valued states, RL algorithms have to rely on a numerical approximator to represent the value function. By its nature, numerical approximation virtually always exhibits artifacts that degrade the overall performance of the controlled system. In addition, when continuous-valued actions are used, the most common approach is to discretize the action space and exhaustively search for the action that maximizes the right-hand side of the Bellman equation. Such a policy derivation procedure is computationally expensive and results in steady-state errors due to the lack of continuity. In this work, we propose policy derivation methods that alleviate the above problems by means of action-space refinement, continuous approximation, and post-processing of the V-function using symbolic regression. The proposed methods are tested on nonlinear control problems: the 1-DOF and 2-DOF pendulum swing-up, and magnetic manipulation. The results show significantly improved performance in terms of both cumulative return and computational complexity.
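A minimal sketch of this baseline procedure, exhaustive search over a uniform action grid, is given below in Python. All names (f, rho, V_hat, the grid bounds) are assumptions chosen for illustration, not identifiers from the paper or any accompanying code:

import numpy as np

def greedy_action(x, V_hat, f, rho, gamma=0.99, n_actions=21, u_min=-2.0, u_max=2.0):
    """Return the action from a uniform grid that maximizes the
    Bellman right-hand side rho(x, u) + gamma * V_hat(f(x, u))."""
    grid = np.linspace(u_min, u_max, n_actions)            # discretized action set
    rhs = [rho(x, u) + gamma * V_hat(f(x, u)) for u in grid]
    return grid[int(np.argmax(rhs))]                       # greedy (argmax) action

The methods proposed in the paper alleviate the cost and discretization error of this search by refining the action grid, by continuous approximation, and by post-processing the V-function with symbolic regression.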
Classification
Type
J<sub>imp</sub> - Article in a periodical indexed in the Web of Science database
CEP field
—
OECD FORD field
20205 - Automation and control systems
Result linkages
Project
<a href="/cs/project/GA15-22731S" target="_blank" >GA15-22731S: Symbolic regression for reinforcement learning in continuous spaces</a>
Linkages
S - Specific university research
Other
Year of implementation
2018
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific to the result type
Name of the periodical
Engineering Applications of Artificial Intelligence
ISSN
0952-1976
e-ISSN
1873-6769
Volume of the periodical
69
Issue of the periodical within the volume
March
Country of the publisher of the periodical
NL - Netherlands
Number of pages of the result
10
Pages from-to
178-187
UT WoS code of the article
000424720500015
EID of the result in the Scopus database
2-s2.0-85044849467