Policy derivation methods for critic-only reinforcement learning in continuous spaces

Result identifiers

Result code in IS VaVaI
RIV/68407700:21730/18:00316441 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21730%2F18%3A00316441)
Result on the web
https://www.sciencedirect.com/science/article/pii/S0952197617302993
DOI - Digital Object Identifier
10.1016/j.engappai.2017.12.004 (http://dx.doi.org/10.1016/j.engappai.2017.12.004)

Alternative languages

Result language
English
Title in original language
Policy derivation methods for critic-only reinforcement learning in continuous spaces
Description in original language
This paper addresses the problem of deriving a policy from the value function in critic-only reinforcement learning (RL) with continuous state and action spaces. With continuous-valued states, RL algorithms have to rely on a numerical approximator to represent the value function. By its nature, numerical approximation virtually always exhibits artifacts that degrade the performance of the controlled system. In addition, when actions are continuous-valued, the most common approach is to discretize the action space and exhaustively search for the action that maximizes the right-hand side of the Bellman equation. Such a policy derivation procedure is computationally expensive and results in steady-state error due to the lack of continuity. In this work, we propose policy derivation methods that alleviate these problems by means of action space refinement, continuous approximation, and post-processing of the V-function using symbolic regression. The proposed methods are tested on nonlinear control problems: 1-DOF and 2-DOF pendulum swing-up, and magnetic manipulation. The results show significantly improved performance in terms of cumulative return and computational complexity.
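The baseline procedure named in the description (discretize the action space, then search exhaustively for the action that maximizes the right-hand side of the Bellman equation) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name derive_action, the toy dynamics f, the reward rho, the quadratic V-function, and the discount factor 0.95 are all hypothetical choices made for the example.

```python
import numpy as np

def derive_action(x, V, f, rho, actions, gamma=0.95):
    """Exhaustive search over a discrete action set.

    Returns the action u maximizing rho(x, u, x') + gamma * V(x'),
    where x' = f(x, u) is the model-predicted next state.
    """
    best_u, best_q = None, -np.inf
    for u in actions:
        x_next = f(x, u)                           # one-step model prediction
        q = rho(x, u, x_next) + gamma * V(x_next)  # Bellman right-hand side
        if q > best_q:
            best_u, best_q = u, q
    return best_u

# Toy 1-D problem (every definition below is an assumption for illustration).
f = lambda x, u: x + 0.1 * u                          # simple integrator dynamics
rho = lambda x, u, x_next: -x_next**2 - 0.01 * u**2   # quadratic penalty reward
V = lambda x: -10.0 * x**2                            # stand-in for a learned V-function
actions = np.linspace(-1.0, 1.0, 5)                   # coarse grid: -1, -0.5, 0, 0.5, 1

u_far = derive_action(0.5, V, f, rho, actions)    # far from the goal: pushes hard
u_near = derive_action(0.02, V, f, rho, actions)  # close to the goal: grid is too coarse
print(u_far, u_near)
```

In this toy setting the search returns -1.0 at x = 0.5, but at x = 0.02 it returns 0.0 because every nonzero grid action overshoots. The state then stays at 0.02 instead of reaching the origin, which is one way to see the steady-state error that the description attributes to the lack of continuity. The cost of the search also grows linearly with the number of grid points per action dimension.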

Classification

Type
Jimp - Article in a periodical indexed in the Web of Science database
CEP field
—
OECD FORD field
20205 - Automation and control systems

Result continuities

Project
GA15-22731S: Symbolic regression for reinforcement learning in continuous spaces
Continuities
S - Specific research at universities

Other

Year of implementation
2018
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific to the result type

Name of the periodical
Engineering Applications of Artificial Intelligence
ISSN
0952-1976
e-ISSN
1873-6769
Volume of the periodical
69
Issue of the periodical within the volume
March
Country of the periodical's publisher
NL - Netherlands
Number of pages
10
Pages from-to
178-187
UT WoS code of the article
000424720500015
EID of the result in the Scopus database
2-s2.0-85044849467

Similar results (10)

Optimal Control via Reinforcement Learning with Symbolic Policy Approximation
Proxy Functions for Approximate Reinforcement Learning
Symbolic method for deriving policy in reinforcement learning
