Actor-Critic Off-Policy Learning for Optimal Control of Multiple-Model Discrete-Time Systems

Result identifiers

Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F49777513%3A23520%2F18%3A43949872" target="_blank" >RIV/49777513:23520/18:43949872 - isvavai.cz</a>
Result on the web
<a href="http://dx.doi.org/10.1109/TCYB.2016.2618926" target="_blank" >http://dx.doi.org/10.1109/TCYB.2016.2618926</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1109/TCYB.2016.2618926" target="_blank" >10.1109/TCYB.2016.2618926</a>

Alternative languages

Result language
English
Title in the original language
Actor-Critic Off-Policy Learning for Optimal Control of Multiple-Model Discrete-Time Systems
Result description in the original language
In this paper, motivated by human neurocognitive experiments, a model-free off-policy reinforcement learning algorithm is developed to solve the optimal tracking control problem for multiple-model linear discrete-time systems. First, an adaptive self-organizing map neural network is used to determine the system behavior from measured data and to assign a responsibility signal to each of the system's possible behaviors. A new model is added if a sudden change in system behavior is detected from the measured data and that behavior has not been detected previously. The value function is represented by partially weighted value functions. Then, the off-policy iteration algorithm is generalized to multiple-model learning to find a solution without any knowledge of the system dynamics or the reference trajectory dynamics. The off-policy approach helps to increase data efficiency and speed of tuning, since a stream of experiences obtained from executing a behavior policy is reused to sequentially update several value functions corresponding to different learning policies. Two numerical examples demonstrate the performance of the off-policy algorithm.

Classification

Type
J<sub>imp</sub> - Article in a periodical indexed in the Web of Science database
CEP field
—
OECD FORD field
20205 - Automation and control systems

Result linkages

Project
The result was created during the realization of more than one project. More information is available in the Projects tab.
Linkages
P - Research and development project financed from public funds (with a link to CEP)

Other

Year of publication
2018
Data confidentiality code
S - Complete and true data about the project are not subject to protection under special legal regulations

Data specific to the result type

Periodical name
IEEE Transactions on Cybernetics
ISSN
2168-2267
e-ISSN
—
Periodical volume
48
Issue number within the volume
1
Publisher's country
US - United States of America
Number of pages
12
Pages from-to
29-40
UT WoS article code
000418291400003
EID of the result in the Scopus database
2-s2.0-84994252445

Similar results (10)

Reinforcement Learning with Symbolic Input-Output Models
Symbolic Regression Methods for Reinforcement Learning
Reinforcement learning for spoken dialogue systems using off-policy natural gradient method
