Air Spring Controlled by Reinforcement Learning Algorithm
Result identifiers
Result code in IS VaVaI
RIV/46747885:24210/20:00008299 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F46747885%3A24210%2F20%3A00008299)
Result on the web
https://www.engmech.cz/im/im/page/proc
DOI - Digital Object Identifier
10.21495/5896-3-428 (http://dx.doi.org/10.21495/5896-3-428)
Alternative languages
Result language
English
Title in original language
Air Spring Controlled by Reinforcement Learning Algorithm
Result description in original language
The paper deals with replacing the analog PID stroke controller of a bellows pneumatic spring with machine learning algorithms, specifically deep reinforcement learning. The Deep Deterministic Policy Gradient (DDPG) algorithm used consists of an environment, in this case the pneumatic spring, and an agent that, based on observations of the environment, performs actions leading to a cumulative reward it seeks to maximize. DDPG falls into the category of actor-critic algorithms: it combines the benefits of Q-learning with the optimization of a deterministic policy. Q-learning is represented by the critic, while policy optimization is represented by the actor, which maps the state of the environment directly to actions. In deep reinforcement learning, both the critic and the actor are realized as deep neural networks, and each has a target variant of itself; these target networks are designed to increase the stability and speed of the learning process. The DDPG algorithm also uses a replay buffer, from which the data the agent learns from is drawn in batches.
Title in English
Air Spring Controlled by Reinforcement Learning Algorithm
Result description in English
The paper deals with replacing the analog PID stroke controller of a bellows pneumatic spring with machine learning algorithms, specifically deep reinforcement learning. The Deep Deterministic Policy Gradient (DDPG) algorithm used consists of an environment, in this case the pneumatic spring, and an agent that, based on observations of the environment, performs actions leading to a cumulative reward it seeks to maximize. DDPG falls into the category of actor-critic algorithms: it combines the benefits of Q-learning with the optimization of a deterministic policy. Q-learning is represented by the critic, while policy optimization is represented by the actor, which maps the state of the environment directly to actions. In deep reinforcement learning, both the critic and the actor are realized as deep neural networks, and each has a target variant of itself; these target networks are designed to increase the stability and speed of the learning process. The DDPG algorithm also uses a replay buffer, from which the data the agent learns from is drawn in batches.
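The description above names every moving part of DDPG: a deterministic actor, a Q-learning critic, slowly tracking target copies of both networks, and a replay buffer sampled in mini-batches. As a reading aid only, here is a minimal PyTorch sketch of that structure; the state/action dimensions, network sizes, and hyperparameters are illustrative assumptions, not taken from the paper's pneumatic spring controller.

```python
# Minimal DDPG sketch (PyTorch). All dimensions and hyperparameters are
# hypothetical placeholders, not values from the paper.
import copy
import random
from collections import deque

import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 3, 1          # assumed observation/action sizes
GAMMA, TAU, BATCH = 0.99, 0.005, 64   # discount, soft-update rate, batch size

class Actor(nn.Module):
    """Deterministic policy: maps the environment state directly to an action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, ACTION_DIM), nn.Tanh())  # action scaled to [-1, 1]
    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    """Q-network: estimates the value of a (state, action) pair."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
            nn.Linear(64, 1))
    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

actor, critic = Actor(), Critic()
# Target networks: slowly tracking copies that stabilize learning.
actor_t, critic_t = copy.deepcopy(actor), copy.deepcopy(critic)
a_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
c_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
replay = deque(maxlen=100_000)  # replay buffer of (s, a, r, s') transitions

def update():
    """One DDPG learning step on a random mini-batch from the replay buffer."""
    if len(replay) < BATCH:
        return
    s, a, r, s2 = map(torch.stack, zip(*random.sample(replay, BATCH)))
    with torch.no_grad():                      # bootstrap target from target nets
        y = r + GAMMA * critic_t(s2, actor_t(s2))
    c_loss = nn.functional.mse_loss(critic(s, a), y)
    c_opt.zero_grad(); c_loss.backward(); c_opt.step()
    a_loss = -critic(s, actor(s)).mean()       # ascend the critic's Q-estimate
    a_opt.zero_grad(); a_loss.backward(); a_opt.step()
    for net, tgt in ((actor, actor_t), (critic, critic_t)):
        for p, pt in zip(net.parameters(), tgt.parameters()):
            pt.data.mul_(1 - TAU).add_(TAU * p.data)  # soft target update
```

A training loop would alternate between stepping the environment with `actor(s)` plus exploration noise, appending the resulting transition to `replay`, and calling `update()`.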
Classification
Type
D - Article in proceedings
CEP field
—
OECD FORD field
21100 - Other engineering and technologies
Result continuities
Project
—
Continuities
S - Specific research at universities
Other
Year of application
2020
Data confidentiality code
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific to the result type
Proceedings title
Engineering Mechanics 2020
ISBN
978-80-214-5896-3
ISSN
1805-8248
e-ISSN
—
Number of result pages
4
Pages from-to
428-431
Publisher name
Brno University of Technology, Institute of Solid Mechanics, Mechatronics and Biomechanics
Place of publication
Brno
Event venue
Brno
Event date
1 January 2020
Event type by nationality
EUR - European event
UT WoS article code
000667956100099