Reward Redistribution for Reinforcement Learning of Dynamic Nonprehensile Manipulation
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21230%2F21%3A00351068" target="_blank" >RIV/68407700:21230/21:00351068 - isvavai.cz</a>
Alternative codes found
RIV/68407700:21460/21:00351068 RIV/68407700:21730/21:00351068
Result on the web
<a href="https://doi.org/10.1109/ICCAR52225.2021.9463495" target="_blank" >https://doi.org/10.1109/ICCAR52225.2021.9463495</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1109/ICCAR52225.2021.9463495" target="_blank" >10.1109/ICCAR52225.2021.9463495</a>
Alternative languages
Result language
English
Title in original language
Reward Redistribution for Reinforcement Learning of Dynamic Nonprehensile Manipulation
Result description in original language
Recent reinforcement learning (RL) systems can solve a wide variety of manipulation tasks, even in real-world robotic implementations. However, in some nonprehensile manipulation tasks (e.g., poking or throwing), the classical reward system fails because the robot has to manipulate objects whose motion trajectory is only partly controllable. Such tasks require a specific type of reward that reflects this temporal misalignment. We propose a novel method, based on delayed reward redistribution, that allows a robot to fulfil goals in a partially controllable environment. The reward system in our architecture combines information from other sensors with inputs from an unsupervised vision module based on a variational autoencoder (VAE). This delayed reward system then controls the training of the motor module, which is based on a Soft Actor-Critic (SAC) neural network. We compare delayed and non-delayed versions of our system in a simulated environment and show that the delayed reward greatly outperforms the non-delayed version.
Title in English
Reward Redistribution for Reinforcement Learning of Dynamic Nonprehensile Manipulation
Result description in English
Recent reinforcement learning (RL) systems can solve a wide variety of manipulation tasks, even in real-world robotic implementations. However, in some nonprehensile manipulation tasks (e.g., poking or throwing), the classical reward system fails because the robot has to manipulate objects whose motion trajectory is only partly controllable. Such tasks require a specific type of reward that reflects this temporal misalignment. We propose a novel method, based on delayed reward redistribution, that allows a robot to fulfil goals in a partially controllable environment. The reward system in our architecture combines information from other sensors with inputs from an unsupervised vision module based on a variational autoencoder (VAE). This delayed reward system then controls the training of the motor module, which is based on a Soft Actor-Critic (SAC) neural network. We compare delayed and non-delayed versions of our system in a simulated environment and show that the delayed reward greatly outperforms the non-delayed version.
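The core idea of the abstract — moving a reward observed only after the object has left the robot's control back to the step that actually caused the outcome — can be illustrated with a toy sketch. This is an assumption-laden simplification for illustration only, not the paper's actual algorithm or architecture; the function name and the `last_contact_step` parameter are hypothetical:

```python
def redistribute_reward(rewards, last_contact_step):
    """Toy sketch of delayed reward redistribution (hypothetical, not the
    paper's method): in a throwing task the outcome reward only arrives at
    the end of the episode, after the robot has lost control of the object.
    Move that terminal outcome reward back to the step of the robot's last
    controllable action, so the RL update credits the action responsible.

    rewards: per-step rewards, with the delayed outcome at the final step.
    last_contact_step: index of the last step where the robot controlled
        the object (e.g. the moment of release in a throw).
    """
    redistributed = list(rewards)
    outcome = redistributed[-1]          # delayed outcome reward
    redistributed[-1] = 0.0
    redistributed[last_contact_step] += outcome
    return redistributed

# Example: the outcome is observed at step 4, but the throw ended at step 2.
print(redistribute_reward([0.0, 0.0, 0.0, 0.0, 1.0], 2))
```

The sum of per-step rewards is preserved, so the episode return is unchanged; only the temporal placement of the credit differs, which is what distinguishes the delayed variant from the non-delayed one compared in the paper.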
Classification
Type
D - Conference proceedings paper
CEP field
—
OECD FORD field
20204 - Robotics and automatic control
Result linkages
Project
—
Linkages
S - Specific research at universities
Others
Year of application
2021
Data confidentiality code
S - Complete and true data about the project are not subject to protection under special legal regulations
Data specific to the result type
Title of the paper in the proceedings
2021 7th International Conference on Control, Automation and Robotics (ICCAR)
ISBN
978-1-6654-4986-1
ISSN
—
e-ISSN
2251-2454
Number of pages
6
Pages from-to
326-331
Publisher name
IEEE (Institute of Electrical and Electronics Engineers)
Place of publication
—
Event venue
Singapore (virtual)
Event date
23 April 2021
Event type by nationality
WRD - Worldwide event
UT WoS article code
—