Reward Redistribution for Reinforcement Learning of Dynamic Nonprehensile Manipulation
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21230%2F21%3A00351068" target="_blank" >RIV/68407700:21230/21:00351068 - isvavai.cz</a>
Alternative codes found
RIV/68407700:21460/21:00351068 RIV/68407700:21730/21:00351068
Result on the web
<a href="https://doi.org/10.1109/ICCAR52225.2021.9463495" target="_blank" >https://doi.org/10.1109/ICCAR52225.2021.9463495</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1109/ICCAR52225.2021.9463495" target="_blank" >10.1109/ICCAR52225.2021.9463495</a>
Alternative languages
Result language
English
Title in original language
Reward Redistribution for Reinforcement Learning of Dynamic Nonprehensile Manipulation
Result description in original language
Recent reinforcement learning (RL) systems can solve a wide variety of manipulation tasks, even in real-world robotic implementations. However, in some nonprehensile manipulation tasks (e.g., poking or throwing), the classical reward system fails because the robot has to manipulate objects whose motion trajectory is only partly controllable. Such tasks require a specific type of reward that reflects this temporal misalignment. We propose a novel method, based on delayed reward redistribution, that allows a robot to fulfil goals in a partially controllable environment. The reward system in our architecture combines information from other sensors with inputs from an unsupervised vision module based on a variational autoencoder (VAE). This delayed reward system then controls the training of the motor module, which is based on a Soft Actor-Critic (SAC) neural network. We compare delayed and non-delayed versions of our system in a simulated environment and show that the delayed reward greatly outperforms the non-delayed version.
Title in English
Reward Redistribution for Reinforcement Learning of Dynamic Nonprehensile Manipulation
Result description in English
Recent reinforcement learning (RL) systems can solve a wide variety of manipulation tasks, even in real-world robotic implementations. However, in some nonprehensile manipulation tasks (e.g., poking or throwing), the classical reward system fails because the robot has to manipulate objects whose motion trajectory is only partly controllable. Such tasks require a specific type of reward that reflects this temporal misalignment. We propose a novel method, based on delayed reward redistribution, that allows a robot to fulfil goals in a partially controllable environment. The reward system in our architecture combines information from other sensors with inputs from an unsupervised vision module based on a variational autoencoder (VAE). This delayed reward system then controls the training of the motor module, which is based on a Soft Actor-Critic (SAC) neural network. We compare delayed and non-delayed versions of our system in a simulated environment and show that the delayed reward greatly outperforms the non-delayed version.
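The core idea of the abstract — moving a reward observed only after the object has left the robot's control back to the step that actually caused the outcome — can be illustrated with a toy sketch. This is an assumption-laden simplification for illustration only, not the paper's actual algorithm or architecture; the function name and the `last_contact_step` parameter are hypothetical:

```python
def redistribute_reward(rewards, last_contact_step):
    """Toy sketch of delayed reward redistribution (hypothetical, not the
    paper's method): in a throwing task the outcome reward only arrives at
    the end of the episode, after the robot has lost control of the object.
    Move that terminal outcome reward back to the step of the robot's last
    controllable action, so the RL update credits the action responsible.

    rewards: per-step rewards, with the delayed outcome at the final step.
    last_contact_step: index of the last step where the robot controlled
        the object (e.g. the moment of release in a throw).
    """
    redistributed = list(rewards)
    outcome = redistributed[-1]          # delayed outcome reward
    redistributed[-1] = 0.0
    redistributed[last_contact_step] += outcome
    return redistributed

# Example: the outcome is observed at step 4, but the throw ended at step 2.
print(redistribute_reward([0.0, 0.0, 0.0, 0.0, 1.0], 2))
```

The sum of per-step rewards is preserved, so the episode return is unchanged; only the temporal placement of the credit differs, which is what distinguishes the delayed variant from the non-delayed one compared in the paper.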
Classification
Type
D - Conference proceedings paper
CEP field
—
OECD FORD field
20204 - Robotics and automatic control
Result linkages
Project
—
Linkages
S - Specific research at universities
Others
Year of application
2021
Data confidentiality code
S - Complete and true data about the project are not subject to protection under special legal regulations
Data specific to the result type
Title of the paper in the proceedings
2021 7th International Conference on Control, Automation and Robotics (ICCAR)
ISBN
978-1-6654-4986-1
ISSN
—
e-ISSN
2251-2454
Number of pages
6
Pages from-to
326-331
Publisher name
IEEE (Institute of Electrical and Electronics Engineers)
Place of publication
—
Event venue
Singapore (virtual)
Event date
23 April 2021
Event type by nationality
WRD - Worldwide event
UT WoS article code
—