
Reward Redistribution for Reinforcement Learning of Dynamic Nonprehensile Manipulation

Result identifiers

  • Result code in IS VaVaI

    RIV/68407700:21230/21:00351068 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21230%2F21%3A00351068)

  • Alternative codes found

    RIV/68407700:21460/21:00351068 RIV/68407700:21730/21:00351068

  • Result on the web

    https://doi.org/10.1109/ICCAR52225.2021.9463495

  • DOI - Digital Object Identifier

    10.1109/ICCAR52225.2021.9463495 (http://dx.doi.org/10.1109/ICCAR52225.2021.9463495)

Alternative languages

  • Result language

    English

  • Original language name

    Reward Redistribution for Reinforcement Learning of Dynamic Nonprehensile Manipulation

  • Original language description

    Recent reinforcement learning (RL) systems can solve a wide variety of manipulation tasks even in real-world robotic implementations. However, in some nonprehensile manipulation tasks (e.g. poking, throwing), the classical reward system fails as the robot has to manipulate objects whose motion trajectory is partly uncontrollable. Such tasks require a specific type of reward that would reflect this temporal misalignment. We propose a novel method, based on a delayed reward redistribution, that allows a robot to fulfil goals in an only partially controllable environment. The reward system in our architecture combines information from other sensors together with inputs from an unsupervised vision module based on a variational autoencoder (VAE). This delayed reward system then controls the training of the motor module based on a Soft Actor-Critic (SAC) neural network. We compare results for a delayed and nondelayed version of our system in a simulated environment and show that the delayed reward greatly outperforms the nondelayed version.

  • Czech name

  • Czech description
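
The abstract above centres on redistributing a delayed reward across a trajectory whose outcome (e.g. a thrown object's flight) is only partly controllable. The paper's exact scheme is not reproduced here; as a minimal illustrative sketch (the function and its parameters are hypothetical, not taken from the paper), one simple redistribution step spreads a single delayed episode reward over the steps in proportion to a per-step contribution estimate, so that the total episode return is preserved:

```python
from typing import List


def redistribute_reward(terminal_reward: float,
                        contributions: List[float]) -> List[float]:
    """Spread one delayed episode reward over every step.

    Each step receives a share of the terminal reward proportional to
    its non-negative contribution estimate; the shares sum back to the
    original reward, so the episode return is unchanged.
    """
    if not contributions:
        return []
    total = sum(contributions)
    if total <= 0:
        # No usable contribution signal: fall back to a uniform split.
        n = len(contributions)
        return [terminal_reward / n] * n
    return [terminal_reward * c / total for c in contributions]


# Example: a 3-step episode whose only reward arrives at the end.
per_step = redistribute_reward(1.0, [0.0, 0.5, 1.5])
# per_step == [0.0, 0.25, 0.75]; the redistributed rewards can then be
# stored in an off-policy replay buffer (e.g. for a SAC-style learner)
# in place of the sparse terminal reward.
```

In the paper's architecture the contribution signal would come from the sensors and the VAE-based vision module; here it is left as an abstract per-step estimate.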

Classification

  • Type

    D - Article in proceedings

  • CEP classification

  • OECD FORD branch

    20204 - Robotics and automatic control

Result continuities

  • Project

  • Continuities

    S - Specific university research (Specificky vyzkum na vysokych skolach)

Others

  • Publication year

    2021

  • Confidentiality

    S - Complete and accurate data on the project are not subject to protection under special legal regulations

Data specific for result type

  • Article name in the collection

    2021 7th International Conference on Control, Automation and Robotics (ICCAR)

  • ISBN

    978-1-6654-4986-1

  • ISSN

  • e-ISSN

    2251-2454

  • Number of pages

    6

  • Pages from-to

    326-331

  • Publisher name

    IEEE (Institute of Electrical and Electronics Engineers)

  • Place of publication

  • Event location

    Singapore (virtual)

  • Event date

    Apr 23, 2021

  • Type of event by nationality

    WRD - Worldwide event

  • UT code for WoS article