Correlation minimizing replay memory in temporal-difference reinforcement learning

The result's identifiers

  • Result code in IS VaVaI

    RIV/68407700:21230/20:00339076 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21230%2F20%3A00339076)

  • Result on the web

    https://doi.org/10.1016/j.neucom.2020.02.004

  • DOI - Digital Object Identifier

    10.1016/j.neucom.2020.02.004 (http://dx.doi.org/10.1016/j.neucom.2020.02.004)

Alternative languages

  • Result language

    English

  • Original language name

    Correlation minimizing replay memory in temporal-difference reinforcement learning

  • Original language description

    Online reinforcement learning agents can now process increasing amounts of data, which makes approximating and compressing that data into value functions a more demanding task. To improve the approximation, and thus the learning process itself, it has been proposed to randomly select a mini-batch of past experiences stored in the replay memory buffer and replay it at each learning step. In this work, we present an algorithm that classifies and samples the experiences into separate contextual memory buffers using an unsupervised learning technique. This allows each new experience to be associated with a mini-batch of past experiences that are not from the same contextual buffer as the current one, thus further reducing the correlation between experiences. Experimental results show that correlation-minimizing sampling improves over Q-learning algorithms with uniform sampling, and that a significant improvement can be observed when it is coupled with sampling methods that prioritize experiences by their temporal-difference error.

  • Czech name

  • Czech description
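
The sampling idea in the original language description can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `ContextualReplayMemory`, the fixed `centroids`, and the nearest-centroid assignment are hypothetical stand-ins for the unspecified unsupervised learning technique, and the fallback to uniform sampling is an added assumption for the case where no other context holds data yet.

```python
import random
from collections import deque


class ContextualReplayMemory:
    """Replay memory split into per-context buffers (illustrative sketch only)."""

    def __init__(self, centroids, capacity_per_buffer=1000, seed=0):
        # Assumption: fixed centroids stand in for the unsupervised classifier.
        self.centroids = [tuple(c) for c in centroids]
        self.buffers = [deque(maxlen=capacity_per_buffer) for _ in self.centroids]
        self.rng = random.Random(seed)

    def context_of(self, state):
        # Assign the state to the nearest centroid (squared Euclidean distance).
        dists = [
            sum((s - c) ** 2 for s, c in zip(state, centroid))
            for centroid in self.centroids
        ]
        return dists.index(min(dists))

    def store(self, state, action, reward, next_state, done):
        # Route the transition to the buffer of its context.
        ctx = self.context_of(state)
        self.buffers[ctx].append((state, action, reward, next_state, done))
        return ctx

    def sample(self, current_state, batch_size):
        # Draw only from buffers whose context differs from the current one.
        current_ctx = self.context_of(current_state)
        pool = [
            transition
            for i, buf in enumerate(self.buffers)
            if i != current_ctx
            for transition in buf
        ]
        if not pool:
            # Assumption: fall back to uniform sampling over everything stored.
            pool = [transition for buf in self.buffers for transition in buf]
        return self.rng.sample(pool, min(batch_size, len(pool)))
```

In a Q-learning loop, each new transition would be passed to `store`, and `sample` would be called with the current state to obtain a mini-batch drawn from the other contexts. A prioritized variant would additionally weight the draw by temporal-difference error, which this sketch leaves out.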

Classification

  • Type

    Jimp - Article in a specialist periodical that is indexed in the Web of Science database

  • CEP classification

  • OECD FORD branch

    10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result continuities

  • Project

  • Continuities

    I - Institutional support for the long-term conceptual development of a research organization

Others

  • Publication year

    2020

  • Confidentiality

    S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific for result type

  • Name of the periodical

    Neurocomputing

  • ISSN

    0925-2312

  • e-ISSN

    1872-8286

  • Volume of the periodical

    393

  • Issue of the periodical within the volume

    June

  • Country of publishing house

    NL - THE KINGDOM OF THE NETHERLANDS

  • Number of pages

    10

  • Pages from-to

    91-100

  • UT code for WoS article

    000531730500010

  • EID of the result in the Scopus database

    2-s2.0-85084116423