Correlation minimizing replay memory in temporal-difference reinforcement learning
The result's identifiers
Result code in IS VaVaI
RIV/68407700:21230/20:00339076 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21230%2F20%3A00339076)
Result on the web
https://doi.org/10.1016/j.neucom.2020.02.004
DOI - Digital Object Identifier
10.1016/j.neucom.2020.02.004
Alternative languages
Result language
English
Original language name
Correlation minimizing replay memory in temporal-difference reinforcement learning
Original language description
Online reinforcement learning agents can now process an increasing amount of data, which makes approximating and compressing that data into value functions a more demanding task. To improve the approximation, and thus the learning process itself, it has been proposed to randomly select a mini-batch of the past experiences stored in the replay memory buffer and replay it at each learning step. In this work, we present an algorithm that classifies the experiences and samples them into separate contextual memory buffers using an unsupervised learning technique. Each new experience can then be associated with a mini-batch of past experiences that do not come from the same contextual buffer as the current one, which further reduces the correlation between experiences. Experimental results show that correlation-minimizing sampling improves over Q-learning algorithms with uniform sampling, and that a significant improvement can be observed when it is coupled with sampling methods that prioritize experiences by their temporal-difference error.
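The sampling idea in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record does not name the unsupervised technique, so an online nearest-centroid (k-means style) assignment on the state vector is assumed here, and the class name `ContextualReplayMemory`, its methods, and its parameters are all hypothetical. The sketch only shows the core step: each experience is routed to a contextual buffer, and the mini-batch replayed alongside it is drawn from the other buffers.

```python
import random
from collections import deque


class ContextualReplayMemory:
    """Hypothetical sketch of a replay memory split into contextual buffers."""

    def __init__(self, n_contexts, capacity_per_context, seed=0):
        self.rng = random.Random(seed)
        # One bounded FIFO buffer per context.
        self.buffers = [deque(maxlen=capacity_per_context) for _ in range(n_contexts)]
        self.centroids = []  # one centroid per context, filled lazily
        self.counts = []     # number of states assigned to each centroid

    def _assign(self, state):
        # Assumption: the first n_contexts states seed the centroids.
        if len(self.centroids) < len(self.buffers):
            self.centroids.append(list(state))
            self.counts.append(1)
            return len(self.centroids) - 1
        # Otherwise pick the nearest centroid (squared Euclidean distance).
        dists = [
            sum((s - c) ** 2 for s, c in zip(state, centroid))
            for centroid in self.centroids
        ]
        k = dists.index(min(dists))
        # Running-mean update of the chosen centroid.
        self.counts[k] += 1
        step = 1.0 / self.counts[k]
        self.centroids[k] = [c + step * (s - c) for s, c in zip(state, self.centroids[k])]
        return k

    def add(self, state, action, reward, next_state, done):
        """Store one experience and return the index of its contextual buffer."""
        k = self._assign(state)
        self.buffers[k].append((state, action, reward, next_state, done))
        return k

    def sample(self, current_context, batch_size):
        """Draw a mini-batch uniformly from every buffer except the current one."""
        pool = [
            exp
            for k, buf in enumerate(self.buffers)
            if k != current_context
            for exp in buf
        ]
        if not pool:
            # Fallback (an assumption): use all buffers if the others are empty.
            pool = [exp for buf in self.buffers for exp in buf]
        return self.rng.sample(pool, min(batch_size, len(pool)))
```

A typical use would call `k = memory.add(...)` for each new transition and then `memory.sample(k, batch_size)` to obtain the mini-batch for the Q-learning update. Drawing uniformly within the remaining buffers is the simplest choice; the prioritized variant mentioned in the description would replace that draw with one weighted by temporal-difference error.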
Czech name
—
Czech description
—
Classification
Type
Jimp - Article in a specialist periodical, which is included in the Web of Science database
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
I - Institutional support for the long-term conceptual development of a research organisation
Others
Publication year
2020
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific for result type
Name of the periodical
Neurocomputing
ISSN
0925-2312
e-ISSN
1872-8286
Volume of the periodical
393
Issue of the periodical within the volume
June
Country of publishing house
NL - THE KINGDOM OF THE NETHERLANDS
Number of pages
10
Pages from-to
91-100
UT code for WoS article
000531730500010
EID of the result in the Scopus database
2-s2.0-85084116423