Safe Autonomous Reinforcement Learning -- PhD Thesis Proposal
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21230%2F16%3A00301602" target="_blank" >RIV/68407700:21230/16:00301602 - isvavai.cz</a>
Result on the web
<a href="http://cmp.felk.cvut.cz/pub/cmp/articles/pecka/Pecka-TR-2016-03.pdf" target="_blank" >http://cmp.felk.cvut.cz/pub/cmp/articles/pecka/Pecka-TR-2016-03.pdf</a>
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Safe Autonomous Reinforcement Learning -- PhD Thesis Proposal
Original language description
In the proposed thesis, we focus on equipping existing Reinforcement Learning algorithms with various kinds of safety constraints imposed on the exploration scheme. Common Reinforcement Learning algorithms are (sometimes implicitly) assumed to work in an ergodic, or even "restartable", environment. However, these conditions are not achievable in field robotics, where an expensive robot cannot simply be replaced by a new functioning unit when it performs a "deadly" action. Even so, Reinforcement Learning offers many advantages over supervised learning that are useful in the robotics domain. It may reduce the amount of annotated training data needed to learn a task, or, for example, eliminate the need to acquire a model of the whole system. Thus, there is a need for methods that allow Reinforcement Learning to be used safely in non-ergodic and dangerous environments. Defining and recognizing safe and unsafe states/actions is a difficult task in itself. Even when a safety classifier is available, the safety measures still have to be incorporated into the Reinforcement Learning process so that the efficiency and convergence of the algorithm are not lost. The proposed thesis deals both with the creation of a safety classifier and with the combined use of Reinforcement Learning and safety measures. The available safe exploration methods range from simple algorithms for simple environments to sophisticated methods based on previous experience, state prediction, or machine learning. Unfortunately, the methods suitable for our field robotics case usually require a precise model of the system, which is very difficult (or even impossible) to obtain from sensory input in an unknown environment. In our previous work, we proposed a machine learning approach to the safety classifier utilizing a cautious simulator. For the connection of Reinforcement Learning and safety, we further examine a modified Gradient Policy Search algorithm. ...
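The idea of gating exploration through a safety classifier, as outlined in the abstract, can be illustrated with a minimal toy sketch. Everything here is hypothetical and for illustration only: the 1-D "cliff" environment, the `is_safe` classifier, and the fallback action are not taken from the proposal itself, which addresses far richer settings (field robots, learned classifiers, a cautious simulator).

```python
import random

# Toy sketch: a random walk on a 1-D line where states at or beyond the
# "cliff" are unrecoverable ("deadly"). A safety classifier vetoes any
# exploratory action predicted to lead into the cliff, so the agent can
# explore without ever entering an unrecoverable state.

CLIFF = 5  # states >= CLIFF are unrecoverable


def is_safe(state, action):
    """Hypothetical safety classifier: does `action` keep us out of the cliff?"""
    return state + action < CLIFF


def explore(steps, seed=0):
    """Run safety-gated random exploration and return the visited states."""
    rng = random.Random(seed)
    state, visited = 0, []
    for _ in range(steps):
        action = rng.choice([-1, +1])   # candidate exploratory action
        if not is_safe(state, action):  # veto predicted-unsafe actions
            action = -1                 # fall back to a known-safe action
        state = max(0, state + action)  # the line is bounded below at 0
        visited.append(state)
    return visited


trajectory = explore(100)
assert max(trajectory) < CLIFF  # the gate keeps exploration out of the cliff
```

The sketch only captures the gating pattern; the actual research question in the proposal is how to obtain such a classifier from experience and how to combine the veto with policy-gradient learning without destroying convergence.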
Czech name
—
Czech description
—
Classification
Type
V<sub>souhrn</sub> - Summary research report
CEP classification
JD - Use of computers, robotics and its application
OECD FORD branch
—
Result continuities
Project
<a href="/en/project/GA14-13876S" target="_blank" >GA14-13876S: Perception methods for long-term autonomy of mobile robots</a><br>
Continuities
P - Research and development project financed from public resources (with a link to CEP)
Others
Publication year
2016
Confidentiality
S - Complete and accurate data on the project are not subject to protection under special legal regulations
Data specific for result type
Number of pages
55
Place of publication
Praha
Publisher/client name
Center for Machine Perception, K13133 FEE Czech Technical University
Version
—