Safe Autonomous Reinforcement Learning -- PhD Thesis Proposal
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21230%2F16%3A00301602" target="_blank" >RIV/68407700:21230/16:00301602 - isvavai.cz</a>
Result on the web
<a href="http://cmp.felk.cvut.cz/pub/cmp/articles/pecka/Pecka-TR-2016-03.pdf" target="_blank" >http://cmp.felk.cvut.cz/pub/cmp/articles/pecka/Pecka-TR-2016-03.pdf</a>
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Title in original language
Safe Autonomous Reinforcement Learning -- PhD Thesis Proposal
Description of the result in original language
In the proposed thesis, we focus on equipping existing Reinforcement Learning algorithms with various kinds of safety constraints imposed on the exploration scheme. Common Reinforcement Learning algorithms are (sometimes implicitly) assumed to work in an ergodic, or even "restartable", environment. However, these conditions are not achievable in field robotics, where expensive robots cannot simply be replaced by a new functioning unit after performing a "deadly" action. Even so, Reinforcement Learning offers many advantages over supervised learning that are useful in the robotics domain: it may reduce the amount of annotated training data needed to learn a task, or, for example, eliminate the need to acquire a model of the whole system. We therefore note the need for methods that allow Reinforcement Learning to be used safely in non-ergodic and dangerous environments. Defining and recognizing safe and unsafe states/actions is a difficult task in itself. Even when a safety classifier is available, it still remains to incorporate the safety measures into the Reinforcement Learning process so that the efficiency and convergence of the algorithm are not lost. The proposed thesis deals both with the creation of the safety classifier and with combining Reinforcement Learning with the safety measures. The available safe exploration methods range from simple algorithms for simple environments to sophisticated methods based on previous experience, state prediction, or machine learning. Unfortunately, the methods suitable for our field robotics case usually require a precise model of the system, which is very difficult (or even impossible) to obtain from sensory input in an unknown environment. In our previous work, we proposed a machine learning approach to the safety classifier utilizing a cautious simulator. For connecting Reinforcement Learning with safety, we further examine a modified Gradient Policy Search algorithm. ...
Title in English
Safe Autonomous Reinforcement Learning -- PhD Thesis Proposal
Description of the result in English
In the proposed thesis, we focus on equipping existing Reinforcement Learning algorithms with various kinds of safety constraints imposed on the exploration scheme. Common Reinforcement Learning algorithms are (sometimes implicitly) assumed to work in an ergodic, or even "restartable", environment. However, these conditions are not achievable in field robotics, where expensive robots cannot simply be replaced by a new functioning unit after performing a "deadly" action. Even so, Reinforcement Learning offers many advantages over supervised learning that are useful in the robotics domain: it may reduce the amount of annotated training data needed to learn a task, or, for example, eliminate the need to acquire a model of the whole system. We therefore note the need for methods that allow Reinforcement Learning to be used safely in non-ergodic and dangerous environments. Defining and recognizing safe and unsafe states/actions is a difficult task in itself. Even when a safety classifier is available, it still remains to incorporate the safety measures into the Reinforcement Learning process so that the efficiency and convergence of the algorithm are not lost. The proposed thesis deals both with the creation of the safety classifier and with combining Reinforcement Learning with the safety measures. The available safe exploration methods range from simple algorithms for simple environments to sophisticated methods based on previous experience, state prediction, or machine learning. Unfortunately, the methods suitable for our field robotics case usually require a precise model of the system, which is very difficult (or even impossible) to obtain from sensory input in an unknown environment. In our previous work, we proposed a machine learning approach to the safety classifier utilizing a cautious simulator. For connecting Reinforcement Learning with safety, we further examine a modified Gradient Policy Search algorithm. ...
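To illustrate the general idea of a safety classifier gating exploration (a minimal sketch only, not the thesis's actual algorithm; the `is_safe` classifier, the toy policy, and the fallback action are all hypothetical):

```python
import random

def safe_action(policy_sample, is_safe, state, max_tries=20, fallback=None):
    """Sample actions from the exploration policy and keep the first one
    that the safety classifier accepts; otherwise return a known-safe
    fallback action (e.g. a conservative "stop")."""
    for _ in range(max_tries):
        action = policy_sample(state)
        if is_safe(state, action):
            return action
    return fallback

# Toy 1-D example: the agent must never move below position 0.
def policy_sample(state):
    return random.choice([-1, +1])   # naive exploration in both directions

def is_safe(state, action):
    return state + action >= 0       # reject moves that fall off the ledge

state = 0
action = safe_action(policy_sample, is_safe, state, fallback=0)
# `action` is guaranteed to satisfy the classifier (or be the fallback).
```

Rejection sampling like this preserves the policy's distribution over the safe actions, but a precise classifier is exactly the hard part in field robotics, which is why the abstract above turns to a cautious simulator for learning it.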
Classification
Type
V<sub>souhrn</sub> - Summary research report
CEP field
JD - Use of computers, robotics and its applications
OECD FORD field
—
Result continuities
Project
<a href="/cs/project/GA14-13876S" target="_blank" >GA14-13876S: Machine perception for long-term autonomy of mobile robots</a><br>
Continuities
P - Research and development project financed from public sources (with a reference to CEP)
Others
Year of implementation
2016
Data confidentiality code
S - Complete and true data on the project do not fall under protection pursuant to special legal regulations
Data specific to the result type
Number of pages of the result
55
Place of publication
Prague
Name of the publisher or client
Center for Machine Perception, K13133 FEE Czech Technical University
Version
—