Symbolic Regression Methods for Reinforcement Learning

Result identifiers

  • Result code in IS VaVaI

    <a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21230%2F21%3A00353216" target="_blank" >RIV/68407700:21230/21:00353216 - isvavai.cz</a>

  • Alternative codes found

    RIV/68407700:21730/21:00353216

  • Result on the web

    <a href="https://doi.org/10.1109/ACCESS.2021.3119000" target="_blank" >https://doi.org/10.1109/ACCESS.2021.3119000</a>

  • DOI - Digital Object Identifier

    <a href="http://dx.doi.org/10.1109/ACCESS.2021.3119000" target="_blank" >10.1109/ACCESS.2021.3119000</a>

Alternative languages

  • Result language

    English

  • Original language name

    Symbolic Regression Methods for Reinforcement Learning

  • Original language description

    Reinforcement learning algorithms can solve dynamic decision-making and optimal control problems. With continuous-valued state and input variables, reinforcement learning algorithms must rely on function approximators to represent the value function and policy mappings. Commonly used numerical approximators, such as neural networks or basis function expansions, have two main drawbacks: they are black-box models offering little insight into the mappings learned, and they require extensive trial and error tuning of their hyper-parameters. In this paper, we propose a new approach to constructing smooth value functions in the form of analytic expressions by using symbolic regression. We introduce three off-line methods for finding value functions based on a state-transition model: symbolic value iteration, symbolic policy iteration, and a direct solution of the Bellman equation. The methods are illustrated on four nonlinear control problems: velocity control under friction, one-link and two-link pendulum swing-up, and magnetic manipulation. The results show that the value functions yield well-performing policies and are compact, mathematically tractable, and easy to plug into other algorithms. This makes them potentially suitable for further analysis of the closed-loop system. A comparison with an alternative approach using neural networks shows that our method outperforms the neural network-based one.

  • Czech name

  • Czech description
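
The description above outlines, among other methods, symbolic value iteration: alternating a Bellman backup on sampled states with a symbolic-regression fit that turns the backed-up values into an analytic expression. Below is a minimal Python sketch of that loop, assuming a toy one-dimensional system; the dynamics f, reward r, discount factor, and gplearn hyper-parameters are illustrative assumptions, not values from the paper.

    # Minimal sketch of symbolic value iteration (illustrative, not the
    # authors' implementation). Requires numpy and gplearn.
    import numpy as np
    from gplearn.genetic import SymbolicRegressor

    gamma = 0.95                      # discount factor (assumption)
    xs = np.linspace(-1.0, 1.0, 200)  # sampled states
    us = np.linspace(-1.0, 1.0, 21)   # action grid for the max in the backup

    def f(x, u):  # toy deterministic state-transition model (assumption)
        return np.clip(0.9 * x + 0.1 * u, -1.0, 1.0)

    def r(x, u):  # toy quadratic reward (assumption)
        return -(x ** 2) - 0.1 * (u ** 2)

    def V(x):     # initial value function V_0(x) = 0
        return np.zeros_like(x)

    for k in range(5):
        # Bellman backup on the sampled states: max_u [r + gamma * V(f(x, u))]
        targets = np.max([r(xs, u) + gamma * V(f(xs, u)) for u in us], axis=0)
        # Fit an analytic expression V_{k+1}(x) to the backed-up values
        sr = SymbolicRegressor(population_size=500, generations=20,
                               function_set=('add', 'sub', 'mul'),
                               random_state=0)
        sr.fit(xs.reshape(-1, 1), targets)
        def V(x, sr=sr):  # next value function is the fitted expression
            return sr.predict(np.asarray(x, dtype=float).reshape(-1, 1))
        print(f"iteration {k}: V(x) = {sr._program}")  # gplearn's best program

Each pass tightens the value estimate while keeping it as a compact analytic expression, which is what makes the resulting policy easy to inspect and to plug into other algorithms, as the description notes.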

Classification

  • Type

    Jimp - Article in a specialist periodical, which is included in the Web of Science database

  • CEP classification

  • OECD FORD branch

    10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result continuities

  • Project

    The result was created during the realization of more than one project. More information is available in the Projects tab.

  • Continuities

    P - Research and development project financed from public funds (with a link to CEP)

Others

  • Publication year

    2021

  • Confidentiality

    S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific for result type

  • Name of the periodical

    IEEE Access

  • ISSN

    2169-3536

  • e-ISSN

    2169-3536

  • Volume of the periodical

    9

  • Issue of the periodical within the volume

    October

  • Country of publishing house

    US - UNITED STATES

  • Number of pages

    15

  • Pages from-to

    139697-139711

  • UT code for WoS article

    000709067500001

  • EID of the result in the Scopus database

    2-s2.0-85117079506