Approximate Policy Iteration for Markov Decision Processes via Quantitative Adaptive Aggregations

The result's identifiers

  • Result code in IS VaVaI

RIV/00216305:26230/16:PU121655 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216305%3A26230%2F16%3APU121655)

  • Result on the web

http://link.springer.com/chapter/10.1007%2F978-3-319-46520-3_2

  • DOI - Digital Object Identifier

10.1007/978-3-319-46520-3_2 (http://dx.doi.org/10.1007/978-3-319-46520-3_2)

Alternative languages

  • Result language

English

  • Original language name

    Approximate Policy Iteration for Markov Decision Processes via Quantitative Adaptive Aggregations

  • Original language description

We consider the problem of finding an optimal policy in a Markov decision process that maximises the expected discounted sum of rewards over an infinite time horizon. Since the explicit iterative dynamic programming scheme does not scale as the dimension of the state space increases, a number of approximate methods have been developed. These are typically based on value or policy iteration, enabling further speedups through lumped and distributed updates, or by employing succinct representations of the value functions. However, none of the existing approximate techniques provides general, explicit and tunable bounds on the approximation error, a problem particularly relevant when the level of accuracy affects the optimality of the policy. In this paper we propose a new approximate policy iteration scheme that mitigates the state-space explosion problem by adaptive state-space aggregation, while at the same time providing rigorous and explicit error bounds that can be used to control the optimality level of the obtained policy. We evaluate the new approach on a case study, demonstrating that the state-space reduction considerably accelerates the policy iteration scheme while meeting the required level of precision.

  • Czech name

  • Czech description
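
To make the description above concrete, here is a minimal Python sketch of policy iteration for a discounted MDP, together with a generic hard-aggregation variant of policy evaluation. It illustrates only the baseline the paper builds on; the quantitative adaptive aggregation scheme and its explicit error bounds are not reproduced here, and all names, the uniform in-cluster weights, and the toy MDP are illustrative assumptions.

    import numpy as np

    def policy_evaluation(P, R, policy, gamma):
        """Exact evaluation: solve V = R_pi + gamma * P_pi V for a fixed policy."""
        n = R.shape[0]
        P_pi = P[np.arange(n), policy]            # (n, n) dynamics under the policy
        R_pi = R[np.arange(n), policy]            # (n,)  rewards under the policy
        return np.linalg.solve(np.eye(n) - gamma * P_pi, R_pi)

    def aggregated_evaluation(P, R, policy, gamma, partition):
        """Approximate evaluation on a hard aggregation of the state space.

        partition[s] is the cluster of state s; clusters are valued on the
        quotient MDP with uniform weights inside each cluster (an assumed
        generic scheme, not the paper's quantitative adaptive one).
        """
        n, k = R.shape[0], partition.max() + 1
        Phi = np.zeros((n, k))                    # aggregation matrix: state -> cluster
        Phi[np.arange(n), partition] = 1.0
        D = Phi.T / Phi.sum(axis=0)[:, None]      # disaggregation: uniform in-cluster weights
        P_pi = P[np.arange(n), policy]
        R_pi = R[np.arange(n), policy]
        P_agg = D @ P_pi @ Phi                    # (k, k) aggregated dynamics
        R_agg = D @ R_pi                          # (k,)  aggregated rewards
        v_agg = np.linalg.solve(np.eye(k) - gamma * P_agg, R_agg)
        return Phi @ v_agg                        # lift cluster values back to states

    def policy_iteration(P, R, gamma):
        """P: (n, a, n) transition tensor, R: (n, a) rewards, gamma: discount."""
        n, a, _ = P.shape
        policy = np.zeros(n, dtype=int)
        while True:
            V = policy_evaluation(P, R, policy, gamma)
            Q = R + gamma * P @ V                 # (n, a) one-step look-ahead values
            new_policy = Q.argmax(axis=1)
            if np.array_equal(new_policy, policy):
                return policy, V
            policy = new_policy

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        n, a, gamma = 6, 2, 0.9
        P = rng.random((n, a, n))
        P /= P.sum(axis=2, keepdims=True)         # row-stochastic transitions
        R = rng.random((n, a))
        policy, V = policy_iteration(P, R, gamma)
        partition = np.array([0, 0, 1, 1, 2, 2])  # a coarse 3-cluster partition
        V_approx = aggregated_evaluation(P, R, policy, gamma, partition)
        print("optimal policy:", policy)
        print("max evaluation error under aggregation:", np.abs(V - V_approx).max())

Running this prints the gap between exact and aggregated value functions for a fixed coarse partition; the paper's contribution is, roughly, to choose and refine the partition adaptively so that such a gap stays provably below a user-specified tolerance.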

Classification

  • Type

    D - Article in proceedings

  • CEP classification

  • OECD FORD branch

10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result continuities

  • Project

GA16-17538S: Relaxed equivalence checking for approximate computing

  • Continuities

P - Research and development project financed from public sources (with a link to CEP)

Others

  • Publication year

    2016

  • Confidentiality

S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific for result type

  • Article name in the collection

    Proceedings of 14th International Symposium on Automated Technology for Verification and Analysis

  • ISBN

    978-3-319-46519-7

  • ISSN

  • e-ISSN

  • Number of pages

    16

  • Pages from-to

    13-31

  • Publisher name

    Springer Verlag

  • Place of publication

    Heidelberg

  • Event location

    Chiba

  • Event date

    Oct 17, 2016

  • Type of event by nationality

WRD - Worldwide event

  • UT code for WoS article

    000389808100002