Look-ahead Search on Top of Policy Networks in Imperfect Information Games
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21230%2F24%3A00377053" target="_blank" >RIV/68407700:21230/24:00377053 - isvavai.cz</a>
Result on the web
<a href="https://doi.org/10.24963/ijcai.2024/480" target="_blank" >https://doi.org/10.24963/ijcai.2024/480</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.24963/ijcai.2024/480" target="_blank" >10.24963/ijcai.2024/480</a>
Alternative languages
Result language
English
Original language name
Look-ahead Search on Top of Policy Networks in Imperfect Information Games
Original language description
Search in test time is often used to improve the performance of reinforcement learning algorithms. Performing theoretically sound search in fully adversarial two-player games with imperfect information is notoriously difficult and requires a complicated training process. We present a method for adding test-time search to an arbitrary policy-gradient algorithm that learns from sampled trajectories. Besides the policy network, the algorithm trains an additional critic network, which estimates the expected values of players following various transformations of the policies given by the policy network. These values are then used for depth-limited search. We show how the values from this critic can create a value function for imperfect information games. Moreover, they can be used to compute the summary statistics necessary to start the search from an arbitrary decision point in the game. The presented algorithm is scalable to very large games since it does not require any search during train time. We evaluate the algorithm's performance when trained along Regularized Nash Dynamics, and we evaluate the benefit of using the search in the standard benchmark game of Leduc hold'em, multiple variants of imperfect information Goofspiel, and Battleships.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
The result was created during the realization of more than one project.
Continuities
P - Research and development project financed from public funds (with a link to CEP)
Others
Publication year
2024
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Proceedings of the 33rd International Joint Conference on Artificial Intelligence
ISBN
978-1-956792-04-1
ISSN
1045-0823
e-ISSN
—
Number of pages
9
Pages from-to
4344-4352
Publisher name
International Joint Conferences on Artificial Intelligence Organization
Place of publication
—
Event location
Jeju
Event date
Aug 3, 2024
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
001347142804052