Monte Carlo tree search

Contents

  1. History

  2. Principle of operation

  3. Pure Monte Carlo game search

  4. Exploration and exploitation

  5. Advantages and disadvantages

  6. Improvements

  7. See also

  8. References

  9. Bibliography

{{Infobox algorithm
|class=Search algorithm
|image=
|data=
|time=
|best-time=
|average-time=
|space=
|optimal=
|complete=
}}{{Tree search algorithm}}

In computer science, Monte Carlo tree search (MCTS) is a heuristic search algorithm for some kinds of decision processes, most notably those employed in game play. MCTS has been used for decades in computer Go programs.[1] It has been used in other board games like chess and shogi,[2] games with incomplete information such as bridge[3] and poker,[4] as well as in real-time video games (such as the high-level campaign AI of Total War: Rome II[4]).

History

The Monte Carlo method, which uses randomness for deterministic problems difficult or impossible to solve using other approaches, dates back to the 1940s. Bruce Abramson explored the MCTS idea in his 1987 PhD thesis and said it "is shown to be precise, accurate, easily estimable, efficiently calculable, and domain-independent."[5] He experimented in-depth with Tic-tac-toe and then with machine-generated evaluation functions for Othello and Chess.

Such methods were then explored and successfully applied to heuristic search in the field of automated theorem proving by W. Ertel, J. Schumann and C. Suttner in 1989,[6][7][8] thus improving on the exponential search times of uninformed search algorithms such as breadth-first search, depth-first search and iterative deepening.

In 1992, B. Brügmann employed the Monte Carlo method for the first time in a Go-playing program.[10] Chang et al.[11] proposed the idea of "recursive rolling out and backtracking" with "adaptive" sampling choices in their Adaptive Multi-stage Sampling (AMS) algorithm for the model of Markov decision processes. (AMS was the first work to explore the idea of UCB-based exploration and exploitation in constructing sampled/simulated (Monte Carlo) trees and was the main seed for UCT.[12])

In 2006, inspired by these predecessors,[10] Rémi Coulom described the application of the Monte Carlo method to game-tree search and coined the name Monte Carlo tree search,[11] L. Kocsis and Cs. Szepesvári developed the UCT algorithm,[16] and S. Gelly et al. implemented UCT in their program MoGo.[17] In 2008, MoGo achieved dan (master) level in 9×9 Go,[12] and the Fuego program began to win against strong amateur players in 9×9 Go.[13]

In January 2012, the Zen program won 3:1 in a Go match on a 19×19 board against an amateur 2 dan player.[14] Google DeepMind developed the program AlphaGo, which in October 2015 became the first computer Go program to beat a professional human Go player without handicaps on a full-sized 19×19 board.[1][15][16] In March 2016, AlphaGo was awarded an honorary 9-dan (master) level in 19×19 Go for defeating Lee Sedol in a five-game match with a final score of four games to one.[17] AlphaGo represents a significant improvement over previous Go programs as well as a milestone in machine learning, as it uses Monte Carlo tree search with artificial neural networks (a deep learning method) for policy (move selection) and value, giving it efficiency far surpassing previous programs.[18]

Monte Carlo tree search has also been used in programs that play other board games (for example Hex,[19] Havannah,[20] Game of the Amazons,[21] and Arimaa[22]), real-time video games (for instance Ms. Pac-Man[23][24] and Fable Legends{{citation needed|date=December 2016}}), and nondeterministic games (such as skat,[25] poker,[26] Magic: The Gathering,[27] or Settlers of Catan[28]).

Principle of operation

The focus of Monte Carlo tree search is on the analysis of the most promising moves, expanding the search tree based on random sampling of the search space.

The application of Monte Carlo tree search in games is based on many playouts. In each playout, the game is played out to the very end by selecting moves at random. The final game result of each playout is then used to weight the nodes in the game tree so that better nodes are more likely to be chosen in future playouts.

The most basic way to use playouts is to apply the same number of playouts after each legal move of the current player, then choose the move which led to the most victories.[29] The efficiency of this method—called Pure Monte Carlo Game Search—often increases with time as more playouts are assigned to the moves that have frequently resulted in the current player's victory according to previous playouts. Each round of Monte Carlo tree search consists of four steps (a minimal code sketch of one round follows the list):[30]

  • Selection: start from root {{math|R}} and select successive child nodes until a leaf node {{math|L}} is reached. The root is the current game state and a leaf is any node that has a potential child from which no simulation (playout) has yet been initiated. The section below says more about a way of biasing the choice of child nodes that lets the game tree expand towards the most promising moves, which is the essence of Monte Carlo tree search.
  • Expansion: unless {{math|L}} ends the game decisively (e.g. win/loss/draw) for either player, create one (or more) child nodes and choose node {{math|C}} from one of them. Child nodes are any valid moves from the game position defined by {{math|L}}.
  • Simulation: complete one random playout from node {{math|C}}. This step is sometimes also called playout or rollout. A playout may be as simple as choosing uniform random moves until the game is decided (for example in chess, the game is won, lost, or drawn).
  • Backpropagation: use the result of the playout to update information in the nodes on the path from {{math|C}} to {{math|R}}.
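
The following is a minimal sketch of one such round in Python. It assumes a hypothetical GameState class with player_to_move(), legal_moves(), play(move) returning a new state, is_terminal(), and winner() returning the winning player or None for a draw; it is illustrative only and not the implementation of any particular program.

<syntaxhighlight lang="python">
import math
import random

class Node:
    def __init__(self, state, parent=None, move=None):
        self.state = state
        self.parent = parent
        self.move = move                          # move that led to this node
        # player who made that move (None for the root)
        self.just_moved = None if parent is None else parent.state.player_to_move()
        self.children = []
        self.untried = list(state.legal_moves())  # moves not yet expanded
        self.wins = 0.0                           # draws are credited 0.5
        self.visits = 0

    def best_child(self, c=math.sqrt(2)):
        # Selection rule (UCT): exploitation term plus exploration term.
        return max(self.children,
                   key=lambda ch: ch.wins / ch.visits
                                  + c * math.sqrt(math.log(self.visits) / ch.visits))

def mcts_round(root):
    # 1. Selection: descend while the node is fully expanded and has children.
    node = root
    while not node.untried and node.children:
        node = node.best_child()
    # 2. Expansion: add one child for a not-yet-tried move, if any remain.
    if node.untried:
        move = node.untried.pop(random.randrange(len(node.untried)))
        child = Node(node.state.play(move), parent=node, move=move)
        node.children.append(child)
        node = child
    # 3. Simulation: a light playout with uniformly random moves until the game ends.
    state = node.state
    while not state.is_terminal():
        state = state.play(random.choice(list(state.legal_moves())))
    winner = state.winner()                       # None means a draw
    # 4. Backpropagation: update statistics on the path back to the root.
    while node is not None:
        node.visits += 1
        if winner is None:
            node.wins += 0.5
        elif winner == node.just_moved:
            node.wins += 1.0
        node = node.parent

def best_move(root_state, iterations=1000):
    root = Node(root_state)
    for _ in range(iterations):
        mcts_round(root)
    # The final answer is the move with the most simulations.
    return max(root.children, key=lambda ch: ch.visits).move
</syntaxhighlight>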

This graph shows the steps involved in one decision, with each node showing the ratio of wins to total playouts from that point in the game tree for the player that node represents.[31] In the Selection diagram, black is about to move. The root node shows there are 11 wins out of 21 playouts for white from this position so far. It complements the total of 10/21 black wins shown along the three black nodes under it, each of which represents a possible black move.

If white loses the simulation, all nodes along the selection path increment their simulation count (the denominator), but among them only the black nodes are credited with wins (the numerator). If instead white wins, all nodes along the selection path still increment their simulation count, but among them only the white nodes are credited with wins. In games where draws are possible, a draw causes the numerator for both black and white to be incremented by 0.5 and the denominator by 1. This ensures that during selection, each player's choices expand towards the most promising moves for that player, which mirrors the goal of each player to maximize the value of their move.

Rounds of search are repeated as long as the time allotted to a move remains. Then the move with the most simulations made (i.e. the highest denominator) is chosen as the final answer.

Pure Monte Carlo game search

This basic procedure can be applied to any game whose positions necessarily have a finite number of moves and finite length. For each position, all feasible moves are determined: k random games are played out to the very end, and the scores are recorded. The move leading to the best score is chosen. Ties are broken by fair coin flips. Pure Monte Carlo Game Search results in strong play in several games with random elements, as in the game EinStein würfelt nicht!. It converges to optimal play (as k tends to infinity) in board filling games with random turn order, for instance in Hex with random turn order.[32]
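
Under the same hypothetical GameState interface as in the sketch above, pure Monte Carlo game search might be sketched as follows; the number of playouts per candidate move, k, and the fair tie-breaking rule follow the description above.

<syntaxhighlight lang="python">
import random

def pure_monte_carlo_move(state, k=100):
    player = state.player_to_move()
    best_score, best_moves = -1.0, []
    for move in state.legal_moves():
        score = 0.0
        for _ in range(k):                        # k random playouts per candidate move
            s = state.play(move)
            while not s.is_terminal():
                s = s.play(random.choice(list(s.legal_moves())))
            winner = s.winner()                   # None means a draw
            score += 0.5 if winner is None else float(winner == player)
        if score > best_score:
            best_score, best_moves = score, [move]
        elif score == best_score:
            best_moves.append(move)
    return random.choice(best_moves)              # fair tie-breaking
</syntaxhighlight>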

Exploration and exploitation

The main difficulty in selecting child nodes is maintaining some balance between the exploitation of deep variants after moves with high average win rate and the exploration of moves with few simulations. The first formula for balancing exploitation and exploration in games, called UCT (Upper Confidence Bound 1 applied to trees), was introduced by Levente Kocsis and Csaba Szepesvári.[33] UCT is based on the UCB1 formula derived by Auer, Cesa-Bianchi, and Fischer[34] and the provably convergent AMS (Adaptive Multi-stage Sampling) algorithm first applied to multi-stage decision making models (specifically, Markov Decision Processes) by Chang, Fu, Hu, and Marcus.[35] Kocsis and Szepesvári recommend choosing in each node of the game tree the move for which the expression <math>\frac{w_i}{n_i} + c\sqrt{\frac{\ln N_i}{n_i}}</math> has the highest value. In this formula:

  • {{math|wi}} stands for the number of wins for the node considered after the {{math|i}}-th move
  • {{math|ni}} stands for the number of simulations for the node considered after the {{math|i}}-th move
  • {{math|Ni}} stands for the total number of simulations after the {{math|i}}-th move, run by the parent node of the one considered
  • {{math|c}} is the exploration parameter—theoretically equal to {{math|{{radic|2}}}}; in practice usually chosen empirically

The first component of the formula above corresponds to exploitation; it is high for moves with high average win ratio. The second component corresponds to exploration; it is high for moves with few simulations.
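
As an illustration, the UCT expression can be computed as in the following sketch; returning an infinite value for unvisited moves, so that each move is tried at least once, is a common implementation convention rather than part of the formula itself.

<syntaxhighlight lang="python">
import math

def uct_value(w_i, n_i, N_i, c=math.sqrt(2)):
    if n_i == 0:
        return float("inf")                       # unvisited moves are tried first
    # Exploitation term (average win rate) plus exploration term.
    return w_i / n_i + c * math.sqrt(math.log(N_i) / n_i)
</syntaxhighlight>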

Most contemporary implementations of Monte Carlo tree search are based on some variant of UCT that traces its roots back to the AMS simulation optimization algorithm for estimating the value function in finite-horizon Markov Decision Processes (MDPs) introduced by Chang et al.[35] (2005) in Operations Research. (AMS was the first work to explore the idea of UCB-based exploration and exploitation in constructing sampled/simulated (Monte Carlo) trees and was the main seed for UCT.[36])

Advantages and disadvantages

Although it has been proven that the evaluation of moves in Monte Carlo tree search converges to minimax,[37] the basic version of Monte Carlo tree search converges very slowly. However Monte Carlo tree search does offer significant advantages over alpha–beta pruning and similar algorithms that minimize the search space.

In particular, Monte Carlo tree search does not need an explicit evaluation function. Simply implementing the game's mechanics is sufficient to explore the search space (i.e. the generating of allowed moves in a given position and the game-end conditions). As such, Monte Carlo tree search can be employed in games without a developed theory or in general game playing.

The game tree in Monte Carlo tree search grows asymmetrically as the method concentrates on the more promising subtrees. Thus it achieves better results than classical algorithms in games with a high branching factor.

Moreover, Monte Carlo tree search can be interrupted at any time yielding the most promising move already found.

A disadvantage is that, against an expert player, there may be a single branch which leads to a loss. Because this is not easily found at random, the search may not "see" it and will not take it into account. It is believed that this may have been part of the reason for AlphaGo's loss in its fourth game against Lee Sedol. In essence, the search attempts to prune sequences which are less relevant. In some cases, a play can lead to a very specific line of play which is significant, but which is overlooked when the tree is pruned, and this outcome is therefore "off the search radar".[38]

Improvements

Various modifications of the basic Monte Carlo tree search method have been proposed to shorten the search time. Some employ domain-specific expert knowledge, others do not.

Monte Carlo tree search can use either light or heavy playouts. Light playouts consist of random moves while heavy playouts apply various heuristics to influence the choice of moves.[39] These heuristics may employ the results of previous playouts (e.g. the Last Good Reply heuristic[40]) or expert knowledge of a given game. For instance, in many Go-playing programs certain stone patterns in a portion of the board influence the probability of moving into that area.[41] Paradoxically, playing suboptimally in simulations sometimes makes a Monte Carlo tree search program play stronger overall.[42]
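
The difference between light and heavy playouts can be sketched as follows; move_weight is a hypothetical domain-specific heuristic (in Go it might, for instance, be derived from local stone patterns).

<syntaxhighlight lang="python">
import random

def light_playout_move(state):
    # Light playout: choose uniformly among the legal moves.
    return random.choice(list(state.legal_moves()))

def heavy_playout_move(state, move_weight):
    # Heavy playout: bias the choice by a domain-specific heuristic weight.
    moves = list(state.legal_moves())
    weights = [move_weight(state, m) for m in moves]
    return random.choices(moves, weights=weights, k=1)[0]
</syntaxhighlight>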

Domain-specific knowledge may be employed when building the game tree to help the exploitation of some variants. One such method assigns nonzero priors to the number of won and played simulations when creating each child node, leading to artificially raised or lowered average win rates that cause the node to be chosen more or less frequently, respectively, in the selection step.[43] A related method, called progressive bias, consists in adding to the UCB1 formula a <math>\frac{b_i}{n_i}</math> element, where {{math|bi}} is a heuristic score of the {{math|i}}-th move.[30]
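
A sketch of a selection score with the progressive bias term added is shown below; the heuristic score bi is assumed to be supplied by a domain-specific function.

<syntaxhighlight lang="python">
import math

def uct_with_progressive_bias(w_i, n_i, N_i, b_i, c=math.sqrt(2)):
    if n_i == 0:
        return float("inf")                       # unvisited moves are tried first
    # UCB1/UCT score plus the progressive bias term b_i / n_i.
    return w_i / n_i + c * math.sqrt(math.log(N_i) / n_i) + b_i / n_i
</syntaxhighlight>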

The basic Monte Carlo tree search collects enough information to find the most promising moves only after many rounds; until then its moves are essentially random. This exploratory phase may be reduced significantly in a certain class of games using RAVE (Rapid Action Value Estimation).[43] In these games, permutations of a sequence of moves lead to the same position. Typically, they are board games in which a move involves placement of a piece or a stone on the board. In such games the value of each move is often only slightly influenced by other moves.

In RAVE, for a given game tree node {{math|N}}, its child nodes {{math|Ci}} store not only the statistics of wins in playouts started in node {{math|N}} but also the statistics of wins in all playouts started in node {{math|N}} and below it, if they contain move {{math|i}} (also when the move was played in the tree, between node {{math|N}} and a playout). This way the contents of tree nodes are influenced not only by moves played immediately in a given position but also by the same moves played later.

When using RAVE, the selection step selects the node for which the modified UCB1 formula <math>\frac{w_i}{n_i}\left(1 - \beta(n_i, \tilde{n}_i)\right) + \frac{\tilde{w}_i}{\tilde{n}_i}\beta(n_i, \tilde{n}_i) + c\sqrt{\frac{\ln N_i}{n_i}}</math> has the highest value. In this formula, <math>\tilde{w}_i</math> and <math>\tilde{n}_i</math> stand for the number of won playouts containing move {{math|i}} and the number of all playouts containing move {{math|i}}, and the <math>\beta(n_i, \tilde{n}_i)</math> function should be close to one and to zero for relatively small and relatively big {{math|ni}} and <math>\tilde{n}_i</math>, respectively. One of many formulas for <math>\beta(n_i, \tilde{n}_i)</math>, proposed by D. Silver,[44] says that in balanced positions one can take <math>\beta(n_i, \tilde{n}_i) = \frac{\tilde{n}_i}{n_i + \tilde{n}_i + 4 b^2 n_i \tilde{n}_i}</math>, where {{math|b}} is an empirically chosen constant.
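
The RAVE-modified selection value might be computed as in the following sketch, where the RAVE statistics <math>\tilde{w}_i</math> and <math>\tilde{n}_i</math> are passed as w_amaf and n_amaf ("all moves as first" counts) and b is the empirically chosen constant from the formula above.

<syntaxhighlight lang="python">
import math

def rave_beta(n_i, n_amaf, b=0.01):
    # Silver's formula: close to 1 for small n_i, close to 0 for large n_i.
    return n_amaf / (n_i + n_amaf + 4.0 * b * b * n_i * n_amaf)

def rave_value(w_i, n_i, w_amaf, n_amaf, N_i, c=math.sqrt(2)):
    if n_i == 0:
        return float("inf")                       # unvisited moves are tried first
    beta = rave_beta(n_i, n_amaf)
    amaf_mean = w_amaf / n_amaf if n_amaf else 0.0
    # Blend the node's own statistics with the RAVE statistics, then explore.
    return ((1.0 - beta) * (w_i / n_i)
            + beta * amaf_mean
            + c * math.sqrt(math.log(N_i) / n_i))
</syntaxhighlight>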

Heuristics used in Monte Carlo tree search often require many parameters. There are automated methods to tune the parameters to maximize the win rate.[45]

Monte Carlo tree search can be concurrently executed by many threads or processes. There are several fundamentally different methods of its parallel execution:[46]

  • Leaf parallelization, i.e. parallel execution of many playouts from one leaf of the game tree.
  • Root parallelization, i.e. building independent game trees in parallel and making the move based on the root-level branches of all these trees (a sketch of this approach follows the list).
  • Tree parallelization, i.e. parallel building of the same game tree, protecting data from simultaneous writes either with one global mutex, with several mutexes, or with non-blocking synchronization.[47]
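
As an illustration of root parallelization, the following sketch reuses the Node class and mcts_round function from the sketch in the Principle of operation section; it builds several independent trees in separate worker processes and sums the root-level visit counts before choosing the move. It assumes that game states and moves can be pickled and that moves are hashable.

<syntaxhighlight lang="python">
from collections import Counter
from multiprocessing import Pool

def _search_one_tree(args):
    # Build one independent tree in this worker process.
    root_state, iterations = args
    root = Node(root_state)
    for _ in range(iterations):
        mcts_round(root)
    return {child.move: child.visits for child in root.children}

def root_parallel_move(root_state, iterations=1000, workers=4):
    with Pool(workers) as pool:
        per_tree = pool.map(_search_one_tree, [(root_state, iterations)] * workers)
    totals = Counter()
    for counts in per_tree:
        totals.update(counts)                     # sum visit counts across all trees
    return totals.most_common(1)[0][0]            # move with the most simulations overall
</syntaxhighlight>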

See also

  • AlphaGo, a Go program using Monte Carlo tree search, reinforcement learning and deep learning.
  • AlphaGo Zero, an updated Go program using Monte Carlo tree search, reinforcement learning and deep learning.
  • AlphaZero, a generalized version of AlphaGo Zero using Monte Carlo tree search, reinforcement learning and deep learning.
  • Leela Chess Zero, a free software implementation of AlphaZero's methods to chess, which is currently among the leading chess playing programs.

References

1. ^{{Cite journal|title = Mastering the game of Go with deep neural networks and tree search|journal = Nature| issn= 0028-0836|pages = 484–489|volume = 529|issue = 7587|doi = 10.1038/nature16961|pmid = 26819042|first1 = David|last1 = Silver|author-link1=David Silver (programmer)|first2 = Aja|last2 = Huang|author-link2=Aja Huang|first3 = Chris J.|last3 = Maddison|first4 = Arthur|last4 = Guez|first5 = Laurent|last5 = Sifre|first6 = George van den|last6 = Driessche|first7 = Julian|last7 = Schrittwieser|first8 = Ioannis|last8 = Antonoglou|first9 = Veda|last9 = Panneershelvam|first10= Marc|last10= Lanctot|first11= Sander|last11= Dieleman|first12=Dominik|last12= Grewe|first13= John|last13= Nham|first14= Nal|last14= Kalchbrenner|first15= Ilya|last15= Sutskever|author-link15=Ilya Sutskever|first16= Timothy|last16= Lillicrap|first17= Madeleine|last17= Leach|first18= Koray|last18= Kavukcuoglu|first19= Thore|last19= Graepel|first20= Demis |last20=Hassabis|author-link20=Demis Hassabis|date= 28 January 2016|bibcode = 2016Natur.529..484S}}{{closed access}}
2. ^{{cite arXiv |last=Silver|first=David |date=2017 |title=Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm|eprint=1712.01815v1|class=cs.AI }}
3. ^{{cite book |authors=Stuart J. Russell, Peter Norvig |title= Artificial Intelligence: A Modern Approach |edition= 3rd |publisher= Prentice Hall |year= 2009|title-link= Artificial Intelligence: A Modern Approach }}
4. ^{{cite web|title=Monte-Carlo Tree Search in TOTAL WAR: ROME II's Campaign AI|url=http://aigamedev.com/open/coverage/mcts-rome-ii/|website=AI Game Dev|accessdate=25 February 2017}}
5. ^{{cite book|last=Abramson|first=Bruce|title=The Expected-Outcome Model of Two-Player Games|publisher=Technical report, Department of Computer Science, Columbia University|year=1987|url=http://academiccommons.columbia.edu/download/fedora_content/download/ac:142327/CONTENT/CUCS-315-87.pdf|accessdate=23 December 2013}}
6. ^{{cite book|editor1=J. Retti|editor2=K. Leidlmair|title=5. Österreichische Artificial-Intelligence-Tagung. Informatik-Fachberichte 208, pp. 87–95.|chapter=Learning Heuristics for a Theorem Prover using Back Propagation. |author2= Johann Schumann| author3=Christian Suttner |author1= Wolfgang Ertel |publisher=Springer |year=1989|chapter-url=http://www.hs-weingarten.de/~ertel/veroeff_bib.html#ESS89}}
7. ^{{cite book|title=CADE90, 10th Int. Conf. on Automated Deduction, pp. 470–484. LNAI 449.|chapter=Automatic Acquisition of Search Guiding Heuristics. | author1=Christian Suttner |author2= Wolfgang Ertel |publisher=Springer |year=1990|chapter-url=http://www.hs-weingarten.de/~ertel/veroeff_bib.html#ES90:CADE}}
8. ^{{cite journal|author1=Christian Suttner |author2= Wolfgang Ertel |title=Using Back-Propagation Networks for Guiding the Search of a Theorem Prover.|journal=Journal of Neural Networks Research & Applications|volume= 2|issue=1|pages=3–16|date=1991|url=http://www.hs-weingarten.de/~ertel/veroeff_bib.html#ES90:IJNN}}
9. ^{{cite web|url=http://senseis.xmp.net/?KGSBotRatings|title=Sensei's Library: KGSBotRatings|accessdate=2012-05-03}}
10. ^{{cite book|author=Rémi Coulom|chapter=The Monte-Carlo Revolution in Go|title=Japanese-French Frontiers of Science Symposium|year=2008|chapter-url=http://remi.coulom.free.fr/JFFoS/JFFoS.pdf}}
11. ^{{cite book|author=Rémi Coulom|chapter=Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search|pages=72–83|others=H. Jaap van den Herik, Paolo Ciancarini, H. H. L. M. Donkers (eds.)|title=Computers and Games, 5th International Conference, CG 2006, Turin, Italy, May 29–31, 2006. Revised Papers |publisher=Springer|year=2007|isbn=978-3-540-75537-1|doi=|citeseerx=10.1.1.81.6817}}
12. ^{{cite journal|author1=Chang-Shing Lee |author2=Mei-Hui Wang |author3=Guillaume Chaslot |author4=Jean-Baptiste Hoock |author5=Arpad Rimmel |author6=Olivier Teytaud |author7=Shang-Rong Tsai |author8=Shun-Chin Hsu |author9=Tzung-Pei Hong |title=The Computational Intelligence of MoGo Revealed in Taiwan's Computer Go Tournaments|journal=IEEE Transactions on Computational Intelligence and AI in Games|pages=73–89|volume=1|issue=1|year=2009|url=http://hal.inria.fr/docs/00/36/97/86/PDF/TCIAIG-2008-0010_Accepted_.pdf |doi=10.1109/tciaig.2009.2018703|citeseerx=10.1.1.470.6018 }}
13. ^{{cite book|url=http://pug.raph.free.fr/files/Fuego.pdf|title=Fuego – An Open-Source Framework for Board Games and Go Engine Based on Monte Carlo Tree Search|publisher=Technical report, University of Alberta|year=2008|isbn=|location=|pages=|quote=|via=|author1=Markus Enzenberger|author2=Martin Müller}}
14. ^{{cite web|url=http://dcook.org/gobet/|title=The Shodan Go Bet|accessdate=2012-05-02}}
15. ^{{Cite web|url=http://googleresearch.blogspot.com/2016/01/alphago-mastering-ancient-game-of-go.html|title=Research Blog: AlphaGo: Mastering the ancient game of Go with Machine Learning|last=|first=|date=27 January 2016|website=Google Research Blog|access-date=}}
16. ^{{Cite web|url=https://www.bbc.com/news/technology-35420579|title=Google achieves AI 'breakthrough' by beating Go champion|last=|first=|date=27 January 2016|website=BBC News|access-date=}}
17. ^{{Cite web|url=https://www.youtube.com/watch?v=vFr3K2DORc8&t=1h57m|title=Match 1 - Google DeepMind Challenge Match: Lee Sedol vs AlphaGo|last=|first=|date=9 March 2016|website=Youtube|access-date=}}
18. ^{{Cite web|url=http://www.zdnet.com/article/google-alphago-ai-clean-sweeps-european-go-champion/|title=Google AlphaGo AI clean sweeps European Go champion|last=|first=|date=28 January 2016|website=ZDNet|access-date=}}
19. ^{{cite journal|author1=Broderick Arneson |author2=Ryan Hayward |author3=Philip Henderson |title=MoHex Wins Hex Tournament|journal=ICGA Journal|volume=32|issue=2|pages=114–116|date=June 2009|url=http://webdocs.cs.ualberta.ca/~hayward/papers/rptPamplona.pdf|doi=10.3233/ICG-2009-32218 }}
20. ^{{cite book|author=Timo Ewalds|title=Playing and Solving Havannah|publisher=Master's thesis, University of Alberta|year=2011|url=http://havannah.ewalds.ca/static/thesis.pdf}}
21. ^{{cite book|author=Richard J. Lorentz|chapter=Amazons Discover Monte-Carlo|pages=13–24|others=H. Jaap van den Herik, Xinhe Xu, Zongmin Ma, Mark H. M. Winands (eds.)|title=Computers and Games, 6th International Conference, CG 2008, Beijing, China, September 29 – October 1, 2008. Proceedings|publisher=Springer|year=2008|isbn=978-3-540-87607-6}}
22. ^{{cite book|author=Tomáš Kozelek|title=Methods of MCTS and the game Arimaa|publisher=Master's thesis, Charles University in Prague|year=2009|url=http://arimaa.com/arimaa/papers/TomasKozelekThesis/mt.pdf}}
23. ^{{cite journal|author1=Xiaocong Gan |author2=Yun Bao |author3=Zhangang Han |title=Real-Time Search Method in Nondeterministic Game – Ms. Pac-Man|pages=209–222|journal=ICGA Journal|volume=34|issue=4|date=December 2011|doi=10.3233/ICG-2011-34404 }}
24. ^{{cite journal|author1=Tom Pepels |author2=Mark H. M. Winands |author3=Marc Lanctot |title=Real-Time Monte Carlo Tree Search in Ms Pac-Man|pages=245–257|journal=IEEE Transactions on Computational Intelligence and AI in Games|volume=6|issue=3|date=September 2014 |doi=10.1109/tciaig.2013.2291577}}
25. ^{{cite book|author1=Michael Buro |author2=Jeffrey Richard Long |author3=Timothy Furtak |author4=Nathan R. Sturtevant |chapter=Improving State Evaluation, Inference, and Search in Trick-Based Card Games|pages=1407–1413 |others=Craig Boutilier (ed.)|title=IJCAI 2009, Proceedings of the 21st International Joint Conference on Artificial Intelligence, Pasadena, California, USA, July 11–17, 2009 |year=2009 |doi=|citeseerx=10.1.1.150.3077 }}
26. ^{{cite journal|author1=Jonathan Rubin |author2=Ian Watson |title=Computer poker: A review|journal=Artificial Intelligence |volume=175|issue=5–6|date=April 2011|doi=10.1016/j.artint.2010.12.005|url=https://web.archive.org/web/20120813081731/https://www.cs.auckland.ac.nz/~jrub001/files/CPReviewPreprintAIJ.pdf|pages=958–987}}
27. ^{{cite book|author1=C.D. Ward |author2=P.I. Cowling |chapter=Monte Carlo Search Applied to Card Selection in Magic: The Gathering|title=CIG'09 Proceedings of the 5th international conference on Computational Intelligence and Games|publisher=IEEE Press |year=2009 |chapter-url=http://scim.brad.ac.uk/staff/pdf/picowlin/CIG2009.pdf |archiveurl=https://web.archive.org/web/20160528074031/http://scim.brad.ac.uk/staff/pdf/picowlin/CIG2009.pdf |archivedate=2016-05-28}}
28. ^{{cite book|author1=István Szita |author2=Guillaume Chaslot |author3=Pieter Spronck |chapter=Monte-Carlo Tree Search in Settlers of Catan |pages=21–32 |editor1=Jaap Van Den Herik |editor2=Pieter Spronck |title=Advances in Computer Games, 12th International Conference, ACG 2009, Pamplona, Spain, May 11–13, 2009. Revised Papers |publisher=Springer |year=2010 |isbn=978-3-642-12992-6 |chapter-url=http://ticc.uvt.nl/icga/acg12/proceedings/Contribution100.pdf}}
29. ^{{cite book|last=Brügmann|first=Bernd|title=Monte Carlo Go|url=http://www.ideanest.com/vegos/MonteCarloGo.pdf|publisher=Technical report, Department of Physics, Syracuse University|year=1993}}
30. ^{{cite journal|author1=G.M.J.B. Chaslot |author2=M.H.M. Winands |author3=J.W.H.M. Uiterwijk |author4=H.J. van den Herik |author5=B. Bouzy |title=Progressive Strategies for Monte-Carlo Tree Search|journal=New Mathematics and Natural Computation|volume=4|issue=3|pages=343–359|year=2008|url=https://dke.maastrichtuniversity.nl/m.winands/documents/pMCTS.pdf|doi=10.1142/s1793005708001094}}
31. ^{{Cite web|url=http://jeffbradberry.com/posts/2015/09/intro-to-monte-carlo-tree-search/|title=Introduction to Monte Carlo Tree Search|last=Bradberry|first=Jeff|date=2015-09-07|website=|access-date=}}
32. ^{{cite arXiv |last1=Peres |first1= Yuval| last2=Schramm| first2=Oded| last3= Sheffield| first3 =Scott | last4 = Wilson | first4=David B. |eprint=math/0508580 |title=Random-Turn Hex and other selection games |date=2006 }}
33. ^{{cite conference |last=Kocsis|first=Levente|last2=Szepesvári|first2=Csaba|title=Bandit based Monte-Carlo Planning|editor-first=Johannes|editor-last=Fürnkranz|editor2-first=Tobias|editor2-last=Scheffer|editor3-first=Myra|editor3-last=Spiliopoulou |booktitle=Machine Learning: ECML 2006, 17th European Conference on Machine Learning, Berlin, Germany, September 18–22, 2006, Proceedings|series=Lecture Notes in Computer Science |volume=4212|publisher=Springer|isbn=3-540-45375-X |pages=282–293|year=2006|doi=10.1007/11871842_29|citeseerx=10.1.1.102.1296}}
34. ^{{cite journal |last=Auer |first=Peter|last2=Cesa-Bianchi|first2=Nicolò|last3=Fischer|first3=Paul|title=Finite-time Analysis of the Multiarmed Bandit Problem|journal=Machine Learning|volume=47|issue=2/3|pages=235–256 |year=2002 |url=http://moodle.technion.ac.il/pluginfile.php/192340/mod_resource/content/0/UCB.pdf|doi=10.1023/a:1013689704352}}{{dead link|date=February 2018 |bot=InternetArchiveBot |fix-attempted=yes }}
35. ^{{cite journal |last=Chang|first=Hyeong Soo |last2=Fu|first2=Michael C.|last3=Hu|first3=Jiaqiao|last4=Marcus|first4=Steven I.|title=An Adaptive Sampling Algorithm for Solving Markov Decision Processes|journal=Operations Research |volume=53|pages=126–139 |year=2005 |url=http://scholar.rhsmith.umd.edu/sites/default/files/mfu/files/cfhm05.pdf?m=1449834091|doi=10.1287/opre.1040.0145}}
36. ^{{cite journal|author1=Hyeong Soo Chang |author2=Michael Fu |author3=Jiaqiao Hu|author4=Steven I. Marcus |title=Google DeepMind's Alphago: O.R.'s unheralded role in the path-breaking achievement|journal=ORMS Today|volume=45|issue=5|pages=24–29|year=2016|url=https://www.informs.org/ORMS-Today/Public-Articles/October-Volume-43-Number-5}}
37. ^{{cite book|last=Bouzy|first=Bruno|chapter=Old-fashioned Computer Go vs Monte-Carlo Go|title=IEEE Symposium on Computational Intelligence and Games, April 1–5, 2007, Hilton Hawaiian Village, Honolulu, Hawaii|chapter-url=http://ewh.ieee.org/cmte/cis/mtsc/ieeecis/tutorial2007/Bruno_Bouzy_2007.pdf}}
38. ^{{cite web|url=https://gogameguru.com/lee-sedol-defeats-alphago-masterful-comeback-game-4/|title=Lee Sedol defeats AlphaGo in masterful comeback - Game 4|publisher=Go Game Guru}}
39. ^Swiechowski, M.; Mandziuk, J., "Self-Adaptation of Playing Strategies in General Game Playing" (2010), IEEE Transactions on Computational Intelligence and AI in Games, doi: 10.1109/TCIAIG.2013.2275163, http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6571225&isnumber=4804729
40. ^{{cite journal|last=Drake|first=Peter|title=The Last-Good-Reply Policy for Monte-Carlo Go|journal=ICGA Journal|volume=32|issue=4|pages=221–227|date=December 2009|doi=10.3233/ICG-2009-32404}}
41. ^{{cite book|author1=Sylvain Gelly |author2=Yizao Wang |author3=Rémi Munos |author4=Olivier Teytaud |title=Modification of UCT with Patterns in Monte-Carlo Go|date=November 2006|publisher=Technical report, INRIA|url=http://hal.inria.fr/docs/00/11/72/66/PDF/MoGoReport.pdf}}
42. ^{{cite book|author1=Seth Pellegrino |author2=Peter Drake |chapter=Investigating the Effects of Playout Strength in Monte-Carlo Go|pages=1015–1018|others=Hamid R. Arabnia, David de la Fuente, Elena B. Kozerenko, José Angel Olivas, Rui Chang, Peter M. LaMonica, Raymond A. Liuzzi, Ashu M. G. Solo (eds.)|title=Proceedings of the 2010 International Conference on Artificial Intelligence, ICAI 2010, July 12–15, 2010, Las Vegas Nevada, USA|publisher=CSREA Press|year=2010|isbn=978-1-60132-148-0}}
43. ^{{cite book|author1=Sylvain Gelly |author2=David Silver |chapter=Combining Online and Offline Knowledge in UCT|pages=273–280|others=Zoubin Ghahramani (ed.)|title=Machine Learning, Proceedings of the Twenty-Fourth International Conference (ICML 2007), Corvallis, Oregon, USA, June 20–24, 2007|publisher=ACM|year=2007|isbn=978-1-59593-793-3|chapter-url=http://www.machinelearning.org/proceedings/icml2007/papers/387.pdf}}
44. ^{{cite book|author=David Silver|title=Reinforcement Learning and Simulation-Based Search in Computer Go|publisher=PhD thesis, University of Alberta |year=2009 |url=http://papersdb.cs.ualberta.ca/~papersdb/uploaded_files/1029/paper_thesis.pdf}}
45. ^{{cite book|author=Rémi Coulom|chapter=CLOP: Confident Local Optimization for Noisy Black-Box Parameter Tuning|title=ACG 2011: Advances in Computer Games 13 Conference, Tilburg, the Netherlands, November 20–22|chapter-url=http://remi.coulom.free.fr/CLOP/}}
46. ^{{cite book|author=Guillaume M.J-B. Chaslot, Mark H.M. Winands, Jaap van den Herik|chapter=Parallel Monte-Carlo Tree Search|pages=60–71|others=H. Jaap van den Herik, Xinhe Xu, Zongmin Ma, Mark H. M. Winands (eds.)|title=Computers and Games, 6th International Conference, CG 2008, Beijing, China, September 29 – October 1, 2008. Proceedings|publisher=Springer|year=2008|isbn=978-3-540-87607-6|chapter-url=https://dke.maastrichtuniversity.nl/m.winands/documents/multithreadedMCTS2.pdf}}
47. ^{{cite book|author1=Markus Enzenberger |author2=Martin Müller |chapter=A Lock-free Multithreaded Monte-Carlo Tree Search Algorithm |pages=14–20 |editor1=Jaap Van Den Herik |editor2=Pieter Spronck |title=Advances in Computer Games: 12th International Conference, ACG 2009, Pamplona, Spain, May 11–13, 2009, Revised Papers |publisher=Springer |year=2010 |chapter-url=http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=02CC0F88A12A3CCE44F0CD139ADA7AF5?doi=10.1.1.161.1984&rep=rep1&type=pdf |isbn=978-3-642-12992-6}}

Bibliography

  • {{cite journal|author1=Cameron Browne |author2=Edward Powley |author3=Daniel Whitehouse |author4=Simon Lucas |author5=Peter I. Cowling |author6=Philipp Rohlfshagen |author7=Stephen Tavener |author8=Diego Perez |author9=Spyridon Samothrakis |author10=Simon Colton |title=A Survey of Monte Carlo Tree Search Methods|journal=IEEE Transactions on Computational Intelligence and AI in Games|volume=4|issue=1|pages=1–43 |date=March 2012|doi=10.1109/tciaig.2012.2186810|citeseerx=10.1.1.297.3086 }}

Categories: Combinatorial game theory | Heuristic algorithms | Monte Carlo methods | Optimal decisions
