Entry | Decentralized partially observable Markov decision process |
Definition |
The decentralized partially observable Markov decision process (Dec-POMDP) [1][2] is a model for coordination and decision-making among multiple agents. It is a probabilistic model that can represent uncertainty in outcomes, sensors and communication (i.e., costly, delayed, noisy or nonexistent communication). It is a generalization of the Markov decision process (MDP) and the partially observable Markov decision process (POMDP) to multiple decentralized agents.

Definition

Formal definition

A Dec-POMDP is a 7-tuple ⟨S, {A_i}, T, R, {Ω_i}, O, γ⟩, where
- S is a set of states,
- A_i is a set of actions for agent i, with A = ×_i A_i the set of joint actions,
- T(s, a, s′) = P(s′ | s, a) is the state-transition probability,
- R(s, a) is the reward function, shared by all agents,
- Ω_i is a set of observations for agent i, with Ω = ×_i Ω_i the set of joint observations,
- O(s′, a, o) = P(o | s′, a) is the observation probability, and
- γ ∈ [0, 1] is the discount factor.
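The 7-tuple can be sketched as a plain Python container. All names below are illustrative (not from any standard library), and the tiny two-agent model is invented for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A sketch of the Dec-POMDP tuple <S, {A_i}, T, R, {Omega_i}, O, gamma>.
# Field names are illustrative, not from any standard library.
@dataclass
class DecPOMDP:
    states: List[str]                   # S
    actions: Dict[int, List[str]]       # A_i, keyed by agent index
    transition: Callable                # T(s, joint_a) -> {s': prob}
    reward: Callable                    # R(s, joint_a) -> shared team reward
    observations: Dict[int, List[str]]  # Omega_i, keyed by agent index
    observe: Callable                   # O(s', joint_a) -> {joint_obs: prob}
    discount: float                     # gamma

# A tiny invented two-agent model: the team is rewarded when both agents
# pick the same action in state "s0"; choosing ("go", "go") moves to "s1".
def T(s: str, a: Tuple[str, str]) -> Dict[str, float]:
    return {"s1": 1.0} if s == "s0" and a == ("go", "go") else {"s0": 1.0}

def R(s: str, a: Tuple[str, str]) -> float:
    return 1.0 if s == "s0" and a[0] == a[1] else 0.0

def O(s_next: str, a: Tuple[str, str]) -> Dict[Tuple[str, str], float]:
    # Each agent independently observes the true next state with prob 0.9.
    other, p = ("s1" if s_next == "s0" else "s0"), 0.9
    return {(s_next, s_next): p * p, (s_next, other): p * (1 - p),
            (other, s_next): (1 - p) * p, (other, other): (1 - p) * (1 - p)}

model = DecPOMDP(
    states=["s0", "s1"],
    actions={0: ["go", "stay"], 1: ["go", "stay"]},
    transition=T, reward=R,
    observations={0: ["s0", "s1"], 1: ["s0", "s1"]},
    observe=O, discount=0.95,
)
```

Note that T, R and O take the joint action: no single agent's action determines the dynamics or the team reward on its own.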
At each time step, each agent i takes an action a_i; the state updates according to the transition function T (using the current state and the joint action); each agent receives an observation o_i according to the observation function O (using the next state and the joint action); and a single reward is generated for the whole team according to the reward function R. The goal is to maximize the expected cumulative reward, either over some given number of time steps (the finite-horizon case) or forever (the infinite-horizon case). In the infinite-horizon case, the discount factor (γ < 1) keeps the sum finite.

References

1. Bernstein, Daniel S.; Givan, Robert; Immerman, Neil; Zilberstein, Shlomo (November 2002). "The Complexity of Decentralized Control of Markov Decision Processes". Mathematics of Operations Research. 27 (4): 819–840. doi:10.1287/moor.27.4.819.297. arXiv:1301.3836.
2. Oliehoek, Frans A.; Amato, Christopher (2016). A Concise Introduction to Decentralized POMDPs. SpringerBriefs in Intelligent Systems. Springer. doi:10.1007/978-3-319-28929-8. ISBN 978-3-319-28927-4. http://www.fransoliehoek.net/docs/OliehoekAmato16book.pdf
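The execution loop and discounted objective described above can be sketched as follows. The two-agent model and the fixed local policies are invented for illustration; the key structural point is that each agent selects its action from its own observation history only, while the reward is shared by the whole team.

```python
import random

# A sketch of one finite-horizon run of a Dec-POMDP on a tiny invented
# two-agent model (all names are illustrative).

def transition(s, joint_a):
    # T: deterministic here for simplicity; in general a distribution.
    return "s1" if s == "s0" and joint_a == ("go", "go") else "s0"

def reward(s, joint_a):
    # R: one team reward, paid when the agents coordinate in "s0".
    return 1.0 if s == "s0" and joint_a[0] == joint_a[1] else 0.0

def observe(s_next, rng):
    # O: each agent independently sees the true next state with prob 0.9.
    def flip(s):
        return "s1" if s == "s0" else "s0"
    return tuple(s_next if rng.random() < 0.9 else flip(s_next)
                 for _ in range(2))

def local_policy(obs_history):
    # A fixed memoryless local policy: always choose "go". A real solver
    # would map each agent's observation history to an action.
    return "go"

def episode_return(horizon=10, gamma=0.95, seed=0):
    """Discounted cumulative team reward over a finite horizon."""
    rng = random.Random(seed)
    s, histories, total = "s0", ([], []), 0.0
    for t in range(horizon):
        joint_a = tuple(local_policy(h) for h in histories)  # decentralized choice
        total += (gamma ** t) * reward(s, joint_a)
        s = transition(s, joint_a)
        for h, o in zip(histories, observe(s, rng)):
            h.append(o)                                      # private observation
    return total
```

Under this always-"go" policy the system alternates between "s0" and "s1", so the team collects reward on every other step and the 10-step discounted return is the geometric sum Σ_{k=0}^{4} γ^{2k}.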
Category: Markov processes