Entry | Model-free (reinforcement learning) |
Definition |
In reinforcement learning (RL), a model-free algorithm (as opposed to a model-based one) is an algorithm that does not use the transition probability distribution (and the reward function) associated with the Markov decision process (MDP),[1] which, in RL, represents the problem to be solved. The transition probability distribution (or transition model) and the reward function are often collectively called the "model" of the environment (or MDP), hence the name "model-free". A model-free RL algorithm can be thought of as an "explicit" trial-and-error algorithm.[1] An example of a model-free algorithm is Q-learning.

References
1. Sutton, Richard S.; Barto, Andrew G. (2018). Reinforcement Learning: An Introduction (Second ed.). A Bradford Book. ISBN 0262039249. http://incompleteideas.net/book/bookdraft2018mar21.pdf. Retrieved 18 February 2019.
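To illustrate the point that a model-free learner never consults the transition probabilities or the reward function directly, below is a minimal tabular Q-learning sketch in Python. The 5-state chain environment, its slip probability, and all hyperparameters are illustrative assumptions, not taken from the cited text; the only thing the learner sees is sampled transitions (s, a, r, s').

import random
from collections import defaultdict

# Toy 5-state chain environment (an assumed example, not from the source).
# The agent only receives sampled transitions from step(); it never sees
# the transition probabilities or the reward function, which is exactly
# what makes the learner below "model-free".
N_STATES = 5          # states 0..4; state 4 is terminal
ACTIONS = [0, 1]      # 0 = move left, 1 = move right

def step(state, action):
    """Sample one transition from hidden, noisy dynamics."""
    move = 1 if action == 1 else -1
    if random.random() < 0.1:          # 10% chance the move slips
        move = -move
    next_state = min(max(state + move, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    done = next_state == N_STATES - 1
    return next_state, reward, done

# Tabular Q-learning: updates Q(s, a) from observed samples only.
alpha, gamma, epsilon = 0.1, 0.99, 0.1
Q = defaultdict(float)                 # Q[(state, action)] -> value estimate

for episode in range(2000):
    state, done = 0, False
    while not done:
        # epsilon-greedy action selection
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state, reward, done = step(state, action)
        # Q-learning update: bootstrap from the best next-state estimate.
        # No transition model P(s'|s,a) appears anywhere in this update.
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        target = reward + (0.0 if done else gamma * best_next)
        Q[(state, action)] += alpha * (target - Q[(state, action)])
        state = next_state

print({s: max(Q[(s, a)] for a in ACTIONS) for s in range(N_STATES)})

A model-based method, by contrast, would either be given or would estimate the slip probability and reward function of step() and plan against them; here the Q-table is improved purely by trial and error on sampled experience.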