Markov Decision Process Search

General Info

[Figure: MDP search tree]

What

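Markov decision process (MDP) search is an example of nondeterministic search: the outcome of an action is uncertain, so the search computes the expected value of any given state, assuming the agent acts optimally from that state onward.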

How

Run a regular expectimax search, but treat each chance node as a q-state (q-node). A q-state $(s, a)$ represents the point where we have committed to action $a$ in state $s$ but do not yet know which final state $s'$ we will end up in.
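The search described above can be sketched in Python as two mutually recursive functions, one for state (max) nodes and one for q-nodes. This is a minimal sketch under stated assumptions, not the course's implementation: the nested-dictionary encoding, the names `TOY_MDP`, `value`, and `q_value`, and the depth limit that stops the recursion are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: mdp[state][action] = [(probability, next_state, reward), ...]
MDP = Dict[str, Dict[str, List[Tuple[float, str, float]]]]

# Toy three-state MDP (illustrative numbers only).
TOY_MDP: MDP = {
    "cool": {
        "slow": [(1.0, "cool", 1.0)],
        "fast": [(0.5, "cool", 2.0), (0.5, "warm", 2.0)],
    },
    "warm": {
        "slow": [(0.5, "cool", 1.0), (0.5, "warm", 1.0)],
        "fast": [(1.0, "overheated", -10.0)],
    },
    "overheated": {},  # terminal state: no actions available
}


def value(mdp: MDP, state: str, depth: int, gamma: float) -> float:
    """State (max) node: take the best q-value over the available actions."""
    actions = mdp.get(state, {})
    if depth == 0 or not actions:
        # Assumed cutoff: terminal states and the depth limit are worth 0.
        return 0.0
    return max(q_value(mdp, state, a, depth, gamma) for a in actions)


def q_value(mdp: MDP, state: str, action: str, depth: int, gamma: float) -> float:
    """Q-node: committed to `action`, so average over the possible outcomes s'."""
    return sum(
        prob * (reward + gamma * value(mdp, next_state, depth - 1, gamma))
        for prob, next_state, reward in mdp[state][action]
    )
```

With these toy numbers and `gamma = 1.0`, `value(TOY_MDP, "cool", 2, 1.0)` evaluates to 3.5: going fast gives 0.5 * (2 + 2) + 0.5 * (2 + 1) = 3.5, which beats going slow at 1 + 2 = 3.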

Mathematical Definitions

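The following mathematical definitions describe the same computation that expectimax performs. They are also known as the Bellman equations. Here $V^\star(s)$ is the optimal value of state $s$, $Q^\star(s,a)$ is the optimal value of the q-state $(s,a)$, $T(s,a,s')$ is the probability of ending up in $s'$ after taking action $a$ in state $s$, $R(s,a,s')$ is the reward for that transition, and $\gamma$ is the discount factor.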

$V^\star(s) = \max_a Q^\star(s,a)$

$Q^\star(s,a) = \sum_{s'} T(s,a,s')[R(s,a,s') + \gamma V^\star(s')]$

$V^\star(s) = \max_a \sum_{s'} T(s,a,s')[R(s,a,s') + \gamma V^\star(s')]$
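
As a worked instance of these equations, consider a hypothetical state $s$ with two actions. The numbers are made up for illustration and do not come from the course material. Suppose action $a$ reaches $s_1$ with probability 0.8 and reward 1, or $s_2$ with probability 0.2 and reward 0, and assume $\gamma = 0.9$, $V^\star(s_1) = 10$, and $V^\star(s_2) = 0$. Then:

$Q^\star(s,a) = 0.8[1 + 0.9 \cdot 10] + 0.2[0 + 0.9 \cdot 0] = 8$

If the only other action $b$ has $Q^\star(s,b) = 5$, the value of the state is the larger of the two q-values:

$V^\star(s) = \max(8, 5) = 8$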
