Sample Based Policy Evaluation


General Info

What

An example of model-free learning: we take samples of transitions from the environment and update our value estimates based on those samples, without needing to know the transition function or reward function in advance.

How

Take samples of outcomes from the environment as we move through it following a fixed policy $\pi$
Average those samples together to get a better estimate of the value of each state under $\pi$

Mathematical Definition

Each sample is taken from state $s$ by following $\pi(s)$ and observing the resulting next state $s_i'$. The samples are defined as:

$sample_1 = R(s, \pi(s), s_1') + \gamma V^\pi_k(s_1')$

$\vdots$

$sample_n = R(s, \pi(s), s_n') + \gamma V^\pi_k(s_n')$
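As a worked illustration with assumed numbers (they are not from the original page): suppose $\gamma = 0.9$, the first observed transition lands in $s_1'$ with reward $R(s, \pi(s), s_1') = 2$, and the current estimate is $V^\pi_k(s_1') = 10$. Then

$sample_1 = 2 + 0.9 \cdot 10 = 11$

If a second observed transition lands in $s_2'$ with reward $0$ and $V^\pi_k(s_2') = 5$, then

$sample_2 = 0 + 0.9 \cdot 5 = 4.5$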

The equation for the state value update is:

$V^\pi_{k+1}(s) \leftarrow \frac{1}{n}\sum_{i=1}^{n} sample_i$
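The update can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function `sample_based_update`, the toy transition functions, and all state names and numbers are hypothetical. It also assumes we have a simulator that can be queried repeatedly from the same state $s$, which a real environment may not allow.

```python
import random
from typing import Callable, Dict, Tuple

# A transition sampler takes (state, action, rng) and returns (next_state, reward).
Sampler = Callable[[str, str, random.Random], Tuple[str, float]]


def sample_based_update(
    state: str,
    policy: Dict[str, str],
    values: Dict[str, float],
    sample_transition: Sampler,
    gamma: float,
    n: int,
    rng: random.Random,
) -> float:
    """Return V_{k+1}(state): the mean of n samples R + gamma * V_k(s')."""
    action = policy[state]  # the fixed policy picks the action, pi(s)
    total = 0.0
    for _ in range(n):
        # Assumption: the simulator lets us sample from `state` again and again.
        next_state, reward = sample_transition(state, action, rng)
        total += reward + gamma * values[next_state]  # one sample_i
    return total / n


def toy_transition(state: str, action: str, rng: random.Random) -> Tuple[str, float]:
    """Hypothetical environment: from any state, go to 'a' (reward 1) or 'b' (reward 0)."""
    if rng.random() < 0.5:
        return "a", 1.0
    return "b", 0.0


def always_a(state: str, action: str, rng: random.Random) -> Tuple[str, float]:
    """Hypothetical deterministic environment: always land in 'a' with reward 1."""
    return "a", 1.0


if __name__ == "__main__":
    policy = {"s": "go"}
    values = {"s": 0.0, "a": 10.0, "b": 0.0}  # current estimates V_k
    rng = random.Random(0)
    new_value = sample_based_update("s", policy, values, toy_transition, 0.9, 1000, rng)
    print(f"V_(k+1)(s) is approximately {new_value:.2f}")
```

In the toy environment each sample is either $1 + 0.9 \cdot 10 = 10$ or $0 + 0.9 \cdot 0 = 0$, so the average moves toward the expected value as $n$ grows, even though the code never looks at the transition probabilities directly.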

Disadvantages

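This algorithm is not feasible in practice. Collecting $n$ samples for state $s$ requires taking action $\pi(s)$ from $s$ $n$ separate times, but after a single transition we are in some successor state $s'$ and cannot rewind to $s$, and we do not know when we will be back at state $s$ again.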
