Cs188AI Wiki

General Info

What

An example of model-free learning: we take samples from the environment and update our value estimates based on those samples, without needing the transition or reward model.

How

  1. Take samples from the environment as we move through it while following a fixed policy
  2. Average those samples to get a better estimate of the value of each state
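The two steps above can be sketched in code. This is a minimal illustration, not a definitive implementation: the simulator `toy_step`, the state names, the rewards, and the discount `gamma` are all hypothetical, and the sketch assumes we can restart from the same state as often as we like.

```python
import random

def sample_based_value(state, policy, step, values, gamma=0.9, n=1000, rng=None):
    """Estimate V(state) by averaging n one-step samples.

    `step(state, action, rng)` is a hypothetical simulator that returns
    (next_state, reward). It is assumed we can call it from `state` repeatedly.
    """
    rng = rng or random.Random(0)
    action = policy[state]  # follow the fixed policy
    total = 0.0
    for _ in range(n):
        next_state, reward = step(state, action, rng)
        # One sample: immediate reward plus discounted current estimate of the next state.
        total += reward + gamma * values.get(next_state, 0.0)
    return total / n  # average of the samples

def toy_step(state, action, rng):
    # Hypothetical dynamics: from "A", reach "B" (reward 1) or "C" (reward 0), each with probability 0.5.
    if rng.random() < 0.5:
        return "B", 1.0
    return "C", 0.0

policy = {"A": "go"}
values = {"B": 10.0, "C": 0.0}
estimate = sample_based_value("A", policy, toy_step, values,
                              gamma=0.9, n=10000, rng=random.Random(0))
print(estimate)
```

With these made-up numbers the true value of "A" is 0.5 * (1 + 0.9 * 10) + 0.5 * 0 = 5, so the printed estimate should land close to 5.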

Mathematical Definition

The equation for taking a sample is defined below:
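A likely form, assuming the standard notation for sample-based policy evaluation (the symbols R, γ, π, and V_k^π are assumptions, not taken from this page), where s'_i is the i-th successor state observed after taking action π(s) in state s:

```latex
\text{sample}_i = R\big(s, \pi(s), s'_i\big) + \gamma \, V_k^{\pi}(s'_i)
```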

The equation for the state value update is
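A likely form, assuming the standard notation (an assumption, not taken from this page), where each sample_i is a one-step return observed from state s under policy π and n is the number of samples collected:

```latex
V_{k+1}^{\pi}(s) \leftarrow \frac{1}{n} \sum_{i=1}^{n} \text{sample}_i
```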

Disadvantages

  • This algorithm is not feasible in practice: averaging requires many samples taken from the same state s, but the agent cannot rewind to s and does not know when it will be back at s again