General Info
What
An example of model-free learning: we take samples from the environment and update our value estimates based on those samples, without ever knowing the transition function T or the reward function R.
How
- Take samples from the environment as we move through it, following a fixed policy π
- Average those samples together to get a better estimate of the value of each state
Mathematical Definition
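Each sample is taken by following the policy's action π(s) from state s, observing the successor state s'_i and the reward, and bootstrapping from the current value estimate V_k^π:
$$\text{sample}_i = R(s, \pi(s), s'_i) + \gamma V_k^{\pi}(s'_i)$$
The state value is then updated to the average of the n samples:
$$V_{k+1}^{\pi}(s) \leftarrow \frac{1}{n} \sum_{i=1}^{n} \text{sample}_i$$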
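The sample-and-average update can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the toy MDP, the `step` simulator, `POLICY`, and the parameter defaults are all hypothetical. Each sample is the observed reward plus γ times the current estimate of the successor's value, and the new value of s is the mean of n such samples. The sketch assumes a simulator that can be reset to s for every sample, which a real agent acting in the environment cannot do.

```python
import random

# Hypothetical toy MDP. Only the simulator knows these dynamics;
# the evaluation loop never reads them directly (model-free).
GAMMA = 0.9
TERMINAL = "exit"
POLICY = {"A": "go", "B": "go"}  # fixed policy pi(s)


def step(state, action, rng):
    """Hypothetical simulator: returns (next_state, reward) for one transition."""
    if state == "A":
        # Half the time reach B with reward 1, otherwise stay in A with reward 0.
        return ("B", 1.0) if rng.random() < 0.5 else ("A", 0.0)
    if state == "B":
        return (TERMINAL, 10.0)
    raise ValueError(f"no transitions from {state!r}")


def sample_based_policy_evaluation(n_samples=1000, n_iters=50, seed=0):
    rng = random.Random(seed)
    V = {"A": 0.0, "B": 0.0, TERMINAL: 0.0}  # V_0^pi
    for _ in range(n_iters):
        V_next = dict(V)  # terminal state's value stays 0
        for s, a in POLICY.items():
            total = 0.0
            for _ in range(n_samples):
                # ASSUMPTION: we can reset to s before every sample.
                s_next, r = step(s, a, rng)
                total += r + GAMMA * V[s_next]  # sample_i, using V_k
            V_next[s] = total / n_samples  # V_{k+1}(s) = mean of samples
        V = V_next
    return V


if __name__ == "__main__":
    print(sample_based_policy_evaluation())
```

For this toy MDP the exact values are V^π(B) = 10 and V^π(A) = 5 / 0.55 ≈ 9.09, so the estimate for A should land close to that, with the remaining error coming from sampling noise.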
Disadvantages
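- This approach is not feasible in practice: collecting n samples requires taking action π(s) from state s repeatedly, but an agent cannot rewind time to return to s, and we do not know when (or whether) it will visit s again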