Cs188AI Wiki

General Info


Model-based learning tries to learn a model of the problem first: we estimate the reward and transition functions from experience, and then apply the methods we already know for solving MDPs (value iteration or policy iteration) to the learned model. The resulting policy is optimal for the learned model, so it is only as good as our estimates.


Learn some empirical MDP model

  1. For each state-action pair (s, a) we try, count how often each outcome s' occurs
  2. Approximate the transition function T(s, a, s') as the number of occurrences of s' after (s, a) divided by the total number of samples we've taken from (s, a)
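The counting steps above can be sketched in Python as follows. This is only a minimal illustration, not the course's reference code: the function name estimate_model, the (s, a, s', r) sample format, and the toy samples are all assumptions made for the example. It also averages the observed rewards for each (s, a, s'), which is one common way to estimate the reward function.

```python
from collections import defaultdict


def estimate_model(samples):
    """Estimate T(s, a, s') and R(s, a, s') from (s, a, s', r) samples.

    The sample format is an assumption made for this sketch.
    """
    counts = defaultdict(lambda: defaultdict(int))  # counts[(s, a)][s']
    reward_sums = defaultdict(float)                # summed reward per (s, a, s')

    # Step 1: count each outcome s' observed after (s, a).
    for s, a, s_next, r in samples:
        counts[(s, a)][s_next] += 1
        reward_sums[(s, a, s_next)] += r

    # Step 2: normalize the counts into transition probabilities,
    # and average the rewards seen for each transition.
    T, R = {}, {}
    for (s, a), next_counts in counts.items():
        total = sum(next_counts.values())
        for s_next, n in next_counts.items():
            T[(s, a, s_next)] = n / total
            R[(s, a, s_next)] = reward_sums[(s, a, s_next)] / n
    return T, R


# Hypothetical experience: three samples from (A, right), one from (B, right).
samples = [
    ("A", "right", "B", 0.0),
    ("A", "right", "B", 0.0),
    ("A", "right", "A", 0.0),
    ("B", "right", "C", 1.0),
]
T, R = estimate_model(samples)
```

Here (A, right) led to B in two of its three samples, so the estimate for T(A, right, B) is 2/3. Pairs that never appear in the samples get no entry at all.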

Run value iteration on the learned MDP
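As a rough sketch, value iteration over a learned model might look like the Python below. The dictionary layout keyed by (s, a, s'), the discount gamma = 0.9, the fixed iteration count, and the small example model are all assumptions made for illustration. States with no tried action are treated as terminal with value 0, which is also an assumption of this sketch.

```python
def value_iteration(T, R, gamma=0.9, iterations=100):
    """Run value iteration on an estimated model.

    T and R are dicts keyed by (s, a, s'), an assumed layout for this sketch.
    """
    # Every state mentioned anywhere in the learned model.
    states = {s for (s, _, _) in T} | {s2 for (_, _, s2) in T}

    # Group the known outcomes by state and action.
    outcomes = {}
    for (s, a, s2), p in T.items():
        outcomes.setdefault(s, {}).setdefault(a, []).append((s2, p))

    def q_value(s, a, V):
        # Expected reward plus discounted value over the estimated outcomes.
        return sum(p * (R[(s, a, s2)] + gamma * V[s2])
                   for s2, p in outcomes[s][a])

    V = {s: 0.0 for s in states}
    for _ in range(iterations):
        new_V = {}
        for s in states:
            if s in outcomes:
                new_V[s] = max(q_value(s, a, V) for a in outcomes[s])
            else:
                new_V[s] = 0.0  # no tried action: treated as terminal (assumption)
        V = new_V

    # Greedy policy with respect to the final values.
    policy = {s: max(outcomes[s], key=lambda a: q_value(s, a, V))
              for s in outcomes}
    return V, policy


# Hypothetical learned model with deterministic estimated transitions.
T = {("A", "right", "B"): 1.0, ("A", "stay", "A"): 1.0, ("B", "right", "C"): 1.0}
R = {("A", "right", "B"): 0.0, ("A", "stay", "A"): 0.0, ("B", "right", "C"): 1.0}
V, policy = value_iteration(T, R)
```

The policy is only defined for states where at least one action has been tried, and it only chooses among the actions that appear in the learned model.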


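  • We can only estimate the model for state-action pairs (s, a) that we've actually tried; for pairs we have never sampled, we have no counts and therefore no estimate of the transitions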