RL-Course-3
Note three of the RL course from the University of Alberta.
Parameterized functions to approximate values
We add a parameter vector alongside the state: \(v(s, w)\), where \(w\) is the weight vector that we learn. Changing the weights changes the value function.
Linear value function approximation
\[v(s, w) = \sum_i w_i x_i(s) = w^\top x(s)\]where $x(s)$ is the feature vector of state $s$; the features $x_i$ act as basis functions.
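A minimal sketch of this computation in Python/NumPy; the feature values and weights below are made up for illustration, not from the course:

```python
import numpy as np

def linear_value(x_s, w):
    """v(s, w) = sum_i w_i * x_i(s), i.e. the dot product w . x(s)."""
    return float(np.dot(w, x_s))

# Illustrative numbers: 4 features for some state s and an arbitrary weight vector.
w = np.array([0.5, -0.2, 0.0, 1.0])
x_s = np.array([1.0, 0.0, 1.0, 0.5])
print(linear_value(x_s, w))  # 0.5*1.0 + (-0.2)*0.0 + 0.0*1.0 + 1.0*0.5 = 1.0
```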
Generalization and Discrimination
Generalization: updates to one state affect the value of other states.
Discrimination: the ability to make the value of two states different.
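To make the trade-off concrete, here is a small sketch (my own example, not from the course) using state aggregation: states in the same group share one indicator feature, so an update to one state generalizes to the whole group, but two states in the same group can never be discriminated.

```python
import numpy as np

NUM_STATES = 10
GROUP_SIZE = 5                            # states 0-4 form group 0, states 5-9 form group 1
NUM_FEATURES = NUM_STATES // GROUP_SIZE

def aggregate_features(s):
    """One-hot vector indicating the group that state s belongs to."""
    x = np.zeros(NUM_FEATURES)
    x[s // GROUP_SIZE] = 1.0
    return x

w = np.zeros(NUM_FEATURES)
# Update toward a target of 1.0 for state 0 ...
x0 = aggregate_features(0)
w += 0.1 * (1.0 - np.dot(w, x0)) * x0
# ... and the value of state 3 changes too (generalization),
# while states 0 and 3 can never be given different values (no discrimination within a group).
print(np.dot(w, aggregate_features(0)), np.dot(w, aggregate_features(3)))  # both 0.1
```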
Framing value estimation as supervised learning
We can treat states as inputs and estimated returns as targets, but the data arrives sequentially rather than as a fixed i.i.d. batch, so the function approximator should be compatible with online, incremental updates.
Semi-Gradient TD for Policy Evaluation
- TD update for function approximation
The gradient Monte Carlo update equation:
\[w \leftarrow w + \alpha \, (G_t - v(S_t, w)) \, \nabla_w v(S_t, w)\]
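A minimal sketch of this update with a linear approximator, so that $\nabla_w v(s, w)$ is simply $x(s)$; the episode format and the `features` function are assumptions for illustration:

```python
import numpy as np

def gradient_mc_episode(w, episode, features, alpha=0.1, gamma=1.0):
    """One gradient Monte Carlo pass over a finished episode.

    episode: list of (S_t, R_{t+1}) pairs in time order.
    features: maps a state to its feature vector x(s).
    """
    G = 0.0
    # Walk the episode backwards so the return G_t accumulates incrementally.
    for state, reward in reversed(episode):
        G = reward + gamma * G
        x = features(state)
        v = np.dot(w, x)                 # v(S_t, w)
        w = w + alpha * (G - v) * x      # for a linear v, the gradient is x(S_t)
    return w
```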
Replacing the return with another estimate of the value gives the general update
\[w \leftarrow w + \alpha \, (U_t - v(S_t, w)) \, \nabla_w v(S_t, w)\]where $U_t$ is the update target. If $U_t$ is an unbiased estimate of the true value (as the Monte Carlo return $G_t$ is), then $w$ converges to a local optimum under the usual stochastic-approximation step-size conditions. The TD target $U_t = R_{t+1} + \gamma v(S_{t+1}, w)$ is biased because it depends on the current weights, and the update ignores that dependence, which is why the method is called semi-gradient TD.
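A minimal semi-gradient TD(0) sketch for linear function approximation; the `env` interface (`reset`/`step`) and the `features` function are placeholders I am assuming, not part of the course material:

```python
import numpy as np

def semi_gradient_td0(env, policy, features, num_features,
                      alpha=0.1, gamma=0.99, num_episodes=100):
    """Semi-gradient TD(0) policy evaluation with a linear value approximator."""
    w = np.zeros(num_features)
    for _ in range(num_episodes):
        state = env.reset()
        done = False
        while not done:
            next_state, reward, done = env.step(policy(state))
            x = features(state)
            v = np.dot(w, x)                                   # v(S_t, w)
            # TD target U_t = R_{t+1} + gamma * v(S_{t+1}, w); a terminal state has value 0.
            u = reward + (0.0 if done else gamma * np.dot(w, features(next_state)))
            # Semi-gradient step: only v(S_t, w) is differentiated, not the target.
            w = w + alpha * (u - v) * x
            state = next_state
    return w
```

Note that the code treats $U_t$ as a fixed number even though it depends on $w$; differentiating through the target as well would give a different (residual-gradient) method, not TD.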