WeiYa's Work Yard

A dog, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

Notes: Temporal-Difference Learning

Posted on October 12, 2017

Learning Rules

TD Prediction

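The simplest temporal-difference method, $TD(0)$, updates the value estimate after every step:

$$V(S_t) \leftarrow V(S_t) + \alpha\left[R_{t+1}+\gamma V(S_{t+1}) - V(S_t)\right],$$

where $R_{t+1}+\gamma V(S_{t+1})$ is the TD target and $\alpha$ is the step size.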

The TD target is an estimate in two senses. Like the Monte Carlo (MC) target, it uses a sample in place of the expected value; like the dynamic programming (DP) target, it bootstraps, using the current estimate $V$ in place of the true value function $v_\pi$.
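To make the update concrete, here is a minimal sketch of tabular TD(0) prediction in Python. The environment is an assumption chosen for illustration and is not from the notes above: a small random walk in which the agent moves left or right with equal probability and receives a reward of 1 only when it exits on the right. The function name `td0_prediction` and its parameters (`n_states`, `episodes`, `alpha`, `gamma`, `seed`) are hypothetical. Each step forms the TD target $R_{t+1}+\gamma V(S_{t+1})$ and moves $V(S_t)$ a fraction `alpha` of the way toward it.

```python
import random


def td0_prediction(n_states=5, episodes=1000, alpha=0.1, gamma=1.0, seed=0):
    """Estimate state values of a random-walk policy with tabular TD(0).

    Assumed toy environment: non-terminal states 1..n_states, terminal
    states 0 (left) and n_states + 1 (right). Reward is 1 on reaching the
    right terminal and 0 otherwise.
    """
    rng = random.Random(seed)
    # Value table including both terminals; terminal values stay at 0.
    V = [0.0] * (n_states + 2)

    for _ in range(episodes):
        s = (n_states + 1) // 2  # start each episode in the middle state
        while 0 < s < n_states + 1:
            s_next = s + rng.choice((-1, 1))  # random-walk policy
            reward = 1.0 if s_next == n_states + 1 else 0.0

            # TD target: sampled reward plus bootstrapped estimate of the next state.
            td_target = reward + gamma * V[s_next]
            # Move V(s) toward the target by a step of size alpha.
            V[s] += alpha * (td_target - V[s])

            s = s_next

    return V[1:-1]  # values of the non-terminal states only


if __name__ == "__main__":
    estimates = td0_prediction(episodes=5000, alpha=0.02)
    print([round(v, 2) for v in estimates])
```

For this particular walk with $\gamma = 1$, the true value of state $i$ is $i/(n+1)$, so the estimates should end up near $1/6, 2/6, \dots, 5/6$. With a constant step size they keep fluctuating around those values rather than converging exactly. The update uses only one sampled transition (the MC-like part) and the current table entry `V[s_next]` (the DP-like bootstrap).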


Published in categories Reinforcement Learning