# Notes: Temporal-Difference Learning

## Learning Rules

## TD Prediction

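The simplest temporal-difference method, $TD(0)$, updates the value estimate after every step:

$$V(S_t) \leftarrow V(S_t) + \alpha\left[R_{t+1}+\gamma V(S_{t+1}) - V(S_t)\right]$$

where $\alpha$ is the step size and $R_{t+1}+\gamma V(S_{t+1})$ is the TD target.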

On the one hand, the target is an estimate like the Monte Carlo (MC) target because it samples the expected value; on the other hand, it is also like the dynamic programming (DP) target because it uses the current estimate $V$ in place of the true $v_\pi$, that is, it bootstraps.
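
To make the sampled, bootstrapped target concrete, here is a minimal Python sketch of tabular TD(0) prediction. It is an illustration under stated assumptions, not a reference implementation: the helper names `td0_update` and `random_walk_td0`, the step size `alpha`, and the small random-walk environment (states in a row, reward 1 only when exiting on the right) are all hypothetical choices made for this example.

```python
import random


def td0_update(V, s, r, s_next, alpha=0.1, gamma=1.0, terminal=False):
    """Apply one TD(0) update to V[s] in place and return the TD error."""
    # TD target: sampled reward plus the discounted *current* estimate of the
    # next state's value (the bootstrap). Terminal states are worth 0.
    target = r + (0.0 if terminal else gamma * V[s_next])
    td_error = target - V[s]
    V[s] += alpha * td_error
    return td_error


def random_walk_td0(num_states=5, episodes=1000, alpha=0.05, gamma=1.0, seed=0):
    """Estimate state values for a hypothetical random walk with TD(0).

    Assumed environment: start in the middle state, move left or right with
    equal probability; exiting on the right pays 1, exiting on the left pays 0.
    """
    rng = random.Random(seed)
    V = [0.5] * num_states  # arbitrary initial estimates
    for _ in range(episodes):
        s = num_states // 2
        while True:
            s_next = s + rng.choice((-1, 1))
            if s_next < 0:  # fell off the left end, reward 0
                td0_update(V, s, 0.0, s, alpha, gamma, terminal=True)
                break
            if s_next >= num_states:  # fell off the right end, reward 1
                td0_update(V, s, 1.0, s, alpha, gamma, terminal=True)
                break
            td0_update(V, s, 0.0, s_next, alpha, gamma)
            s = s_next
    return V


if __name__ == "__main__":
    print([round(v, 2) for v in random_walk_td0()])
```

Each update happens right after a single transition, using one sampled reward (the MC-like part) and the current value table rather than the true values (the DP-like part). With a constant step size the estimates keep fluctuating around their targets instead of settling exactly.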