Artificial intelligence - What is wrong with equation 7.3 in Sutton & Barto's book? - 吾爱随笔录

Equation 7.3 in Sutton & Barto's book:

$\max_s \left| \mathbb{E}_\pi[G_{t:t+n} \mid S_t = s] - v_\pi(s) \right| \le \gamma^n \max_s \left| V_{t+n-1}(s) - v_\pi(s) \right|$

where $G_{t:t+n} = R_{t+1} + \gamma R_{t+2} + \dots + \gamma^{n-1} R_{t+n} + \gamma^n V_{t+n-1}(S_{t+n})$. Here $V_{t+n-1}(S_{t+n})$ is an estimate of $v_\pi(S_{t+n})$.

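But the left-hand side of the inequality above should be zero, because for any state $s$, $G_{t:t+n}$ is an unbiased estimate of $v_\pi(s)$, and therefore $\mathbb{E}_\pi[G_{t:t+n} \mid S_t = s] = v_\pi(s)$.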
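One way to probe this claim is a small numerical check. The sketch below assumes a hypothetical 3-state Markov reward process under a fixed policy. The names `P`, `r`, `gamma`, `n`, `V` and all the numbers are illustrative assumptions, not taken from the book. It computes $\mathbb{E}_\pi[G_{t:t+n} \mid S_t = s]$ exactly by applying the Bellman backup $n$ times to the bootstrap values, then compares both sides of 7.3.

```python
# Hypothetical 3-state Markov reward process under a fixed policy pi.
# P, r, gamma, n and V are illustrative assumptions, not from the book.
P = [[0.5, 0.5, 0.0],
     [0.1, 0.6, 0.3],
     [0.2, 0.0, 0.8]]      # P[s][s2] = probability of moving from s to s2
r = [1.0, 0.0, 2.0]        # expected one-step reward when leaving each state
gamma = 0.9
n = 3


def mat_vec(M, v):
    """Matrix-vector product for plain Python lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def expected_n_step_return(V, n):
    """E_pi[G_{t:t+n} | S_t = s] for every s, bootstrapping from V."""
    out = list(V)
    for _ in range(n):
        # One Bellman backup: r + gamma * P @ out
        pv = mat_vec(P, out)
        out = [r_s + gamma * pv_s for r_s, pv_s in zip(r, pv)]
    return out


def max_abs_diff(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


# Approximate v_pi by backing up from zero until the remainder is negligible.
v_pi = expected_n_step_return([0.0, 0.0, 0.0], 2000)

# A deliberately wrong estimate, standing in for V_{t+n-1}.
V = [0.0, 2.0, -1.0]

lhs = max_abs_diff(expected_n_step_return(V, n), v_pi)
rhs = gamma ** n * max_abs_diff(V, v_pi)

# Same left-hand side, but bootstrapping from the true values instead.
lhs_exact = max_abs_diff(expected_n_step_return(v_pi, n), v_pi)

print("LHS with arbitrary V:", lhs)
print("RHS with arbitrary V:", rhs)
print("LHS when V = v_pi:   ", lhs_exact)
```

In this toy setup the left-hand side is nonzero for the arbitrary `V` and stays below the right-hand side. It vanishes only when the bootstrap values are replaced by $v_\pi$ itself, which suggests the unbiasedness argument relies on $V_{t+n-1} = v_\pi$.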