From the paper "Human-level control through deep reinforcement learning" by Mnih et al. (Nature, 2015), which says:
Reinforcement learning is known to be unstable or even to diverge
when a nonlinear function approximator such as a neural network is
used to represent the action-value (also known as Q) function[20].
This instability has several causes: the correlations present in the
sequence of observations
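I'm not sure how to interpret this, and I can't construct any hypothetical example where it would happen. What are some hypothetical scenarios or real examples in which the correlations present in the sequence of observations break the use of a "deep learning" approximator?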
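A minimal hypothetical sketch (not from the paper) of the kind of scenario the quote describes. It uses plain supervised regression rather than Q-learning, so only the correlation issue is isolated. The approximator is a deliberately tiny stand-in for a neural network: a model y ≈ w·x + b whose two parameters are shared across all inputs, fitted with per-sample SGD to a made-up target x². The names `target`, `sgd_fit` and `mse` are illustrative. The same 2000 samples are fed once in a temporally correlated order (all x < 0, then all x > 0, like an agent that stays in one region of the state space for a long stretch) and once shuffled.

```python
import random


def target(x):
    # Hypothetical "true" function the approximator should learn.
    return x * x


def sgd_fit(xs, lr=0.1):
    """Fit y ~ w*x + b with per-sample SGD, visiting xs in the given order."""
    w, b = 0.0, 0.0
    for x in xs:
        err = target(x) - (w * x + b)
        # Gradient step on 0.5 * err**2. Every update moves the same shared
        # parameters, so fitting one region changes predictions everywhere.
        w += lr * err * x
        b += lr * err
    return w, b


def mse(w, b, n=1001):
    """Mean squared error of the fitted line on an even grid over [-1, 1]."""
    grid = [-1 + 2 * i / (n - 1) for i in range(n)]
    return sum((target(x) - (w * x + b)) ** 2 for x in grid) / n


rng = random.Random(0)
n = 1000
# Correlated stream: first only states with x < 0, then only states with x > 0.
left = [rng.uniform(-1, 0) for _ in range(n)]
right = [rng.uniform(0, 1) for _ in range(n)]
correlated = left + right
# Same samples, shuffled: consecutive updates are roughly independent.
shuffled = correlated[:]
rng.shuffle(shuffled)

results = {}
for name, stream in (("correlated", correlated), ("shuffled", shuffled)):
    w, b = sgd_fit(stream)
    results[name] = mse(w, b)
    print(f"{name:>10}: w={w:+.2f}  b={b:+.2f}  mse={results[name]:.3f}")
```

With the correlated ordering, every update in the second half pulls the shared parameters toward fitting only x > 0, so the model should end up close to the local fit on [0, 1] (w ≈ 1, b ≈ -1/6) and predict poorly for x < 0, which it has effectively "forgotten". With shuffled samples it should land near the global least-squares line (w ≈ 0, b ≈ 1/3, MSE 4/45 ≈ 0.089). Shuffling stored samples is roughly the idea behind the experience replay mechanism used in the same paper.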