Understanding why correlations in the data reduce effectiveness in deep reinforcement learning

In the paper

Human-level control through deep reinforcement learning, Mnih et al., Nature 2015

the authors say:

Reinforcement learning is known to be unstable or even to diverge
when a nonlinear function approximator such as a neural network is
used to represent the action-value (also known as Q) function [20].
This instability has several causes: the correlations present in the
sequence of observations

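I'm not sure how to interpret this, and I can't come up with any hypothetical example in which it would happen. What are some hypothetical scenarios or real examples where the correlations present in the sequence would undermine the use of a "deep learning" approximator?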
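To make the question concrete, here is a minimal toy sketch of the kind of effect the quote might be describing. It is not the paper's experiment and does not involve a neural network or Q-learning: it assumes a hypothetical stream of targets that drifts slowly from 0 to 1 (so consecutive samples are highly correlated) and fits a single constant `w` with plain stochastic gradient descent. The names `sgd_mean`, `ordered`, and `shuffled` are made up for illustration.

```python
import random


def sgd_mean(samples, lr=0.01):
    """Fit one constant w to minimise (w - y)^2, one sample at a time."""
    w = 0.0
    for y in samples:
        # Gradient of (w - y)^2 with respect to w is 2 * (w - y).
        w -= lr * 2.0 * (w - y)
    return w


random.seed(0)

# Hypothetical "trajectory": targets drift slowly from 0 to 1,
# so each sample is almost identical to its neighbours.
ordered = [i / 999 for i in range(1000)]

# Same data, but with the temporal order destroyed.
shuffled = ordered[:]
random.shuffle(shuffled)

true_mean = sum(ordered) / len(ordered)
w_ordered = sgd_mean(ordered)
w_shuffled = sgd_mean(shuffled)

print("true mean:        ", true_mean)
print("SGD, ordered data:", w_ordered)
print("SGD, shuffled:    ", w_shuffled)
```

Under these assumptions, the run on the ordered stream ends up tracking the most recent targets and lands far from the overall mean, while the run on the shuffled copy stays close to it. Whether this is the same mechanism the authors have in mind for a Q-network is exactly what I'm unsure about, but shuffling here plays a role loosely similar to sampling randomly from stored experience.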