Data Mining - Dissecting and Understanding the Formula for Adam Optimization - 吾爱随笔录

The Adam optimizer uses the following parameter update rule:

$\theta_{t+1} = \theta_{t} - \alpha*\dfrac{m_t}{\sqrt{v_t + \epsilon}}$

where $m_t$ is the first moment of the gradients and $v_t$ is the second moment of the gradients.
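To make the update rule concrete, here is a minimal NumPy sketch of a single step, written to match the formula exactly as stated above. The post does not say how $m_t$ and $v_t$ are computed, so the sketch assumes the usual exponential moving averages of the gradient and the squared gradient, with decay rates `beta1` and `beta2`. The function name `adam_step` and all default values are illustrative choices, not taken from the post. Note also that the original Adam paper applies a bias correction to both moments and adds $\epsilon$ outside the square root; both details are left out here to stay close to the formula in the question.

```python
import numpy as np

def adam_step(theta, grad, m, v, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One update following theta - alpha * m / sqrt(v + eps).

    Assumption: m and v are exponential moving averages of the gradient
    and the squared gradient (no bias correction, to match the formula
    as written in the question).
    """
    m = beta1 * m + (1.0 - beta1) * grad        # first moment: running mean of gradients
    v = beta2 * v + (1.0 - beta2) * grad ** 2   # second moment: running mean of squared gradients
    theta = theta - alpha * m / np.sqrt(v + eps)
    return theta, m, v

# Illustrative use on f(theta) = theta^2, whose gradient is 2 * theta.
theta = np.array([5.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for _ in range(100):
    grad = 2.0 * theta
    theta, m, v = adam_step(theta, grad, m, v, alpha=0.1)
print(theta)
```

One thing the sketch makes visible: because the gradient appears in the numerator through `m` and in the denominator through `sqrt(v)`, multiplying every gradient by a constant leaves the step size nearly unchanged. The step is governed by `alpha` and by how consistent the gradients are over time, not by their raw magnitude.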

I have the following questions about the formula above:

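Before posting here, I searched online and read a number of articles, but none of them helped me build an intuition. I also tried reading the original paper, but found it hard to follow.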