Data Mining - MDP - RL: are multiple rewards for the same state possible?

This question comes from An Introduction to RL, pages 48 and 49. It may also be related to the following question, although I am not sure: Cannot see what the "notation abuse" is, by the author of book

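On page 48, the book states that $p : \mathcal{S} \times \mathcal{R} \times \mathcal{S} \times \mathcal{A} \rightarrow [0, 1]$ is a deterministic function: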

The dynamics function $p : \mathcal{S} \times \mathcal{R} \times \mathcal{S} \times \mathcal{A} \rightarrow [0, 1]$ is an ordinary deterministic function of four arguments.

However, on page 49, equation 3.4 sums over $r$:

$\sum_{s' \in \mathcal{S}} \sum_{r \in \mathcal{R}} p(s', r \mid s, a) = 1, \quad \text{for all } s \in \mathcal{S},\ a \in \mathcal{A}(s)$

My question is: does this mean that performing an action $a$ that takes us from state $s$ to state $s'$ can result in more than one possible reward?
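To make the question concrete, here is a minimal Python sketch of the situation I have in mind. Everything in it is an assumption for illustration only: the state names `s0`, `s1`, `s2`, the action `go`, the probabilities, and the helper names `dynamics`, `total_mass`, and `rewards_for` are all hypothetical and do not come from the book. The table stores one number per $(s', r, s, a)$ tuple, so looking a value up is a plain deterministic function call, while the sum over $s'$ and $r$ for a fixed $(s, a)$ is checked against 1 as in equation 3.4.

```python
from collections import defaultdict

# Hypothetical toy dynamics table (not from the book).
# Keys are (s_next, r, s, a); values are p(s', r | s, a).
P = {
    ("s1", 0.0, "s0", "go"): 0.3,
    ("s1", 1.0, "s0", "go"): 0.5,  # same s' as above, different reward
    ("s2", 0.0, "s0", "go"): 0.2,
}


def dynamics(s_next, r, s, a, table=P):
    """Plain lookup: the same four arguments always return the same number."""
    return table.get((s_next, r, s, a), 0.0)


def total_mass(table=P):
    """Sum p(s', r | s, a) over all s' and r, separately for each (s, a)."""
    totals = defaultdict(float)
    for (s_next, r, s, a), prob in table.items():
        totals[(s, a)] += prob
    return dict(totals)


def rewards_for(s, a, s_next, table=P):
    """Rewards with nonzero probability for one fixed (s, a, s') triple."""
    return sorted(
        r
        for (sn, r, s0, a0), prob in table.items()
        if (sn, s0, a0) == (s_next, s, a) and prob > 0
    )


if __name__ == "__main__":
    print(total_mass())                  # mass per (s, a) pair
    print(rewards_for("s0", "go", "s1"))  # rewards attached to s0 -> s1
```

In this made-up table, the transition from `s0` to `s1` under `go` carries two different rewards with nonzero probability, which is exactly the case I am asking about: is a table like this a valid $p$ in the sense of the book?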