Data Mining - How to Implement Clipped Rewards for DQN in Keras - 吾爱随笔录


Tags: data-mining, deep-learning, tensorflow, training, dqn, keras-rl

2021-10-02 16:50:15

How do I implement reward clipping for a DQN in Keras? In particular, how should the clipped reward itself be computed?

Is this pseudocode correct?

if reward < -threshold:
    reward = -1
elif reward > threshold:
    reward = 1
elif -threshold < reward < threshold:
    reward = reward / threshold
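Read as a whole, the three branches divide the reward by threshold and then clip the result to [-1, 1]. The sketch below writes that as a single NumPy call; scaled_clip is a hypothetical helper name, and the code assumes threshold > 0. One difference worth checking: the pseudocode has no branch for a reward exactly equal to threshold or -threshold, so such a reward would pass through unscaled, whereas the clip maps it to 1 or -1.

```python
import numpy as np


def scaled_clip(reward, threshold):
    """Divide by threshold, then clip to [-1, 1] (hypothetical helper).

    Assumes threshold > 0. Rewards below -threshold become -1, rewards
    above threshold become 1, and everything in between, including the
    two boundaries, becomes reward / threshold.
    """
    return float(np.clip(reward / threshold, -1.0, 1.0))


if __name__ == "__main__":
    threshold = 5.0
    for r in (-12.0, -5.0, -2.5, 0.0, 2.5, 5.0, 12.0):
        print(r, scaled_clip(r, threshold))
```

With threshold = 1 this reduces to the plain np.clip(reward, -1, 1). If every reward is non-negative, the outputs simply fall in [0, 1].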

If the rewards are always positive, how should the clipping be changed?

1 Answer

Since you are using keras-rl, you can use its Processor class: write a new processor and assign it to your agent. The new processor would look something like this:

import numpy as np
from rl.core import Processor


class MyProcessor(Processor):
    def process_reward(self, reward):
        """Processes the reward as obtained from the environment for use in an agent and
        returns it.

        # Arguments
            reward (float): A reward as obtained by the environment

        # Returns
            The processed reward
        """
        # Change the bounds according to your needs. I assumed your threshold was 1.
        min_reward = -1.0
        max_reward = 1.0
        # Clip into [min_reward, max_reward] and return a plain Python float.
        return float(np.clip(reward, min_reward, max_reward))
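If you want the question's threshold scaling instead of a fixed clip to [-1, 1], the same process_reward hook can apply it. This is a sketch: ScaledClipProcessor and its threshold argument are hypothetical names, and when keras-rl is not installed the code falls back to a minimal stand-in for Processor so the example still runs.

```python
import numpy as np

try:
    from rl.core import Processor  # keras-rl's base class, if installed
except ImportError:
    # Hypothetical stand-in so this sketch also runs without keras-rl.
    class Processor:
        def process_reward(self, reward):
            return reward


class ScaledClipProcessor(Processor):
    """Divide each reward by `threshold`, then clip to [-1, 1].

    ScaledClipProcessor and `threshold` are illustrative names, not part
    of keras-rl. Assumes threshold > 0.
    """

    def __init__(self, threshold=1.0):
        super().__init__()
        if threshold <= 0:
            raise ValueError("threshold must be positive")
        self.threshold = threshold

    def process_reward(self, reward):
        # Scale into [-1, 1] around the threshold, saturating outside it.
        return float(np.clip(reward / self.threshold, -1.0, 1.0))


if __name__ == "__main__":
    proc = ScaledClipProcessor(threshold=10.0)
    print([proc.process_reward(r) for r in (-50, -5, 0, 5, 50)])
```

In keras-rl the processor is handed to the agent through the processor keyword argument of its constructor, for example DQNAgent(..., processor=ScaledClipProcessor(threshold=10.0)); the remaining DQNAgent arguments depend on your setup.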

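Note that there is no real need to convert the reward to a float, and in some agent implementations the conversion might even fail (but if it does, that agent implementation is wrong). I did it because Gym defines rewards as floats. In general, if the reward is an int, a numpy.float64, or anything else that converts easily to a float, nothing will break.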
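The cast mostly matters for integer rewards. A small illustration with plain NumPy (not keras-rl code): np.clip applied to a Python scalar returns a NumPy scalar whose type follows the input, and numpy.float64 is already a subclass of Python's float.

```python
import numpy as np

# np.clip on Python scalars returns NumPy scalar types.
clipped_int = np.clip(3, -1, 1)        # integer input -> NumPy integer
clipped_float = np.clip(0.5, -1, 1)    # float input -> numpy.float64

print(type(clipped_int), type(clipped_float))

# numpy.float64 subclasses Python's float, so it already behaves like one.
print(isinstance(clipped_float, float))

# A NumPy integer is not a Python float; float() normalizes both cases.
print(isinstance(clipped_int, float), type(float(clipped_int)))
```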
