As far as I know, BPC is just the average cross-entropy (used with log base 2).
In Alex Graves' paper, the purpose of the model is to approximate the probability distribution of the next character given the past characters. At each time step $t$, call this (approximate) distribution $\hat{P}_t$ and let $P_t$ be the true distribution. Each of these discrete probability distributions can be represented by a vector of size $n$, where $n$ is the number of possible characters in the alphabet.
Hence BPC, or the average cross-entropy, can be computed as follows:
$$
\begin{aligned}
\mathrm{bpc}(\mathit{string}) &= \frac{1}{T}\sum_{t=1}^{T} H(P_t, \hat{P}_t) = -\frac{1}{T}\sum_{t=1}^{T}\sum_{c=1}^{n} P_t(c)\,\log_2 \hat{P}_t(c),\\
&= -\frac{1}{T}\sum_{t=1}^{T}\log_2 \hat{P}_t(x_t).
\end{aligned}
$$
where $T$ is the length of the input string.
The equality in the second line follows from the fact that the true distribution $P_t$ is zero everywhere except at the index corresponding to the true character $x_t$ in the input string at location $t$.
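For concreteness, here is a minimal NumPy sketch of the second line of the equation (the function and argument names are mine, purely illustrative):

```python
import numpy as np

def bpc(pred_probs, true_indices):
    # pred_probs:   array of shape (T, n); row t is the predicted
    #               distribution over the n alphabet characters at step t.
    # true_indices: array of shape (T,); entry t is the alphabet index
    #               of the true character x_t.
    T = len(true_indices)
    # Pick out the probability the model assigned to the true character
    # at each step, then average -log2 of it over the string.
    p_true = pred_probs[np.arange(T), true_indices]
    return -np.mean(np.log2(p_true))
```

As a sanity check, a model that always predicts the uniform distribution over an alphabet of $n$ characters scores exactly $\log_2 n$ bits per character, e.g. 8 BPC for $n = 256$.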
Two things to note:
- When you use an RNN, $\hat{P}_t$ can be obtained by applying a softmax to the RNN's output at time step $t$ (the number of output units in your RNN should be equal to $n$, the number of characters in your alphabet); see the sketch after this list.
- In the equation above, the average cross-entropy is calculated over one input string of size $T$. In practice, you may have more than one string in your batch, so you should average over all of them (i.e. $\mathrm{bpc} = \operatorname{mean}_{\mathit{strings}}\ \mathrm{bpc}(\mathit{string})$); the sketch below does this in its last step.
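To illustrate both notes, here is a hedged sketch under the assumption that the RNN produces raw unnormalized scores (logits) and that all strings in the batch share the same length $T$ (otherwise you would mask padded positions); all names are illustrative:

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis (the alphabet dimension).
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def batch_bpc(logits, true_indices):
    # logits:       (batch, T, n) raw RNN outputs, one row of n scores per step.
    # true_indices: (batch, T) alphabet indices of the true characters.
    probs = softmax(logits)                     # \hat{P}_t for every string and step
    batch, T = true_indices.shape
    # Probability assigned to the true character at every (string, step) pair.
    p_true = probs[np.arange(batch)[:, None],
                   np.arange(T)[None, :],
                   true_indices]
    per_string = -np.log2(p_true).mean(axis=1)  # bpc(string) for each string
    return per_string.mean()                    # average over the batch
```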