As far as I know, BPC is just the average cross-entropy (used with log base 2).
In Alex Graves' paper, the purpose of the model is to approximate the probability distribution of the next character given the past characters. At each time step $t$, call this (approximate) distribution $\hat{P}_t$ and let $P_t$ be the true distribution. Each of these discrete probability distributions can be represented by a vector of size $n$, where $n$ is the number of possible characters in the alphabet.
Hence BPC, or the average cross-entropy, can be computed as follows:
$$
\begin{aligned}
\mathrm{bpc}(\mathit{string}) &= \frac{1}{T}\sum_{t=1}^{T} H(P_t, \hat{P}_t) = -\frac{1}{T}\sum_{t=1}^{T}\sum_{c=1}^{n} P_t(c)\,\log_2 \hat{P}_t(c),\\
&= -\frac{1}{T}\sum_{t=1}^{T}\log_2 \hat{P}_t(x_t).
\end{aligned}
$$
where $T$ is the length of the input string.
The equality in the second line follows from the fact that the true distribution $P_t$ is zero everywhere except at the index corresponding to the true character $x_t$ in the input string at location $t$.
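For concreteness, here is a minimal NumPy sketch of the second line of the equation (the function and argument names are mine, purely illustrative):

```python
import numpy as np

def bpc(pred_probs, true_indices):
    # pred_probs:   array of shape (T, n); row t is the predicted
    #               distribution over the n alphabet characters at step t.
    # true_indices: array of shape (T,); entry t is the alphabet index
    #               of the true character x_t.
    T = len(true_indices)
    # Pick out the probability the model assigned to the true character
    # at each step, then average -log2 of it over the string.
    p_true = pred_probs[np.arange(T), true_indices]
    return -np.mean(np.log2(p_true))
```

As a sanity check, a model that always predicts the uniform distribution over an alphabet of $n$ characters scores exactly $\log_2 n$ bits per character, e.g. 8 BPC for $n = 256$.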
Two things to note:
- When you use an RNN, $\hat{P}_t$ can be obtained by applying a softmax to the RNN's output at time step $t$ (the number of output units in your RNN should be equal to $n$, the number of characters in your alphabet); see the sketch after this list.
- In the equation above, the average cross-entropy is calculated over one input string of size $T$. In practice, you may have more than one string in your batch, so you should average over all of them (i.e. $\mathrm{bpc} = \operatorname{mean}_{\mathit{strings}}\ \mathrm{bpc}(\mathit{string})$); the sketch below does this in its last step.
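To illustrate both notes, here is a hedged sketch under the assumption that the RNN produces raw unnormalized scores (logits) and that all strings in the batch share the same length $T$ (otherwise you would mask padded positions); all names are illustrative:

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis (the alphabet dimension).
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def batch_bpc(logits, true_indices):
    # logits:       (batch, T, n) raw RNN outputs, one row of n scores per step.
    # true_indices: (batch, T) alphabet indices of the true characters.
    probs = softmax(logits)                     # \hat{P}_t for every string and step
    batch, T = true_indices.shape
    # Probability assigned to the true character at every (string, step) pair.
    p_true = probs[np.arange(batch)[:, None],
                   np.arange(T)[None, :],
                   true_indices]
    per_string = -np.log2(p_true).mean(axis=1)  # bpc(string) for each string
    return per_string.mean()                    # average over the batch
```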