Artificial neural networks have a bad reputation for being black boxes. Moreover, when we do have some prior knowledge about the domain of a particular supervised learning problem, it is not obvious how to introduce it into the model.
Bayesian models, on the other hand (and the state-of-the-art ones among them, Bayesian networks), handle this naturally. But those models have their own well-known limitations.
Is it possible to get the best of both worlds? Are there any theoretical or practical success stories of combining the two into some kind of hybrid?
And, more generally, what are the known strategies for incorporating prior knowledge into neural network models (feed-forward or recurrent)?
Indeed, there are many ways to incorporate prior knowledge into neural networks. The simplest and most frequently used kind of prior is weight decay. Weight decay assumes that the weights come from a normal distribution with zero mean and some fixed variance. This kind of prior is added as an extra term to the loss function, which then takes the following form:

$$\tilde{E}(\mathbf{w}) = E(\mathbf{w}) + \lambda\, \mathbf{w}^\top \mathbf{w}$$

where $E(\mathbf{w})$ is the data term (e.g. the MSE loss) and $\lambda$ controls the relative importance of the two terms; it is inversely proportional to the prior variance. This corresponds to the negative log-likelihood of the following probability:

$$p(\mathbf{w} \mid \mathcal{D}) \propto p(\mathcal{D} \mid \mathbf{w})\, \mathcal{N}(\mathbf{w}; \mathbf{0}, \sigma^2 I)$$
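As a rough, self-contained illustration (my own sketch, not part of the original answer), adding such a penalty to an MSE data term might look like the following; the function names and the value of `lambda_` are assumptions made just for this example:

```python
import numpy as np

def mse_loss(y_pred, y_true):
    """Data term E(w): mean squared error."""
    return np.mean((y_pred - y_true) ** 2)

def weight_decay_loss(y_pred, y_true, weights, lambda_=1e-3):
    """Total loss: data term plus the L2 (Gaussian-prior) penalty.

    `lambda_` plays the role of the trade-off coefficient above;
    it is inversely proportional to the assumed prior variance.
    """
    l2_penalty = sum(np.sum(w ** 2) for w in weights)
    return mse_loss(y_pred, y_true) + lambda_ * l2_penalty
```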
However, there are also other, less straightforward methods to incorporate prior knowledge into neural networks. They are very important: prior knowledge is what really bridges the gap between huge neural networks and (relatively) small datasets. Some examples are:
Data augmentation: By training the network on data perturbed by various class-preserving transformations, you are incorporating your prior knowledge about the domain, namely the transformations that the network should be invariant to.
Network architecture: One of the most successful neural network techniques of the past decades is the convolutional network. Its architecture, which shares limited-field-of-view kernels across spatial locations, brilliantly exploits our knowledge about the structure of image data. This, too, is a form of prior knowledge incorporated into the model.
Regularization loss terms: Similar to weight decay, it is possible to construct other loss terms which penalize mappings contradicting our domain knowledge.
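For that last point, here is a rough sketch (my own, not from the original answer) of what such a term could look like: suppose domain knowledge says the prediction should not decrease when one particular input feature increases. A hypothetical penalty, added to the data term just like weight decay, can discourage violations of that monotonicity; `feature_idx`, `delta`, and `lambda_mono` are illustrative choices:

```python
import numpy as np

def monotonicity_penalty(model, x, feature_idx, delta=0.1):
    """Penalize outputs that decrease when one input feature is increased.

    `model` is any callable mapping a batch of inputs to predictions;
    `feature_idx` and `delta` are hypothetical choices for this sketch.
    """
    x_shifted = x.copy()
    x_shifted[:, feature_idx] += delta
    decrease = model(x) - model(x_shifted)        # positive where the output drops
    return np.mean(np.maximum(decrease, 0.0) ** 2)

def total_loss(model, x, y_true, feature_idx=0, lambda_mono=1.0):
    """Data term plus the domain-knowledge penalty, analogous to weight decay."""
    y_pred = model(x)
    data_term = np.mean((y_pred - y_true) ** 2)
    return data_term + lambda_mono * monotonicity_penalty(model, x, feature_idx)
```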
For an in-depth analysis/overview of these methods, I can point you to my article Regularization for Deep Learning: A Taxonomy. I also recommend looking into Bayesian neural networks, meta-learning (finding meaningful prior information from other tasks in the same domain, see e.g. (Baxter, 2000)), and possibly also one-shot learning (e.g. (Lake et al., 2015)).