Incorporating prior knowledge into artificial neural networks

neural-networks  deep-learning  bayesian-network
2022-03-22 02:23:03

Artificial neural networks have a bad reputation for being black boxes. Moreover, in cases where we do have some prior knowledge about the domain of a particular supervised learning problem, it is not obvious how to introduce it into the model.

Bayesian models, on the other hand, and the state of the art among them, Bayesian networks, handle this naturally. But those models have their own well-known limitations.

  • Is it possible to get the best of both worlds? Are there any theoretical or practical success stories of combining the two into some kind of hybrid?

  • And, in general, what are the known strategies for integrating prior knowledge into neural network models (feed-forward or recurrent)?

1 Answer

Actually, there are many ways to incorporate prior knowledge into neural networks. The simplest and most frequently used kind of prior is weight decay. Weight decay assumes that the weights come from a normal distribution with zero mean and some fixed variance. This kind of prior is added as an extra term to the loss function, taking the following form:

$$L(\mathbf{w}) = E(\mathbf{w}) + \frac{\lambda}{2}\,\|\mathbf{w}\|_2^2,$$

where $E(\mathbf{w})$ is the data term (e.g. the MSE loss) and $\lambda$ controls the relative importance of the two terms; it is also inversely proportional to the prior variance. This corresponds to the negative logarithm of the following posterior probability:

$$p(\mathbf{w}\,|\,\mathcal{D}) \propto p(\mathcal{D}\,|\,\mathbf{w})\,p(\mathbf{w}),$$

where $p(\mathbf{w}) = \mathcal{N}(\mathbf{w}\,|\,0, \lambda^{-1}\mathbf{I})$ and $-\log p(\mathbf{w}) \propto -\log \exp\left(-\frac{\lambda}{2}\|\mathbf{w}\|_2^2\right) = \frac{\lambda}{2}\|\mathbf{w}\|_2^2$. This is the same as the Bayesian approach to modeling prior knowledge.
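
To make this concrete, here is a minimal sketch of weight decay as an explicit loss term. The answer names no framework, so PyTorch and all shapes/hyperparameters below are assumptions for illustration:

```python
import torch
import torch.nn as nn

# Toy model and data; the specific sizes are illustrative only.
model = nn.Linear(10, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

lam = 1e-4  # lambda: precision of the Gaussian prior over the weights

x = torch.randn(32, 10)
y = torch.randn(32, 1)

data_term = criterion(model(x), y)                            # E(w)
prior_term = sum(w.pow(2).sum() for w in model.parameters())  # ||w||_2^2
loss = data_term + lam / 2 * prior_term                       # L(w) = E(w) + (lambda/2)||w||_2^2

optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In practice the same effect is usually obtained by passing `weight_decay=lam` to the optimizer, which folds the penalty's gradient $\lambda\mathbf{w}$ directly into the update.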

However, there are also other, less straightforward methods of incorporating prior knowledge into neural networks. They are very important: prior knowledge is what really bridges the gap between huge neural networks and (relatively) small datasets. Some examples are:

Data augmentation: By training the network on data perturbed by various class-preserving transformations, you are incorporating your prior knowledge about the domain, namely the transformations that the network should be invariant to.
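
As an illustration, a minimal torchvision sketch; the specific transformations are assumptions about the domain, not prescriptions:

```python
import torchvision.transforms as T

# Each class-preserving transformation encodes an invariance we
# believe the labels have: flips, small rotations, lighting changes.
augment = T.Compose([
    T.RandomHorizontalFlip(),        # assumed horizontal symmetry
    T.RandomRotation(degrees=10),    # small rotations preserve the class
    T.ColorJitter(brightness=0.2),   # robustness to lighting
    T.ToTensor(),
])
# Applied to each training image before it is fed to the network.
```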

Network architecture: Among the most successful neural network techniques of the past decades are convolutional networks. Their architecture, which shares limited field-of-view kernels across spatial locations, brilliantly exploits our knowledge about data in image space. This too is a form of prior knowledge incorporated into the model.
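
For instance, a small convolutional block (a sketch; the layer sizes are arbitrary) makes this explicit: each 3x3 kernel sees only a local neighborhood and the same kernel is reused at every spatial location:

```python
import torch.nn as nn

# Translation equivariance is built into the architecture itself:
# the same small kernels slide over the whole image.
cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # shared 3x3 kernels
    nn.ReLU(),
    nn.MaxPool2d(2),                              # local pooling
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),                      # global spatial summary
    nn.Flatten(),
    nn.Linear(32, 10),                            # 10-class head
)
```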

Regularization loss terms: Similar to weight decay, it is possible to construct other loss terms which penalize mappings contradicting our domain knowledge.
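
As a hypothetical example (the monotonicity constraint and the helper below are my illustration, not from the answer): suppose domain knowledge says the target should be non-decreasing in the first input feature. A penalty on violations can then be added to the loss just like weight decay:

```python
import torch

def monotonicity_penalty(model, x, delta=0.1):
    # Hypothetical domain-knowledge term: penalize the model whenever
    # increasing the first feature *decreases* the prediction.
    x_shifted = x.clone()
    x_shifted[:, 0] += delta
    violation = model(x) - model(x_shifted)  # > 0 where monotonicity fails
    return torch.relu(violation).mean()

# loss = data_term + mu * monotonicity_penalty(model, x)
```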

For an in-depth analysis/overview of these methods, I can point you to my article Regularization for Deep Learning: A Taxonomy. I also recommend looking into Bayesian neural networks, meta-learning (finding meaningful prior information from other tasks in the same domain, see e.g. (Baxter, 2000)), and possibly also one-shot learning (e.g. (Lake et al., 2015)).