Artificial neural networks have a bad reputation for being black boxes. Moreover, when we do have some prior knowledge about the domain of a particular supervised learning problem, it is not obvious how to introduce it into the model.
Bayesian models, on the other hand (and the state-of-the-art ones among them, Bayesian networks), handle this naturally. But those models have their own well-known limitations.
Is it possible to get the best of both worlds? Are there any theoretical or practical success stories of combining the two into some kind of hybrid?
And, more generally, what are the known strategies for incorporating prior knowledge into neural network models (feed-forward or recurrent)?
Indeed, there are many ways to incorporate prior knowledge into neural networks. The simplest and most frequently used kind of prior is weight decay. Weight decay assumes that the weights come from a normal distribution with zero mean and some fixed variance. This kind of prior is added as an extra term to the loss function, which then takes the following form:

$$\tilde{E}(\mathbf{w}) = E(\mathbf{w}) + \lambda\, \mathbf{w}^\top \mathbf{w}$$

where $E(\mathbf{w})$ is the data term (e.g. the MSE loss) and $\lambda$ controls the relative importance of the two terms; it is inversely proportional to the prior variance. This corresponds to the negative log-likelihood of the following probability:

$$p(\mathbf{w} \mid \mathcal{D}) \propto p(\mathcal{D} \mid \mathbf{w})\, \mathcal{N}(\mathbf{w}; \mathbf{0}, \sigma^2 I)$$
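As a rough, self-contained illustration (my own sketch, not part of the original answer), adding such a penalty to an MSE data term might look like the following; the function names and the value of `lambda_` are assumptions made just for this example:

```python
import numpy as np

def mse_loss(y_pred, y_true):
    """Data term E(w): mean squared error."""
    return np.mean((y_pred - y_true) ** 2)

def weight_decay_loss(y_pred, y_true, weights, lambda_=1e-3):
    """Total loss: data term plus the L2 (Gaussian-prior) penalty.

    `lambda_` plays the role of the trade-off coefficient above;
    it is inversely proportional to the assumed prior variance.
    """
    l2_penalty = sum(np.sum(w ** 2) for w in weights)
    return mse_loss(y_pred, y_true) + lambda_ * l2_penalty
```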
However, there are also other, less straightforward methods to incorporate prior knowledge into neural networks. They are very important: prior knowledge is what really bridges the gap between huge neural networks and (relatively) small datasets. Some examples are:
Data augmentation: By training the network on data perturbed by various class-preserving transformations, you are incorporating your prior knowledge about the domain, namely the transformations that the network should be invariant to.
Network architecture: One of the most successful neural network techniques of the past decades is the convolutional network. Its architecture, which shares limited-field-of-view kernels across spatial locations, brilliantly exploits our knowledge about the structure of image data. This, too, is a form of prior knowledge incorporated into the model.
Regularization loss terms: Similar to weight decay, it is possible to construct other loss terms which penalize mappings contradicting our domain knowledge.
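For that last point, here is a rough sketch (my own, not from the original answer) of what such a term could look like: suppose domain knowledge says the prediction should not decrease when one particular input feature increases. A hypothetical penalty, added to the data term just like weight decay, can discourage violations of that monotonicity; `feature_idx`, `delta`, and `lambda_mono` are illustrative choices:

```python
import numpy as np

def monotonicity_penalty(model, x, feature_idx, delta=0.1):
    """Penalize outputs that decrease when one input feature is increased.

    `model` is any callable mapping a batch of inputs to predictions;
    `feature_idx` and `delta` are hypothetical choices for this sketch.
    """
    x_shifted = x.copy()
    x_shifted[:, feature_idx] += delta
    decrease = model(x) - model(x_shifted)        # positive where the output drops
    return np.mean(np.maximum(decrease, 0.0) ** 2)

def total_loss(model, x, y_true, feature_idx=0, lambda_mono=1.0):
    """Data term plus the domain-knowledge penalty, analogous to weight decay."""
    y_pred = model(x)
    data_term = np.mean((y_pred - y_true) ** 2)
    return data_term + lambda_mono * monotonicity_penalty(model, x, feature_idx)
```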
For an in-depth analysis/overview of these methods, I can point you to my article Regularization for Deep Learning: A Taxonomy. I also recommend looking into Bayesian neural networks, meta-learning (finding meaningful prior information from other tasks in the same domain, see e.g. (Baxter, 2000)), and possibly also one-shot learning (e.g. (Lake et al., 2015)).