How can I get real-valued, continuous output from a neural network?

Cross Validated · regression · neural-networks
2022-02-12 13:40:38

In most of the neural network examples I've seen so far, the network is used for classification and the nodes are transformed with a sigmoid function. However, I would like to use a neural network to output a continuous real value (realistically, the output is usually in the range of -5 to +5).

我的问题是:

1. Should I still scale the input features using feature scaling? What range?
2. What transformation function should I use in place of the sigmoid?

I am looking to implement this initially using PyBrain, which describes these layer types.

So I'm thinking I should start with 3 layers (input, hidden, and output), all of them linear layers? Is that a reasonable way to do it? Alternatively, could I "stretch" the sigmoid function over the range -5 to 5?
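For reference, "stretching" a sigmoid over [-5, 5] is just a linear rescaling of its output. A minimal NumPy sketch (the function name here is my own, not part of PyBrain):

```python
import numpy as np

def stretched_sigmoid(x, lo=-5.0, hi=5.0):
    """Standard logistic sigmoid linearly rescaled from (0, 1) to (lo, hi)."""
    return lo + (hi - lo) / (1.0 + np.exp(-x))

# Large negative inputs approach -5, zero maps to 0, large positive
# inputs approach +5.
stretched_sigmoid(np.array([-10.0, 0.0, 10.0]))
```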

2 Answers

1. Should I still scale the input features using feature scaling? What range?

Scaling won't make anything worse. Read this answer from Sarle's neural network FAQ: Subject: Should I normalize/standardize/rescale the data?
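To illustrate the two most common options from that FAQ, here is a small NumPy sketch of standardizing and min-max scaling input features (the feature matrix is made up for the example):

```python
import numpy as np

# Hypothetical raw feature matrix: rows are samples, columns are features.
X = np.array([[200.0, 0.1],
              [180.0, 0.5],
              [220.0, 0.3]])

# Standardize each feature to zero mean, unit variance (z-score).
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Alternatively, min-max scale each feature into [0, 1].
X_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
```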

2. What transformation function should I use in place of the sigmoid?

You can use the logistic sigmoid or tanh as the activation function; it doesn't matter, and you don't have to change the learning algorithm. You only need to scale the outputs of your training set down to the range of the output layer's activation function ([0, 1] or [-1, 1]), and once you have trained your network, scale its outputs back up to [-5, 5]. You really don't have to change anything else.
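A minimal sketch of that rescaling for a tanh output layer, assuming the targets are known to lie in [-5, 5] (the function names are mine, for illustration):

```python
import numpy as np

LO, HI = -5.0, 5.0  # known range of the target variable

def to_tanh_range(y):
    """Linearly map targets from [-5, 5] into tanh's output range [-1, 1]."""
    return 2.0 * (y - LO) / (HI - LO) - 1.0

def from_tanh_range(t):
    """Invert the mapping: network outputs in [-1, 1] back to [-5, 5]."""
    return (t + 1.0) / 2.0 * (HI - LO) + LO

y = np.array([-5.0, 0.0, 2.5, 5.0])
scaled = to_tanh_range(y)           # train the network against these
restored = from_tanh_range(scaled)  # rescale predictions afterwards
```

For a logistic-sigmoid output layer, the same idea applies with the range [0, 1] instead of [-1, 1].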

Disclaimer: the approach presented here is not feasible for continuous values, but I believe it carries some weight in decision-making for the project.

Smarty77 brings up a good point about utilizing a rescaled sigmoid function. Inherently, the sigmoid function produces a probability, which describes a sampling success rate (i.e., 95 out of 100 photos with these features are successfully 'dog'). The final outcome described is a binary one, and the training, using binary cross-entropy, describes a process of separating diametrically opposed outcomes, which inherently discourages results in the middle range. The continuum of the output is merely there for scaling based on the number of samples (i.e., a result of 0.9761 means that 9,761 out of 10,000 samples displaying those or similar traits are 'dog'), but each result itself must still be considered binary and not arbitrarily granular. As such, it should not be mistaken for, and applied as one would, real numbers, and may not be applicable here.

Though I am not sure how the network will be used, I would normalize the output vector with respect to itself. This can be done with softmax. It requires the network to have 11 linear outputs (bins), one for each class from -5 to +5, and it provides an assurance value for any one bin being the correct answer. This architecture would be trainable with one-hot encoding, with the 1 indicating the correct bin. The result can then be interpreted in a number of ways, such as a greedy strategy or probabilistic sampling.

However, to recast it into a continuous variable, the assuredness of each index can be used as a weight to place a marker on a number line (similar to the behavior of the sigmoid unit). This also highlights the primary issue: if the network is fairly certain the result is -2 or +3, but absolutely certain that it is not anything else, is +1 a viable result?

Thank you for your consideration. Good luck on your project.
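The binning-and-softmax idea above can be sketched as follows; the logits are made-up example values standing in for the network's 11 linear outputs:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax."""
    e = np.exp(z - z.max())
    return e / e.sum()

# 11 bins, one per integer output from -5 to +5, as described above.
bins = np.arange(-5, 6)

# Hypothetical raw scores (logits) from the network's 11 linear outputs.
logits = np.array([0.1, 0.2, 0.3, 0.2, 0.1, 0.0, 2.5, 0.4, 0.2, 0.1, 0.0])

p = softmax(logits)

# Greedy strategy: pick the most probable bin.
greedy = bins[np.argmax(p)]

# Continuous recast: probability-weighted position on the number line.
expected = float(np.dot(p, bins))
```

Note that the weighted-average recast exhibits exactly the issue raised above: a distribution split between -2 and +3 would place the marker near +1 even though +1 itself has low probability.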