Data Mining - Troubleshooting a Neural Network Implementation - 吾爱随笔录

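I've been working through the Stanford/Coursera Machine Learning course, and it has been going well. I am more interested in understanding the topic than in getting a grade from the course, so I am trying to write all the code in a programming language I am more fluent in (one whose internals I can easily dig into).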

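I learn best by working through problems, so I implemented a neural network, but it doesn't work. I seem to get the same probability for each class regardless of the test example (for example, 0.45 for class 0 and 0.55 for class 1, whatever the input values are). Strangely, this is not the case if I remove all hidden layers.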

Here is a brief rundown of what I do:

Set all Theta's (weights) to a small random number

for each training example

set activation 0 on layer 0 as 1 (bias)
set layer 1 activations = inputs

forward propagate;
 Z(j+1) = Theta(j) x activation(j)  [matrix operations]
 activation(j+1) = Sigmoid function (Z(j+1)) [element wise sigmoid]

Set Hx = final layer activations

Set bias of each layer (activation 0,0) = 1

[back propagate]
calculate delta;
delta(last layer) = activation(last layer) - Y  [Y is the expected answer from training set]

delta(j) = (transpose(Theta(j)) x delta(j+1)) .* (activation(j) .* (Ones - activation(j)))

[where ones is a matrix of 1's in every cell; and .* is the element wise multiplication]
[Don't calculate delta(0) since there isn't one for the input layer]


DeltaCap(j) = DeltaCap(j) + delta(j+1) x transpose(activation(j))

Next [End for]

Calculate D;

D(j) = 1/#Training * DeltaCap(j) (for j = 0)

D(j) = 1/#Training * DeltaCap(j) + Lambda/#Training * Theta(j) (for j ≠ 0)


[calculate cost function]

J(Theta) = -1/#Training * sum(Y .* log(Hx) + (1 - Y) .* log(1 - Hx)) + Lambda/(2 * #Training) * sum(Theta .^ 2)

Recalculate Theta

Theta = Theta - alpha * D
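
For comparison, the steps listed so far can be sketched in plain Python roughly as follows. This is a minimal illustration under stated assumptions, not the code the question is about: the function names (`forward`, `cost`, `gradients`, `train`, `init_thetas`), the 2-3-1 layer sizes, and the tiny AND dataset are all hypothetical. Two details are worth checking against any implementation: the bias component is dropped when the error is propagated back through a layer, and the bias column of each Theta is left out of the regularisation term.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_thetas(sizes, eps=0.5, seed=0):
    """Small random weights; thetas[j] has one row per unit in layer j+1."""
    rng = random.Random(seed)
    return [[[rng.uniform(-eps, eps) for _ in range(sizes[i] + 1)]
             for _ in range(sizes[i + 1])] for i in range(len(sizes) - 1)]


def forward(thetas, x):
    """Return the activations of every layer; non-output layers carry a bias unit of 1."""
    a = [1.0] + list(x)
    activations = [a]
    for i, theta in enumerate(thetas):
        a = [sigmoid(sum(w * v for w, v in zip(row, a))) for row in theta]
        if i < len(thetas) - 1:
            a = [1.0] + a  # bias unit for hidden layers only
        activations.append(a)
    return activations


def cost(thetas, X, Y, lam):
    """Regularised cross-entropy; bias weights (column 0) are not penalised."""
    m = len(X)
    total = 0.0
    for x, y in zip(X, Y):
        hx = forward(thetas, x)[-1]
        total -= sum(t * math.log(h) + (1 - t) * math.log(1 - h)
                     for h, t in zip(hx, y))
    reg = sum(w * w for theta in thetas for row in theta for w in row[1:])
    return total / m + lam / (2 * m) * reg


def gradients(thetas, X, Y, lam):
    """Backpropagation: returns D with the same shape as thetas."""
    m = len(X)
    acc = [[[0.0] * len(row) for row in theta] for theta in thetas]
    for x, y in zip(X, Y):
        acts = forward(thetas, x)
        delta = [h - t for h, t in zip(acts[-1], y)]  # output-layer error
        for j in range(len(thetas) - 1, -1, -1):
            a = acts[j]
            for r, d in enumerate(delta):
                for c, v in enumerate(a):
                    acc[j][r][c] += d * v
            if j > 0:
                back = [sum(thetas[j][r][c] * delta[r] for r in range(len(delta)))
                        for c in range(len(a))]
                # Skip index 0: the bias unit has no error term of its own.
                delta = [back[c] * a[c] * (1.0 - a[c]) for c in range(1, len(a))]
    return [[[acc[j][r][c] / m + (lam / m * thetas[j][r][c] if c > 0 else 0.0)
              for c in range(len(thetas[j][r]))]
             for r in range(len(thetas[j]))] for j in range(len(thetas))]


def train(thetas, X, Y, lam=0.0, alpha=0.5, iterations=2000):
    """Plain batch gradient descent: Theta = Theta - alpha * D."""
    for _ in range(iterations):
        grads = gradients(thetas, X, Y, lam)
        thetas = [[[w - alpha * g for w, g in zip(wr, gr)]
                   for wr, gr in zip(t, gt)] for t, gt in zip(thetas, grads)]
    return thetas


# Hypothetical toy data: logical AND of two inputs.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
Y = [[0], [0], [0], [1]]
start = init_thetas([2, 3, 1])
trained = train(start, X, Y)
for x in X:
    print(x, forward(trained, x)[-1][0])
```

If a network built this way still returns the same output for every input, a sensible next step is to compare `gradients` against a numerical estimate of the cost derivative before changing anything else.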

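My procedure is probably not quite right. It would be fantastic if someone could tell me whether there are any major flaws in it; failing that, some general ideas on where I might be going wrong, or on how to debug something like this, would also be great.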

Edit:

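Here is a quick image of the network, including a test case with its inputs and the response (this is after 1 million iterations of gradient descent):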

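The dataset I am using has two exam scores as x and success/failure in getting into university as y. Obviously, two exam scores of 0 should mean not getting into university, yet the network reports a 56% chance of admission with 0s as input.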

Edit #2:

I have run the gradient checking algorithm, with the following results:

Numerically calculated: -0.0074962585205895493, propagated value: 0.62021047431540277

Numerically calculated: 0.0032635827218463476, propagated value: -0.39564819922432665

And so on. Clearly something is wrong here; I will work on resolving it.
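
A gradient check of this kind can be sketched in Python as below. This is a minimal sketch under stated assumptions: `numerical_gradient` and `relative_error` are hypothetical helper names, the parameters are assumed to be unrolled into one flat list, and the quadratic `f` merely stands in for the network's cost function. In the two pairs quoted above, the propagated values have the opposite sign and are roughly a hundred times larger than the numerical estimates, which would point at the backpropagation step rather than at the cost function.

```python
import math


def numerical_gradient(f, params, eps=1e-4):
    """Central-difference estimate of the partial derivative of f for each parameter."""
    grad = []
    for i in range(len(params)):
        plus = list(params)
        minus = list(params)
        plus[i] += eps
        minus[i] -= eps
        grad.append((f(plus) - f(minus)) / (2 * eps))
    return grad


def relative_error(numeric, analytic):
    """Norm of the difference divided by the sum of the norms (0 means identical)."""
    diff = math.sqrt(sum((n - a) ** 2 for n, a in zip(numeric, analytic)))
    scale = (math.sqrt(sum(n * n for n in numeric))
             + math.sqrt(sum(a * a for a in analytic)))
    return diff / scale if scale else 0.0


# Stand-in cost function with a known gradient (an assumption for illustration).
def f(p):
    return p[0] ** 2 + 3.0 * p[0] * p[1]


def f_gradient(p):
    return [2.0 * p[0] + 3.0 * p[1], 3.0 * p[0]]


point = [0.7, -1.2]
numeric = numerical_gradient(f, point)
good = relative_error(numeric, f_gradient(point))
# A deliberately sign-flipped gradient, mimicking a backpropagation bug.
bad = relative_error(numeric, [-g for g in f_gradient(point)])
print("correct gradient:", good)
print("sign-flipped gradient:", bad)
```

A relative error is used rather than an absolute difference so that the check does not depend on the scale of the gradients. When the check is run against a real network, regularisation should be applied identically in the cost and in the propagated gradient, otherwise the two will disagree even if backpropagation itself is correct.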