Neural network for MNIST: very low accuracy

data-mining machine-learning neural-network julia
2022-02-11 19:55:06

I am working on handwritten digit recognition by implementing a neural network. However, the network's accuracy is very low, around 11% on the training set. I am not sure what is wrong with my program. I have tried changing the learning rate and the number of hidden units, but with no luck. Could someone take a look and help me figure out what I am missing? My Julia code is pasted below:

# install
using Pkg; Pkg.add("MNIST");
using MNIST

# training data
X,y = traindata(); 
m = size(X, 2);
inputLayerSize = size(X,1); 
hiddenLayerSize = 300;
outputLayerSize = 10;

# representing each output as an array of size of the output layer
eyeY = eye(outputLayerSize);
intY = [convert(Int64,i)+1 for i in y];
Y = zeros(outputLayerSize, m);
for i = 1:m
    Y[:,i] = eyeY[:,intY[i]];
end

# weights with bias
Theta1 = randn(inputLayerSize+1, hiddenLayerSize); 
Theta2 = randn(hiddenLayerSize+1, outputLayerSize); 

function sigmoid(z)
    g = 1.0 ./ (1.0 + exp(-z));
    return g;
end

function sigmoidGradient(z)
  return sigmoid(z).*(1-sigmoid(z));
end

# learning rate
alpha = 0.01;
# number of iterations
epoch = 20;
# cost per epoch
J = zeros(epoch,1);
# backpropagation algorithm
for i = 1:epoch
    for j = 1:m # for each input
        # Feedforward
        # input layer
        # add one bias element
        x1 = [1; X[:,j]];

        # hidden layer
        z2 = Theta1'*x1;
        x2 = sigmoid(z2);
        # add one bias element
        x2 = [1; x2];

        # output layer
        z3 = Theta2'*x2;
        x3 = sigmoid(z3);

        # Backpropagation process
        # delta for output layer
        delta3 = x3 - Y[:,j];
        delta2 = (Theta2[2:end,:]*delta3).*sigmoidGradient(z2) ;

        # update weights
        Theta1 = Theta1 - alpha* x1*delta2';
        Theta2 = Theta2 - alpha* x2*delta3';
    end
end

function predict(Theta1, Theta2, X)
    m = size(X, 2); 
    p = zeros(m, 1);
    h1 = sigmoid(Theta1'*[ones(1,size(X,2)); X]);
    h2 = sigmoid(Theta2'*[ones(1,size(h1,2)); h1]);
    # 1 index is for 0, 2 for 1 ...so forth
    for i=1:m
        p[i,:] = indmax(h2[:,i])-1;
    end
    return p;
end

function accuracy(truth, prediction)
    m = length(truth);
    correct = 0;
    for i = 1:m
        if truth[i,:] == prediction[i,:]
            correct = correct + 1;
        end
    end
    return (correct/m)*100;
end

pred = predict(Theta1, Theta2, X);
println("train accuracy: ", accuracy(y, pred));
1 Answer

What loss function are you using? It looks to me like you are using the squared-error loss (is that right?). That can work, but consider using the cross-entropy loss instead, which is better suited to classification problems.

Also, by using the logistic function as the activation in the last layer, you are effectively treating the problem as ten independent binary classification problems. That can also work, but since this is a multi-class classification problem, you should probably change the last layer's activation to softmax.
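As a rough sketch of those two suggestions (the names `softmax` and `crossentropy` here are illustrative helpers, not part of your code):

```julia
# Numerically stable softmax: subtracting the maximum before
# exponentiating avoids overflow without changing the result.
function softmax(z)
    e = exp.(z .- maximum(z))
    return e ./ sum(e)
end

# Cross-entropy loss for a single example with one-hot target y;
# eps() guards against taking log(0).
crossentropy(p, y) = -sum(y .* log.(p .+ eps()))
```

In your feedforward pass this would amount to replacing `x3 = sigmoid(z3)` with `x3 = softmax(z3)`.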

One bug I found in your code is that you do not account for the derivative of the nonlinearity in the last layer. To fix it, you should change

delta3 = x3 - Y[:,j];

to

delta3 = (x3 - Y[:,j]) .* sigmoidGradient(z3);

analogous to the way you compute delta2.
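A related point worth knowing: if you combine a softmax output layer with the cross-entropy loss, the output-layer delta simplifies to exactly `x3 - Y[:,j]`, with no extra gradient factor. A quick numerical check of that identity (a standalone sketch, not your code):

```julia
softmax(z) = (e = exp.(z .- maximum(z)); e ./ sum(e))
loss(z, y) = -sum(y .* log.(softmax(z)))

z = randn(10)
y = zeros(10); y[3] = 1.0

# Analytic gradient of cross-entropy-of-softmax w.r.t. z.
analytic = softmax(z) - y

# Central-difference numerical gradient, one coordinate at a time.
h = 1e-6
numeric = [(loss(z .+ h .* ((1:10) .== k), y) -
            loss(z .- h .* ((1:10) .== k), y)) / (2h) for k in 1:10]

@assert maximum(abs.(analytic - numeric)) < 1e-4
```

This is one reason the softmax + cross-entropy pairing is standard: the clean delta makes bugs like the one above impossible at the output layer.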