What's wrong with my deep neural network with two hidden layers?

data-mining machine-learning neural-network deep-learning tensorflow
2022-03-02 08:33:34
batch_size = 128
size_1 = 1024
size_2 = 256
size_3 = 128
beta = 0.001

graph = tf.Graph()
with graph.as_default():

    tf_train_dataset = tf.placeholder(
        tf.float32,shape=(batch_size,image_size*image_size))
    tf_train_labels = tf.placeholder(
        tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)

    # Weights and Biases
    g_W1 = tf.Variable(
        tf.truncated_normal([image_size*image_size,size_1]))
    g_B1 = tf.Variable(
        tf.zeros([size_1]))

    g_W2 = tf.Variable(
        tf.truncated_normal([size_1,size_2]))
    g_B2 = tf.Variable(
        tf.zeros([size_2]))

    g_W3 = tf.Variable(
        tf.truncated_normal([size_2,num_labels]))
    g_B3 = tf.Variable(
        tf.zeros([num_labels]))

#     g_W4 = tf.Variable(
#         tf.truncated_normal([size_3,num_labels]))
#     g_B4 = tf.Variable(
#         tf.zeros([num_labels]))


    L1 = tf.nn.relu(
        tf.matmul(tf_train_dataset,g_W1) + g_B1)
    L2 = tf.nn.relu(
        tf.matmul(L1,g_W2) + g_B2)
#     L3 = tf.nn.relu(
#         tf.matmul(L2,g_W3) + g_B3)

    dr_prob = tf.placeholder("float")

    ##add dropout here
    #L1 = tf.nn.dropout(tf.nn.relu(
     #   tf.matmul(tf_train_dataset,g_W1) + g_B1), 1.0)
    #L2 = tf.nn.dropout(tf.nn.relu(
     #   tf.matmul(L1,g_W2) + g_B2), 1.0)
    #L3 = tf.nn.dropout(tf.nn.relu(
     #   tf.matmul(L2,g_W3) + g_B3), 1.0)


    logits = tf.matmul(L2, g_W3) + g_B3

    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits))+\
        beta*tf.nn.l2_loss(g_W1) +\
        beta*tf.nn.l2_loss(g_W2)+\
        beta*tf.nn.l2_loss(g_W3)
#         beta*tf.nn.l2_loss(g_W4)

    # Optimizer.
    optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    L1_pred = tf.nn.relu(tf.matmul(tf_valid_dataset, g_W1) + g_B1)
    L2_pred = tf.nn.relu(tf.matmul(L1_pred, g_W2) + g_B2)
#     L3_pred = tf.nn.relu(tf.matmul(L2_pred, g_W3) + g_B3)
    valid_prediction = tf.nn.softmax(tf.matmul(L2_pred, g_W3) + g_B3)

    L1_test = tf.nn.relu(tf.matmul(tf_test_dataset, g_W1) + g_B1)
    L2_test = tf.nn.relu(tf.matmul(L1_test, g_W2) + g_B2)
#     L3_test = tf.nn.relu(tf.matmul(L2_test, g_W3) + g_B3)
    test_prediction = tf.nn.softmax(tf.matmul(L2_test, g_W3) + g_B3)

num_steps = 3001

with tf.Session(graph=graph) as session:
  tf.global_variables_initializer().run()
  print("Initialized")
  for step in range(num_steps):
    # Pick an offset within the training data, which has been randomized.
    # Note: we could use better randomization across epochs.
    offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
    # Generate a minibatch.
    batch_data = train_dataset[offset:(offset + batch_size), :]
    batch_labels = train_labels[offset:(offset + batch_size), :]
    # Prepare a dictionary telling the session where to feed the minibatch.
    # The key of the dictionary is the placeholder node of the graph to be fed,
    # and the value is the numpy array to feed to it.
    feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels, dr_prob : 0.5}
    _, l, predictions = session.run(
      [optimizer, loss, train_prediction], feed_dict=feed_dict)
    if (step % 500 == 0):
      print("Minibatch loss at step %d: %f" % (step, l))
      print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
      print("Validation accuracy: %.1f%%" % accuracy(
        valid_prediction.eval(), valid_labels))
  print("Test accuracy: %.1f%%" % accuracy(test_prediction.eval(), test_labels))

I've now spent two days trying to figure out what is wrong with my solution, and I hope someone can spot it. The goal is to train a simple deep NN with two hidden layers. I've checked other people's solutions, but I still don't understand what the problem with my code is. (This is Problem 4 of Assignment 3 from the Udacity Deep Learning online course.) I get the following output:

Initialized
Minibatch loss at step 0: 3983.812256
Minibatch accuracy: 8.6%
Validation accuracy: 10.0%
Minibatch loss at step 500: nan
Minibatch accuracy: 9.4%
Validation accuracy: 10.0%
Minibatch loss at step 1000: nan
Minibatch accuracy: 8.6%
Validation accuracy: 10.0%
Minibatch loss at step 1500: nan
Minibatch accuracy: 11.7%
Validation accuracy: 10.0%
Minibatch loss at step 2000: nan
Minibatch accuracy: 6.2%
Validation accuracy: 10.0%
Minibatch loss at step 2500: nan
Minibatch accuracy: 10.2%
Validation accuracy: 10.0%
Minibatch loss at step 3000: nan
Minibatch accuracy: 7.8%
Validation accuracy: 10.0%
Test accuracy: 10.0%
2 Answers

You didn't say in your question what you already tried while debugging, but I'll take a shot at an answer anyway.

Short answer: it looks to me like you need to pick a lower learning rate, because your loss explodes after the first iteration.

Explanation: You are performing the optimization with plain stochastic gradient descent (SGD). SGD is a non-adaptive learning-rate algorithm, which means that if the learning rate is chosen badly (in particular, too high) the loss can explode. That is why, when I work on this kind of optimization problem with a new neural network, I like to start with a very low learning rate to make sure training converges at the beginning. You can also use an adaptive optimizer such as AdaGrad or Adam, both of which have TensorFlow implementations.
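For concreteness, here is a minimal sketch of that change applied to your graph; the learning rates below (0.01 for SGD, 0.001 for Adam) are only illustrative starting points, not tuned values:

# Plain SGD with a much lower learning rate than the original 0.5.
optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

# Or an adaptive optimizer; both exist in TF 1.x:
# optimizer = tf.train.AdagradOptimizer(0.01).minimize(loss)
# optimizer = tf.train.AdamOptimizer(0.001).minimize(loss)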

I hope this solves your problem.

Besides the reply I accepted as the answer to my question (the learning rate), I'd also like to add the following changes that were needed (see the sketch after this list):

  1. Passing an explicit standard deviation to the truncated normal I use to initialize the weights, and

  2. Using a function that clips my ReLU outputs (relu6 in TensorFlow).
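Applied to the first layer of the graph above, the two fixes look like this (stddev=0.1 is just an illustrative value, not necessarily the one you should use; the default stddev of 1.0 is what makes the initial logits huge):

# 1. Give truncated_normal an explicit small standard deviation.
g_W1 = tf.Variable(
    tf.truncated_normal([image_size * image_size, size_1], stddev=0.1))

# 2. Cap the activations at 6, i.e. min(max(x, 0), 6), so they cannot blow up.
L1 = tf.nn.relu6(tf.matmul(tf_train_dataset, g_W1) + g_B1)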