Cat/dog classifier in TensorFlow, basic problem!

data-mining machine-learning classification tensorflow image-classification
2022-03-05 21:27:54

I am trying to build an image classifier for a set of images containing cats and dogs. I am quite new to the dark art of creating neural network models. I have successfully built models with Keras in the past, and am now trying to build something more advanced with TensorFlow.

The input data is 128x128 grayscale images, and the labels are one-hot encodings of the two classes (cat/dog = [1,0], [0,1]). I have checked and re-checked my training data; it is balanced, etc.
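For reference, a minimal sketch of that label encoding, assuming integer class ids with 0 = cat and 1 = dog (the ids themselves are hypothetical):

import numpy as np

class_ids = np.array([0, 1, 1, 0])                    # hypothetical labels: 0 = cat, 1 = dog
one_hot_labels = np.eye(2, dtype=np.float32)[class_ids]
# -> [[1,0], [0,1], [0,1], [1,0]], i.e. cat = [1,0], dog = [0,1]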

However, because I have no GPU, and not enough memory to multitask while training on the full set, I have been running short runs (fewer than 3000 iterations) on a sample of the data (around 1000 images). I know this is not enough to fully train the model, but I was hoping to see at least some change in accuracy or loss before committing to an overnight training run.

Things to look at: the loss function, the accuracy metric, the network structure, the training op, and the training routine.

Core problem: the network does not train. The loss converges to a particular value and stays there, and the accuracy stays the same across iterations.

The loss converges to the same value and never changes. The final loss values look like this (for a batch size of 10, which here equals len(loss), and two classes):

loss = [[0.31326172 0.6931472 ],
        [0.31326172 0.6931472 ],
        [1.3132617  0.6931472 ],
        [0.31326172 0.6931472 ],
        [0.31326172 0.6931472 ],
        [0.31326172 0.6931472 ],
        [1.3132617  0.6931472 ],
        [0.31326172 0.6931472 ],
        [1.3132617  0.6931472 ],
        [0.31326172 0.6931472 ]]
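As a side note, these constants match sigmoid cross-entropy evaluated at logits of exactly 0 and 1, which is what you get when values that are already probabilities are passed in as logits. A quick standalone numpy check (mine, not part of the original post):

import numpy as np

def sigmoid_xent(logit, label):
    # Numerically stable form used by tf.nn.sigmoid_cross_entropy_with_logits:
    # max(z, 0) - z*y + log(1 + exp(-|z|))
    return max(logit, 0) - logit * label + np.log1p(np.exp(-abs(logit)))

print(sigmoid_xent(1.0, 1.0))  # 0.31326169  (logit 1, label 1)
print(sigmoid_xent(0.0, 0.0))  # 0.69314718  (logit 0, either label)
print(sigmoid_xent(1.0, 0.0))  # 1.31326169  (logit 1, label 0)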

Some typical results, over 100 and 1000 iterations:

[plot: accuracy over iterations, run 1]

[plot: accuracy over iterations, run 2]

The loss over roughly 30 iterations looks like this:

[plot: loss over iterations]

What I have tried:

  • Different learning rates (0.1 to 0.001)

  • Different loss functions (sigmoid cross-entropy, softmax cross-entropy)

  • Different batch sizes (1, 10, 32)

I feel like I am making some basic mistake here. Can anyone spot what is wrong?

The model:

import tensorflow as tf  # the code uses the TF 1.x API (placeholders, tf.contrib)
import numpy as np
import matplotlib.pyplot as plt

# Helper functions to set up convolution layers
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)
def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)
def conv2d(input_to_layer, filter_params):
    return tf.nn.conv2d(input_to_layer, filter_params, strides=[1, 1, 1, 1], padding='SAME')
def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
def convpool(layer_in, WEIGHT):
    conv = conv2d(layer_in, weight_variable(WEIGHT))
    conv_bias = conv + bias_variable([WEIGHT[-1]])
    conv_bias_relu = tf.nn.relu(conv_bias)
    layer_out = max_pool_2x2(conv_bias_relu)
    return layer_out

x = tf.placeholder(dtype = tf.float32, shape = [None, 128, 128], name = "Xplaceholder")
y = tf.cast(tf.placeholder(dtype = tf.int32, shape = [None,2], name = "yplaceholder"),tf.float32)

image = tf.reshape(x, [-1,128,128,1])  

# LAYERS
layer1_out = convpool(image,[3,3,1,64]) #OUT: -1x64x64x64
layer2_out = convpool(layer1_out, [3,3,64,128]) #OUT: -1x32x32x128
layer3_out = tf.contrib.layers.fully_connected(layer2_out, 512, tf.nn.relu) #OUT: -1x512
layer4_out = tf.nn.dropout(layer3_out, keep_prob=0.99)
layer5_out = tf.reshape(layer4_out, [10, 32*32*512]) #BATCH SIZE HERE IS 10
layer6_out = tf.contrib.layers.fully_connected(layer5_out, 2, tf.nn.softmax) # Logit

loss =  tf.nn.sigmoid_cross_entropy_with_logits(logits=layer6_out, labels=y)
train_op = tf.train.AdamOptimizer(learning_rate=0.001).minimize(loss)
correct_pred = tf.one_hot(tf.argmax(layer6_out, 1), 2, axis = 1)
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

The training routine:

def batch_trainer(training_set, training_labels, eval_set, eval_labels, iterations = 2500, batch_size = 125):
    sess.run(tf.global_variables_initializer())
    print("")
    print("Training:")
    train_accuracy_logs = []
    eval_accuracy_logs = []

    epochs_completed = 0
    index_in_epoch = 0
    num_examples = len(training_labels)
    assert batch_size <= num_examples  # batch size must not exceed the size of the training set!

    def next_batch(training_set, training_labels, epochs_completed, index_in_epoch, batch_size):
        start = index_in_epoch
        index_in_epoch += batch_size

        if index_in_epoch > num_examples:
            epochs_completed += 1
            # re-shuffle for the next epoch
            perm = np.random.shuffle(np.arange(num_examples))
            training_set = training_set[perm]
            training_labels = training_labels[perm]
            # start next epoch
            start = 0
            index_in_epoch = batch_size
            assert batch_size <= num_examples
        end = index_in_epoch

        return training_set[start:end], training_labels[start:end]

    for i in range(iterations):
        batch_xs, batch_ys = next_batch(training_set, training_labels, epochs_completed, index_in_epoch, batch_size)
        _, accuracy_val = sess.run([train_op, accuracy], feed_dict={x: batch_xs, y: batch_ys})
        train_accuracy_logs.append(accuracy_val)

        if i%batch_size == 0:
            eval_accuracy_logs.append(test_model(eval_set, eval_labels))
            average_acc = (max(train_accuracy_logs)+max(eval_accuracy_logs)+min(train_accuracy_logs)+min(eval_accuracy_logs))/4
            print("    " + str(i).zfill(4)+": Accuracy: {:.1f}%".format(accuracy_val*100)) 

            plt.rcParams['figure.figsize'] = (4.0, 1.0)
            ax = plt.axes()
            plt.plot(train_accuracy_logs, c = 'r', linewidth = 2)
            plt.plot(np.arange(0,i+1,batch_size),eval_accuracy_logs, c = 'black', linewidth = 2)
            plt.grid()
            plt.rc('font',size=20)
            plt.text(1.3, average_acc, "{:.2f}%".format(accuracy_val*100), verticalalignment='center', transform = ax.transAxes)
            plt.rc('font',size=2)
            plt.axis('off')
            plt.show()

        if i%10 == 0:
            print("    "+str(i).zfill(5))

    plt.rcParams['figure.figsize'] = (10.0, 5.0)
    plt.rc('font',size=10)
    plt.title("Accuacy over iterations")
    plt.plot(train_accuracy_logs, c = 'r', label = 'training')
    plt.plot(np.arange(0,iterations,batch_size), eval_accuracy_logs, c = 'black', label = 'testings')
    plt.legend()
    plt.show()
    print("    completed {} iterations, during which {} epochs passed".format(iterations, epochs_completed))

    return train_accuracy_logs, eval_accuracy_logs
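Note that the routine relies on a global sess and a test_model helper that are not shown in the post. A minimal driver sketch, in which the file names and the test_model stand-in are hypothetical placeholders:

import numpy as np
import tensorflow as tf

def test_model(eval_set, eval_labels):
    # Hypothetical stand-in for the post's undefined test_model helper:
    # evaluates the accuracy tensor on the held-out set.
    return sess.run(accuracy, feed_dict={x: eval_set, y: eval_labels})

train_x = np.load("train_images.npy")  # hypothetical file names
train_y = np.load("train_labels.npy")
eval_x = np.load("eval_images.npy")
eval_y = np.load("eval_labels.npy")

with tf.Session() as sess:
    train_log, eval_log = batch_trainer(train_x, train_y, eval_x, eval_y,
                                        iterations=1000, batch_size=10)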
1 Answer

You could try transfer learning to extract features from the images with little effort. Alternatively, you could use the solution I turned to when I hit the same problem on cat/dog classification: build a two-branch functional model, where each branch contains 4 Conv2D layers, with a MaxPooling2D((2,2)) between every two Conv2D layers.

Then concatenate and flatten, and finally use one dense layer followed by the output. I got 0.9+ validation and training accuracy within 10 epochs.

Below is the structure of my model architecture; I hope you can build the design from it.
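A minimal Keras functional-API sketch of the two-branch model described above; the filter counts, kernel sizes, and dense-layer width are assumptions, not the answerer's exact values:

from tensorflow import keras
from tensorflow.keras import layers

def conv_branch(inputs):
    # Four Conv2D layers, with a MaxPooling2D((2, 2)) after every two convs.
    x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(inputs)
    x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(x)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x)
    x = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x)
    x = layers.MaxPooling2D((2, 2))(x)
    return x

inputs = keras.Input(shape=(128, 128, 1))  # 128x128 grayscale, as in the question
branch_a = conv_branch(inputs)
branch_b = conv_branch(inputs)

merged = layers.Concatenate()([branch_a, branch_b])
flat = layers.Flatten()(merged)
dense = layers.Dense(512, activation='relu')(flat)
outputs = layers.Dense(2, activation='softmax')(dense)  # two classes, one-hot labels

model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

One design note: the softmax here is applied exactly once, at the output layer, and categorical_crossentropy consumes those probabilities directly; TensorFlow's *_with_logits losses apply the activation internally and expect raw, un-activated logits.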