Running model.evaluate multiple times yields different accuracy and loss values (TensorFlow 2)

Tags: data-mining, tensorflow
2021-10-09 21:15:52

I have trained a CNN network, using `dataset = tf.data.Dataset.from_tensor_slices((data, label))` to create the dataset. Training went well, but evaluating the model on the test dataset produces different values every time, even though neither the test dataset nor anything in the network changes, and I am not using any `Dropout` or `BatchNormalization` layers.

My code, in case it's needed:

```
import tensorflow as tf
from tensorflow.keras.layers import Input, Conv2D, MaxPool2D, Flatten, Dense

model = tf.keras.Sequential([
    Input((1, 30, 30)),  # channels-first input: 1 channel, 30x30 images
    Conv2D(filters=8, kernel_size=(3, 3), padding="same", activation="relu", name="c1", data_format="channels_first"),
    Conv2D(filters=16, kernel_size=(3, 3), padding="same", activation="relu", name="c2", data_format="channels_first"),
    MaxPool2D(pool_size=(2, 2), strides=(1, 1), padding="same", name="m1", data_format="channels_first"),

    Conv2D(filters=16, kernel_size=(3, 3), padding="same", activation="relu", name="c3", data_format="channels_first"),
    MaxPool2D(pool_size=(2, 2), strides=(1, 1), padding="same", name="m2", data_format="channels_first"),

    Flatten(),
    Dense(256, activation="relu", use_bias=True),
    Dense(5, use_bias=True)])  # logits for 5 classes

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=["accuracy"])
model.fit(train_data, verbose=1, validation_data=valid_data, epochs=20)

model.evaluate(test_data)
```

How I build the datasets:

```
def split_dataset(dataset: tf.data.Dataset, validation_data_fraction: float):
    """Deterministically split a dataset in two, based on each element's index."""
    validation_data_percent = round(validation_data_fraction * 100)
    if not (0 <= validation_data_percent <= 100):
        raise ValueError("validation data fraction must be ∈ [0,1]")

    dataset = dataset.enumerate()
    train_dataset = dataset.filter(lambda f, data: f % 100 >= validation_data_percent)
    validation_dataset = dataset.filter(lambda f, data: f % 100 < validation_data_percent)

    # remove enumeration
    train_dataset = train_dataset.map(lambda f, data: data)
    validation_dataset = validation_dataset.map(lambda f, data: data)

    return train_dataset, validation_dataset


def load_data(path):
    data, label = data_prep(path)
    dataset = tf.data.Dataset.from_tensor_slices((data, label))
    dataset = dataset.shuffle(100000)  # unseeded shuffle of the full dataset
    train_dataset, rest = split_dataset(dataset, 0.3)       # 70% train, 30% rest
    test_dataset, valid_dataset = split_dataset(rest, 0.5)  # split rest 50/50
    train_data = train_dataset.shuffle(1000).batch(10)
    valid_data = valid_dataset.batch(10)
    test_data = test_dataset.batch(10)
    return train_data, valid_data, test_data
```
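
As a quick sanity check of the split fractions, here is a minimal sketch (my own illustration, using a hypothetical toy dataset, not part of the original code) that counts the elements in each half:

```
import tensorflow as tf

# Hypothetical toy dataset of 1000 dummy elements, just to count split sizes.
toy = tf.data.Dataset.from_tensor_slices(tf.range(1000))
train, rest = split_dataset(toy, 0.3)
print(sum(1 for _ in train), sum(1 for _ in rest))  # 700 300
```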

For example, running `model.evaluate(test_data)` several times gives:

```
885/Unknown - 2s 2ms/step - loss: 0.1039 - accuracy: 0.9663
885/Unknown - 2s 2ms/step - loss: 0.0959 - accuracy: 0.9675
885/Unknown - 2s 2ms/step - loss: 0.0999 - accuracy: 0.9661
885/Unknown - 2s 2ms/step - loss: 0.0888 - accuracy: 0.9688
885/Unknown - 2s 2ms/step - loss: 0.0799 - accuracy: 0.9715
```
3 Answers

The problem is your first shuffle of the entire dataset. Can you inspect your `test_data` before calling `model.evaluate(test_data)`, for example with `list(test_data.as_numpy_iterator())`? My assumption is that it will yield different contents on each call. In other words: your model is fine, but your dataset is not the same every time, most likely because you use `dataset.shuffle` without a `seed` and without deactivating `reshuffle_each_iteration`. The missing seed explains the differences across program runs, while `reshuffle_each_iteration` explains the differences within a single run: the base dataset is reordered every time it is iterated, so each `evaluate` call sees a differently assembled test split.
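
To see this effect in isolation, here is a minimal sketch (my own illustration, not from the question) showing that an unseeded `shuffle` yields a different order on every pass over the dataset:

```
import tensorflow as tf

ds = tf.data.Dataset.range(10).shuffle(10)  # defaults: no seed, reshuffle_each_iteration=True
print(list(ds.as_numpy_iterator()))  # e.g. [3, 7, 0, ...]
print(list(ds.as_numpy_iterator()))  # a different order on the second pass

fixed = tf.data.Dataset.range(10).shuffle(10, seed=42, reshuffle_each_iteration=False)
print(list(fixed.as_numpy_iterator()))  # the same order...
print(list(fixed.as_numpy_iterator()))  # ...on every pass
```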

My suggestion would be this:

```
seed = 42

def load_data(path):
    data, label = data_prep(path)
    dataset = tf.data.Dataset.from_tensor_slices((data, label))
    # shuffle your dataset **once**, but reliably, so that each run yields the same result
    dataset = dataset.shuffle(100000, seed=seed, reshuffle_each_iteration=False)
    train_dataset, rest = split_dataset(dataset, 0.3)
    test_dataset, valid_dataset = split_dataset(rest, 0.5)
    # (re)shuffle only the training set, but again using a seed
    train_data = train_dataset.shuffle(1000, seed=seed).batch(10)
    valid_data = valid_dataset.batch(10)
    test_data = test_dataset.batch(10)
    return train_data, valid_data, test_data
```

I just had the same problem. Running `model.evaluate` should give the same results, and that is true. But the numbers printed on the progress bar are not the result of `model.evaluate`! To compare the results of `model.evaluate`, you have to compare the values it returns.
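
For instance (a minimal sketch; `model` and `test_data` are assumed to be the ones built above), compare the returned values rather than the progress-bar output:

```
# The progress bar shows running averages updated batch by batch;
# the returned values are the final metrics over the whole test set.
loss1, acc1 = model.evaluate(test_data, verbose=0)
loss2, acc2 = model.evaluate(test_data, verbose=0)
print(loss1 == loss2, acc1 == acc2)  # True True, if test_data is identical on both calls
```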


If you train your model and then run `model.evaluate` N times (without retraining the model), you should get the same answer each time, provided your test data is identical each time. However, if you train the model and then run the evaluation, and repeat that combination N times, the results will differ because of the random weight initialization of the network.
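
If you want the train-then-evaluate combination itself to be repeatable, one common approach is to seed the global random number generators before building the model (a sketch; the seed value is arbitrary):

```
import random
import numpy as np
import tensorflow as tf

# Seed everything involved in weight initialization and data shuffling.
random.seed(42)
np.random.seed(42)
tf.random.set_seed(42)
```

Even then, some GPU ops are nondeterministic, so results across runs may only match approximately.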