数据挖掘 - 了解 ANN 的输出以及如何改善验证损失 - 吾爱随笔录

如果这有点冗长，我深表歉意，但这是我发布的另一个用户的建议。

首先我要说的是，我对机器学习和深度学习的世界非常陌生。因此，我追求的最重要的事情是了解我在做什么。

我正在尝试建立一个用于二进制分类的 ANN。

我有一个 N x D 形式的二进制特征矩阵，其中 N 是样本数，D 是特征数。在我的数据集中，N 的最大值约为 200 万 - 但由于运行所需的时间（即使使用我的 GPU），我在 ~ 500k 上运行我的测试。如果我在测试后得到一个有希望的验证曲线，我将在完整的数据集上运行以进行验证。我的数据集中的 D 是 5。因此，我有一个 500000 x 5 形式的特征矩阵。下面的片段：

[[0 1 0 1 1]
 [0 1 1 0 1]
 [0 0 1 0 1]
 [1 1 0 1 1]
 [1 1 0 0 0]
 [0 1 1 0 1]
 [0 1 0 0 0]
 [1 1 0 1 1]
 [0 0 0 1 1]
 [1 0 0 0 1]]

我有一个二进制形式的目标矩阵，片段如下：

[1 0 1 1 0 0 1 0 1 1]

根据我的理解，对于二元分类，输入层应该与 D 相同，输出层应该有 1 个节点，并且应该有一个 sigmoid 激活函数。

因此，我采用了这种方法。现在，我也明白机器/深度学习是很多实验，所以我经历了许多不同的迭代，改变隐藏层的数量，以及改变每个隐藏层的节点数量——所有这些似乎都没有明显的好处. 我还玩过以下内容：学习率（对于 Adam 优化器）、训练与测试的比率（当前为 0.33）、随机状态变量（拆分数据集时 - 当前为 42）、批量大小（当前为 128）、时期（目前为 50）。所有这些对这些变量的实验仍然会导致很高的验证损失，我将在下面进一步展示。

现在，对于我的代码。下面是我将数据拆分为训练和测试的代码。

train_to_test_ratio = 0.33
random_state_ = 42

# split the data into train and test sets
# this lets us simulate how our model will perform in the future
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=train_to_test_ratio, random_state=random_state_)
N, D = x_train.shape

下面是我构建模型的代码。你可以看到我目前有 3 个隐藏层。我已经玩了很多 - 从 1 个隐藏层到 10 个，每层都有不同的节点。我发现最好的是下面的 3 个隐藏层。学习率目前0.001是创造最佳损失曲线的 - 任何更大的东西都太高了。

learning_rate = 0.001

# build the model
i = Input(shape=(look_back_period,))
x = Dense(8, activation="relu")(i)
x = Dense(16, activation="relu")(x)
x = Dense(32, activation="relu")(x)
#x = Dense(32, activation="relu")(x)
#x = Dense(64, activation="relu")(x)
#x = Dense(128, activation="relu")(x)
#x = Dense(64, activation="relu")(x)
#x = Dense(32, activation="relu")(x)
#x = Dense(16, activation="relu")(x)
#x = Dense(4, activation="relu")(x)
x = Dense(1, activation="sigmoid")(x)
model = Model(i, x)
model.compile(optimizer=Adam(lr=learning_rate), loss="binary_crossentropy", metrics=["accuracy"])

然后我训练模型：

# train the model
# trains on first half of dataset, and tests on second half
b_size = 128
iterations = 50
r = model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=iterations, batch_size=b_size)

这是验证曲线：

这是准确度曲线：

正如您所看到的，验证损失非常高，而且准确度不是太高，这对我来说是模型失败的。

这些输出是什么意思？在所有实验的意义上，我仍然无法以高精度降低验证损失。这对我来说是模型不正确。但正确的模型是什么？是否有任何建议可以更好地理解如何前进并建立更好的模型？