数据挖掘 - 验证集上的随机准确度结果（二元分类） - 吾爱随笔录

我正在使用 TensorFlow 为 Python 中人类行为的分类问题配置 CNN 模型。

我的数据是代表人体关节的视频帧。每 3 个连续的列代表一个关节的 x,y,z 坐标。

我将 tf 中的占位符设置为形状 (None, l, j*3)，其中 l = 固定帧数， j = 关节数。现在，该模型在验证时给了我非常随机的结果（相当多的噪音）。

这是我用来计算损失和验证以及绘制结果的代码。

for e in range epochs:

# Loop over batches
        for x,y in random_batches(X_train, Y_train, batch_size):

            # Feed dictionary
            feed = {X : x, Y : y, keep_prob_ : 0.5, learning_rate_ : learning_rate}

            # Loss
            loss, _ , acc = sess.run([cost, optimizer, accuracy], feed_dict = feed)
            train_acc.append(acc)
            train_loss.append(loss)

            # Print acc & loss every 5 iter.s

            if (iteration % 5 == 0):
                print("Epoch: {}/{}".format(e, epochs), "Train acc: {:.6f}".format(acc))


            # Compute validation loss every 10 iter.s

            if (iteration%10 == 0):                
                val_acc_ = []
                val_loss_ = []

                for x_v, y_v in random_batches(X_valid, Y_valid, batch_size):
                    # Feed
                    feed = {X : x_v, Y : y_v, keep_prob_ : 1.0}  

                    # Loss
                    loss_v, acc_v = sess.run([cost, accuracy], feed_dict = feed)                    
                    val_acc_.append(acc_v)
                    val_loss_.append(loss_v)

                # Print
                print("Epoch: {}/{}".format(e, epochs), "Validation acc: {:.6f}".format(np.mean(val_acc_)))

                # Store
                validation_acc.append(np.mean(val_acc_))
                validation_loss.append(np.mean(val_loss_))


            # Iterate 
            iteration += 1


    ## Plotting:

t = np.arange(iteration-1)

plt.plot(t, np.array(train_loss), 'r-', t[t % 10 == 0], np.array(validation_loss), 'b*')

所以我想也许如果我告诉系统考虑每 3 个连续的列只代表一个关节，它会解决验证时的这种随机性。

不确定如何执行此操作，因为 tensorflow 接受数组。与数据帧不同，数组没有标题。

0  0.30467  0.45957 -0.95414  1.74687  1.42338 -0.03860   
1  0.27331  0.59293 -1.00874  1.74135  1.32004 -0.00701       
2  0.30348  0.88129 -1.05517  1.75090  1.65138 -0.03112

我希望系统理解的是前 3 列代表相同的关节，而第 2 3 列代表第 2 个关节。

我尝试将数据塑造为 (None, l, j, 3) 但这也给了我验证的随机结果。当我查找类似的作品时，他们正在提供诸如 (None, l, j*3) 之类的 tf 数据。

我怀疑是错误的另一个领域（虽然不太可能是错误的）是数据的洗牌。我有我的输入数据（X），并且我已经从文件名构建了标签（Y）。下面是 get batches 函数的一个片段（它进行改组）：

# 1- Shuffle (X, Y)
m= X.shape[0]

    permutation = list(np.random.permutation(m))
    shuffled_X = X[permutation,:,:]
    shuffled_Y = Y[permutation,:]


    # 2- Partition (shuffled_X, shuffled_Y). Minus the end case.

    num_batches = m // batch_size # number of mini batches of size batch_size in the partitionning        
    X, Y = shuffled_X[:num_batches*batch_size], shuffled_Y[:num_batches*batch_size]

    # Handling the end case (last mini-batch < mini_batch_size)
    if m % batch_size != 0:
        X = shuffled_X[num_batches * batch_size : m,:,:]
        Y = shuffled_Y[num_batches * batch_size : m,:]

    # Loop over batches and yield
    for b in range(0, len(X), batch_size):
        yield X[b:b+batch_size], Y[b:b+batch_size]

我是深度学习和张量流的初学者。在这里感谢任何建议或帮助。