Data mining - Fashion MNIST: is there a simple way to extract only 1% of the data for a minimal grid search?

I am trying to implement several models on Fashion-MNIST. I imported the data following the tf.keras tutorial:

import tensorflow as tf
from tensorflow import keras
import sklearn
import numpy as np

f_mnist = keras.datasets.fashion_mnist

(train_images, train_labels), (test_images, test_labels) = f_mnist.load_data()
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']   
print(train_images.shape)
print(train_labels.shape)
# (60000, 28, 28)
# (60000,)

print(test_images.shape)
print(test_labels.shape)
# (10000, 28, 28)
# (10000,)

# Need to concatenate, as GridSearchCV takes the entire set as input
all_images = np.concatenate((train_images, test_images))
all_labels = np.concatenate((train_labels, test_labels))

print(all_images.shape)
print(all_labels.shape)
# (70000, 28, 28)
# (70000,)

The 10 labels are evenly distributed in both the training and the test set.
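A quick way to check that balance is to count how often each label occurs. The snippet below is only a sketch: it uses a stand-in `train_labels` array built to match the shape and balance described above, so that it runs without downloading the dataset. Swap in the real `train_labels` to check the actual data.

```python
import numpy as np

# Stand-in for the real train_labels (assumption: 60,000 entries,
# 10 classes, 6,000 samples each). Replace with the loaded array.
train_labels = np.repeat(np.arange(10), 6000)

# np.bincount counts how often each integer label 0..9 occurs.
counts = np.bincount(train_labels, minlength=10)
print(counts)

# True when every class has the same number of samples.
is_balanced = bool(np.all(counts == counts[0]))
print(is_balanced)
```

A balanced label distribution matters for what follows: a very small subsample can easily lose that balance unless it is drawn per class.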

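Since this is just for practice, I would like to implement a minimal grid search, but I do not want to use the whole set of 70,000 samples. I only want to extract 1% of it and run the grid search on that.

That way I can see how it works without spending too much time on computation.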

However, the tutorials I have seen only use GridSearchCV (from sklearn.model_selection import GridSearchCV) with the entire set as input:

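from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Splitting the entire set into train and test
X_train, X_test, y_train, y_test = train_test_split(
    all_images, all_labels, test_size=0.3, random_state=101)

parameters_grid = {'C': [0.001, 0.01, 0.1, 1, 10],
                   'gamma': [1, 0.1, 0.01, 0.001, 0.0001],
                   'kernel': ['rbf']}
grid = GridSearchCV(SVC(), parameters_grid, refit=True, verbose=3)
# SVC expects 2-D input, so each 28x28 image is flattened to 784 features
grid.fit(X_train.reshape(len(X_train), -1), y_train)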

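So far, the only workaround I can think of is to use only the test_images set, since it is smaller. But I guess it would still run for a while, because it contains 10,000 images...

I also thought about changing the split so that a smaller portion is used for training, like this: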

# Keep 1% of test_images for training; the other 99% becomes the test split
X_train, X_test, y_train, y_test = train_test_split(
    test_images, test_labels, test_size=0.99, random_state=101)

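This way, I would only use test_images, which contains just 10,000 samples. I think this would result in the model being trained on only 1% of those 10,000 samples, with the rest used only for testing.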
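For reference, a hedged sketch of what such a 1% split looks like when it is also stratified by label. The arrays here are stand-ins shaped like test_images and test_labels (an assumption made so that the snippet runs without the dataset), and the names `X_small`, `X_rest`, `y_small`, `y_rest` are made up for illustration. `train_size` and `stratify` are regular parameters of scikit-learn's `train_test_split`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in arrays shaped like test_images / test_labels (assumption:
# 10,000 samples, 10 balanced classes). Replace with the real data.
rng = np.random.default_rng(101)
test_images = rng.integers(0, 256, size=(10000, 28, 28), dtype=np.uint8)
test_labels = np.repeat(np.arange(10), 1000)

# train_size=0.01 keeps 1% of the samples; stratify=test_labels keeps
# the class proportions of the full set inside that 1%.
X_small, X_rest, y_small, y_rest = train_test_split(
    test_images, test_labels,
    train_size=0.01, stratify=test_labels, random_state=101)

print(X_small.shape, y_small.shape)
print(np.bincount(y_small, minlength=10))  # samples per class in the 1%
```

With only 100 samples, each class ends up with about 10 examples, so cross-validation scores from a grid search on this subset will be noisy. That is fine for learning how the search works, but not for choosing final hyperparameters.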

Is there a better, more Pythonic way to extract only 1% of all_images or test_images, together with the corresponding all_labels or test_labels?
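One possible answer, sketched with NumPy only: draw random indices once and use them to index both arrays, so every image keeps its own label. The helper name `sample_fraction` and the stand-in arrays are assumptions for illustration and not part of any library. Unlike a stratified split, this plain random draw does not guarantee equal class counts in the sample.

```python
import numpy as np

def sample_fraction(images, labels, fraction=0.01, seed=101):
    """Return a random `fraction` of images with their matching labels."""
    rng = np.random.default_rng(seed)
    n_keep = max(1, int(round(len(images) * fraction)))
    # Draw indices without replacement, then index both arrays with the
    # same indices so image i stays paired with label i.
    idx = rng.choice(len(images), size=n_keep, replace=False)
    return images[idx], labels[idx]

# Stand-in arrays shaped like all_images / all_labels (assumption);
# replace them with the real concatenated data.
all_images = np.zeros((70000, 28, 28), dtype=np.uint8)
all_labels = np.tile(np.arange(10), 7000)
# Write each label into one pixel so the pairing can be checked later.
all_images[:, 0, 0] = all_labels

small_images, small_labels = sample_fraction(all_images, all_labels)
print(small_images.shape, small_labels.shape)
```

The sampled images still have shape (n, 28, 28). Estimators such as SVC need 2-D input, so they would have to be flattened first, for example with `small_images.reshape(len(small_images), -1)`.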

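Obviously, I will build the final model by feeding it all 60,000 training samples and then testing it on the 10,000 test samples.

I have googled this and talked to colleagues, but found no answer.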