Data mining - Bag of visual words

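What I am trying to do:

I am trying to classify some images using local and global features.

What I have done so far:

I have extracted SIFT descriptors for each image and used them as input to k-means to build my vocabulary from the features of all images. From there, I create a histogram for each image by passing that image's SIFT features to the predict method of the k-means model, which gives me a cluster label for each feature, and then counting the number of labels that fall into each bin. I now have an n x m matrix, where n is the number of images and m is the number of clusters (features/words).

I would then feed this matrix to a classifier to get my image classifications.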

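In short, the steps:

Extract SIFT feature descriptors, which gives an n x 128 matrix for each image (n being the number of keypoints in that image)
Stack all feature descriptors into one big list
Fit all of these features to the k-means algorithm with k=100
For each image, use its SIFT features to predict cluster labels with that same trained k-means model
Create a histogram from the cluster labels using k as the number of bins, adding 1 to the bin of each predicted label. (If an image has 10 SIFT features, this gives 10 labels, all in the range of k, and each label adds one count to its corresponding histogram bin.)
We now have an n x k matrix, where n is the number of images and k is the number of clusters.
We now feed the histograms to a classifier and ask it to predict on the test data.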
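
The histogram-building steps above (assign each descriptor to its nearest cluster centre, then count labels per bin) can be sketched with NumPy alone. This is only an illustration under stated assumptions: the helper names `assign_to_words` and `bovw_histogram` are hypothetical, the centroids are hand-picked rather than learned by k-means, and the descriptors are 2-D toy points instead of 128-D SIFT vectors.

```python
import numpy as np

def assign_to_words(descriptors, centroids):
    """Return the index of the nearest centroid for each descriptor."""
    # Squared Euclidean distance between every descriptor and every centroid,
    # giving an array of shape (n_descriptors, k).
    diffs = descriptors[:, None, :] - centroids[None, :, :]
    dists = (diffs ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def bovw_histogram(descriptors, centroids):
    """Count how many descriptors fall into each of the k visual words."""
    k = centroids.shape[0]
    if len(descriptors) == 0:
        # An image without keypoints gets an all-zero histogram.
        return np.zeros(k, dtype=int)
    labels = assign_to_words(descriptors, centroids)
    return np.bincount(labels, minlength=k)

# Toy vocabulary: 3 visual words in a 2-D descriptor space (assumed values).
centroids = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
# Toy image with 4 descriptors.
image_descriptors = np.array([[0.5, 0.2], [9.0, 1.0], [9.5, -0.5], [0.1, 8.0]])

hist = bovw_histogram(image_descriptors, centroids)
print(hist)  # one count per descriptor, spread over the 3 bins
```

The counts in one histogram always sum to the number of descriptors in that image, which is a quick sanity check for this step.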

Question:

Am I doing bag of visual words correctly?

Here is my code:

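import cv2
import numpy as np
from sklearn.preprocessing import MinMaxScaler

def extract_features(df):
    # imageFeatures is a helper class that is not shown here
    IF = imageFeatures()
    global_features = []
    sift_features = []
    labels = []
    for _, sample in df.iterrows():
        image = cv2.imread(sample["location"])
        # shape is the module-level (width, height) tuple set in __main__
        image = cv2.resize(image, shape)
        # global descriptors: colour histogram, Haralick texture, Hu moments, LBP
        hist = IF.fd_histogram(image)
        haralick = IF.fd_haralick(image)
        hu = IF.fd_hu_moments(image)
        lbp = IF.LocalBinaryPatterns(image, 24, 8)
        # local descriptors: one 128-d SIFT vector per keypoint
        kp, des = IF.SIFT(image)
        if des is None or len(kp) == 0:
            # no keypoints found: keep an empty (0, 128) array so stacking still works
            des = np.zeros((0, 128), dtype=np.float32)
        sift_features.append(des)
        global_feature = np.hstack([hist, haralick, hu, lbp])
        global_features.append(global_feature)
        labels.append(sample["class_id"])
    # rescale every global feature to the range [0, 1]
    scaler = MinMaxScaler(feature_range=(0, 1))
    rescaled = scaler.fit_transform(global_features)
    return sift_features, rescaled, labels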

from sklearn.cluster import MiniBatchKMeans

def BOVW(feature_descriptors, n_clusters=100):
    print("Bag of visual words with {} clusters".format(n_clusters))
    # take all features and put them into one giant (total_keypoints, 128) array
    combined_features = np.vstack(feature_descriptors)
    # train k-means on the giant array
    print("Starting K-means training")
    kmeans = MiniBatchKMeans(n_clusters=n_clusters, random_state=0).fit(combined_features)
    print("Finished K-means training, moving on to prediction")
    # number of images x number of clusters: one histogram row per image
    bovw_vector = np.zeros([len(feature_descriptors), n_clusters])
    for index, features in enumerate(feature_descriptors):  # SIFT descriptors of one image
        features = np.asarray(features)
        if features.ndim != 2 or len(features) == 0:
            # image without usable descriptors: leave its histogram at zero
            continue
        # predict a cluster label per descriptor and count the labels per bin
        bovw_vector[index] = np.bincount(kmeans.predict(features), minlength=n_clusters)
    return bovw_vector  # this should be our histogram

import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

if __name__ == '__main__':
    n_clusters = 100
    # set model
    model = GaussianNB()
    image_list = pd.read_csv("image_list.csv")
    # keep at most 80 images per class
    # alternative: image_list.loc[(image_list["class_id"] == 0) | (image_list["class_id"] == 19)]
    image_list_subset = image_list.groupby('class_id').head(80)
    shape = (330, 230)
    train, test = train_test_split(image_list_subset, test_size=0.1, random_state=42)

    train_sift_features, train_global_features, y_train = extract_features(train)
    train_histogram = BOVW(train_sift_features, n_clusters)
    # plot the histogram of a single training image
    plt.plot(train_histogram[100], 'o')
    plt.ylabel('frequency')
    plt.xlabel('features')

    test_sift_features, test_global_features, y_test = extract_features(test)
    # note: this call fits a second, separate k-means on the test descriptors
    test_histogram = BOVW(test_sift_features, n_clusters)

    # Naive Bayes
    y_hat = model.fit(train_histogram, y_train).predict(test_histogram)
    n_correct = int(np.sum(np.equal(y_hat, np.array(y_test))))
    print("Number of correctly labeled points out of a total {} points : {}. An accuracy of {}"
          .format(len(y_hat), n_correct, n_correct / len(y_hat)))