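I am trying to apply automatic hyperparameter tuning to an MLPRegressor with Scikit learn. After reading around, I decided to use GridSearchCV to choose the most suitable hyperparameters. Before that, I apply MinMaxScaler preprocessing. The dataset is a list of 105 integers (monthly champagne sales).
The problem is that, for some reason, GridSearchCV does not seem to be working (at least not correctly, I think). When I print the parameters used by the model, some of the values fall outside the ranges defined in param_list.
Also, I know the dataset is too small for an MLP; the idea is to program the model now and then use it on a larger dataset. That said, the final dataset will not be very large either, so I would really appreciate any ideas for improving the model's accuracy on small datasets!
Thanks!
Code: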
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import TimeSeriesSplit
from sklearn.model_selection import GridSearchCV
from matplotlib import pyplot
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error
import pandas as pd
dataset = pd.read_csv('champagne.csv', header=None)
scaler = MinMaxScaler()
scaled_dataset = scaler.fit_transform(dataset)
mlpr = MLPRegressor(max_iter=7000)
param_list = {
    "hidden_layer_sizes": [1, 50],
    "activation": ["identity", "logistic", "tanh", "relu"],
    "solver": ["lbfgs", "sgd", "adam"],
    "alpha": [0.00005, 0.0005],
}
gridCV = GridSearchCV(estimator=mlpr, param_grid=param_list)
splits = TimeSeriesSplit(n_splits=3)
pyplot.figure(1)
index = 1
for train_index, test_index in splits.split(scaled_dataset):
    training_set = scaled_dataset[train_index]
    testing_set = scaled_dataset[test_index]
    train_index_array = train_index.reshape(-1,1)
    test_index_array = test_index.reshape(-1,1)
    gridCV.fit(train_index_array, training_set)
    predicted = gridCV.predict(test_index_array)
    parameters = mlpr.get_params()
    test_mse = mean_squared_error(testing_set, predicted)
    pyplot.subplot(310 + index)
    pyplot.plot(predicted)
    pyplot.plot([None for i in training_set] + [x for x in testing_set])
    index += 1
    train_index.flatten()
    test_index.flatten()
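
A possible explanation for the out-of-range values, offered as a sketch rather than a confirmed diagnosis: GridSearchCV fits clones of the estimator it is given, so the original `mlpr` object is never fitted, and `mlpr.get_params()` keeps reporting the constructor defaults (for example `alpha=0.0001` and `hidden_layer_sizes=(100,)`), neither of which is in the grid. The selected values are available from the search object through `best_params_` and `best_estimator_`. The example below uses a synthetic series in place of `champagne.csv`, a reduced grid, and a smaller `max_iter` so that it runs quickly; those choices, and the name `grid_cv`, are assumptions made for illustration only.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in for the scaled series (hypothetical data, not champagne.csv).
rng = np.random.default_rng(0)
X = np.arange(60, dtype=float).reshape(-1, 1) / 60.0  # time index as the only feature
y = 0.5 + 0.3 * np.sin(2 * np.pi * 5 * X.ravel()) + 0.05 * rng.standard_normal(60)

mlpr = MLPRegressor(max_iter=500, random_state=0)

# Reduced grid for speed; tuples mean "one hidden layer with 1 or 50 units".
param_list = {
    "hidden_layer_sizes": [(1,), (50,)],
    "alpha": [0.00005, 0.0005],
    "solver": ["lbfgs"],
}

# Passing the splitter as cv lets the search itself do the time-ordered splits.
grid_cv = GridSearchCV(
    estimator=mlpr,
    param_grid=param_list,
    cv=TimeSeriesSplit(n_splits=3),
)
grid_cv.fit(X, y)

# The original estimator is untouched: these are still the constructor defaults.
default_params = mlpr.get_params()
print("mlpr alpha:", default_params["alpha"])
print("mlpr hidden_layer_sizes:", default_params["hidden_layer_sizes"])

# The values chosen by the search live on the GridSearchCV object.
best_params = grid_cv.best_params_
best_model_params = grid_cv.best_estimator_.get_params()
print("best params:", best_params)
```

In the loop from the question, this would amount to reading `gridCV.best_params_` after `gridCV.fit(...)` instead of calling `mlpr.get_params()`.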