Dataset: https://www.kaggle.com/bryanpark/sudoku
I want to build a neural network for this dataset.
Features:
X: [ '004300209005009001070060043006002087190007400050083000600000105003508690042910300'
'040100050107003960520008000000000017000906800803050620090060543600080700250097100']
Output:
Y: [ '864371259325849761971265843436192587198657432257483916689734125713528694542916378'
'346179258187523964529648371965832417472916835813754629798261543631485792254397186']
A zero represents a blank cell, and each string is a 9x9 grid flattened row by row into 81 digits.
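To make the format concrete, here is a minimal sketch of how one such string could be decoded back into a 9x9 grid. The helper name `to_grid` is made up for illustration and is not part of the dataset or any library; it assumes every puzzle is a string of exactly 81 digits.

```python
import numpy as np

def to_grid(puzzle: str) -> np.ndarray:
    """Convert an 81-character digit string into a 9x9 integer array."""
    if len(puzzle) != 81 or not puzzle.isdigit():
        raise ValueError("expected a string of exactly 81 digits")
    # One digit per cell, filled row by row; 0 marks a blank cell.
    return np.array([int(ch) for ch in puzzle], dtype=np.int8).reshape(9, 9)

# First puzzle from the sample above.
quiz = '004300209005009001070060043006002087190007400050083000600000105003508690042910300'
grid = to_grid(quiz)
print(grid)
print('blank cells:', int((grid == 0).sum()))
```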
I tried the following code, but I found that the data needs a lot of preprocessing.
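import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from keras.models import Sequential
from keras.layers import Dense

def preprocess():
    # Column 0 holds the puzzles, column 1 the solutions
    data = pd.read_csv('Sudoku/sudoku.csv')
    print('Data: ', data.head())
    x = data[data.columns[0]].values
    y = data[data.columns[1]].values
    # Each puzzle is still a single value here, so this yields one feature per sample
    x = x.reshape(-1, 1)
    y = y.reshape(-1, 1)
    print('\nx: ', x[0])
    print('\ny: ', y[0])
    return x, y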
x, y = preprocess()
train_x, test_x, train_y, test_y = train_test_split(x, y, test_size=0.2)
scl = StandardScaler()
train_x = scl.fit_transform(train_x)
train_y = scl.fit_transform(train_y)
test_x = scl.fit_transform(test_x)
test_y = scl.fit_transform(test_y)
model = Sequential()
model.add(Dense(100, input_dim=1, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(1, activation='linear'))
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(train_x, train_y, epochs=3, validation_split=0.1, verbose=2)
I would like to know how to preprocess this data and, of course, how to structure my neural network.
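For reference, this is the kind of preprocessing I suspect is needed instead of `StandardScaler` on a single column. It is only a sketch under my own assumptions: the function name `encode`, the scaling to [-0.5, 0.5], and the idea of treating each of the 81 cells as a 9-way classification target (digits 1-9 mapped to classes 0-8) are guesses on my part, not something the dataset prescribes.

```python
import numpy as np

def encode(puzzles, solutions):
    """Turn lists of 81-digit strings into numeric arrays (hypothetical helper)."""
    # One digit per cell instead of one huge number per puzzle.
    x = np.array([[int(c) for c in p] for p in puzzles], dtype=np.float32)
    y = np.array([[int(c) for c in s] for s in solutions], dtype=np.int64)
    x = x / 9.0 - 0.5   # assumed scaling: blanks become -0.5, nines become 0.5
    y = y - 1           # solutions contain only 1..9, so classes are 0..8
    # Shape (N, 9, 9, 1) would suit convolutional layers; (N, 81) labels suit
    # a per-cell softmax with a sparse categorical cross-entropy loss.
    return x.reshape(-1, 9, 9, 1), y

puzzles = [
    '004300209005009001070060043006002087190007400050083000600000105003508690042910300',
    '040100050107003960520008000000000017000906800803050620090060543600080700250097100',
]
solutions = [
    '864371259325849761971265843436192587198657432257483916689734125713528694542916378',
    '346179258187523964529648371965832417472916835813754629798261543631485792254397186',
]
x_enc, y_enc = encode(puzzles, solutions)
print(x_enc.shape, y_enc.shape)
```

Whether a network with an output of shape (81, 9) is actually the right structure for this problem is exactly what I am unsure about.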