Splitting data into features/labels and train/test sets after reading it from a CSV file

data-mining pandas linear-regression
2022-03-06 02:09:13

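I need to read data from a CSV file, then split it first into features and labels, and then into training and test sets. However, a few problems keep coming up. The code I tried is below; it fails with this error: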

ValueError: could not convert string to float: 'mon' 
on line 
Y: train_y})

Code for the linear regression:

import pandas as pd
from sklearn.model_selection import train_test_split
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

learning_rate = 0.01
training_epochs = 1000
display_step = 50

data = pd.read_csv('forestfires.csv')
y = data.temp
x = data.drop('temp', axis=1)

train_x, test_x, train_y, test_y = train_test_split(x, y,test_size=0.2)
n_samples = train_x.shape[0]
n_features = train_x.shape[1]

X = tf.placeholder('float', [None, n_features])
Y = tf.placeholder('float', [None, 1])

# Model weights.
W = tf.Variable(np.random.randn(n_features, 1), dtype='float32')
b = tf.Variable(np.random.randn(1), dtype='float32')

# Construct linear model.
prediction = tf.matmul(X, W) + b
loss = tf.reduce_sum(tf.pow(prediction - Y, 2))/(2 * n_samples)
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

# Start training.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(training_epochs):
        for (x, y) in zip(train_x, train_y):
            sess.run(optimizer, feed_dict={X: train_x,
                                           Y: train_y})
            # Display logs per epoch step.
            if (epoch + 1) % display_step == 0:
                c = sess.run(loss, feed_dict={X: train_x,
                                              Y: train_y})
                print ('Epoch:', '%04d' % (epoch+1), 'cost=','{:.9f}'.format(c), \
                       'W=', sess.run(W), 'b=', sess.run(b))
    print ('Training Done!')
    training_cost = sess.run(loss, feed_dict={X: train_x,
                                              Y: train_y})
    print ('Training cost=', training_cost, 'W=', sess.run(W), 'b=', sess.run(b), '\n')
    # Graphic display.
    plt.plot(train_x, train_y, 'ro', label='Original data')
    plt.plot(train_x, sess.run(W) * train_x + sess.run(b), label='Fitted line')
    plt.legend()
    plt.show()

Can anyone help me read the data correctly in a reasonably general way? Snapshot of the data:

[Image: snapshot of the data]

2 Answers

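I don't know exactly what your data looks like, but y = data.temp may be a Series containing string values that should be converted to floats. Try changing it to the following instead.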

y = data.temp.astype(float)
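
Before casting, it can help to check which columns actually hold non-numeric values. The following is only a minimal sketch: the small frame stands in for forestfires.csv, and the column names month, day, and wind are illustrative assumptions, not confirmed by the question.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_csv('forestfires.csv');
# the column names and values are made up for illustration.
data = pd.DataFrame({
    'month': ['mar', 'oct', 'aug'],
    'day': ['fri', 'tue', 'mon'],
    'temp': ['8.2', '18.0', '21.3'],  # numbers stored as strings
    'wind': [6.7, 0.9, 1.3],
})

# List every column that is not already numeric.
non_numeric = data.select_dtypes(exclude='number').columns.tolist()
print(non_numeric)

# Cast the label; this raises ValueError if a value is not a number.
y = data['temp'].astype(float)
print(y.dtype)
```

If a column such as month shows up in that list, casting it with astype(float) will fail in the same way, which points to an encoding problem rather than a type problem.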

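So, the question is really about understanding the ValueError you are getting.

I believe the error refers to your month column, which I assume you are using as a feature for the network. If so, because it is a categorical variable, you need to convert it to a one-hot encoded representation (https://machinelearningmastery.com/why-one-hot-encode-data-in-machine-learning/). The model cannot interpret strings, hence the ValueError.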
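
The one-hot encoding step could be sketched with pandas.get_dummies. This is a minimal sketch under assumptions: the frame below is a made-up stand-in for forestfires.csv, and the column names month and day are guesses (the value 'mon' in the error suggests a day-of-week column may be involved as well). The labels are also reshaped into a column so that they match a placeholder of shape [None, 1].

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for pd.read_csv('forestfires.csv').
data = pd.DataFrame({
    'month': ['mar', 'oct', 'aug', 'mar', 'sep'],
    'day': ['fri', 'tue', 'mon', 'sun', 'mon'],
    'wind': [6.7, 0.9, 1.3, 4.0, 1.8],
    'temp': [8.2, 18.0, 21.3, 11.4, 19.6],
})

# Separate the label from the features.
y = data['temp'].astype('float32')
x = data.drop('temp', axis=1)

# Replace every non-numeric feature column with 0/1 indicator columns.
categorical = x.select_dtypes(exclude='number').columns.tolist()
x = pd.get_dummies(x, columns=categorical).astype('float32')

train_x, test_x, train_y, test_y = train_test_split(
    x, y, test_size=0.2, random_state=0)

# Convert to NumPy arrays; labels become shape (n_samples, 1).
train_x = train_x.to_numpy()
train_y = train_y.to_numpy().reshape(-1, 1)
print(train_x.shape, train_y.shape)
```

Arrays prepared this way contain only floats, so they could then be passed to the session as feed_dict={X: train_x, Y: train_y} without the string conversion error.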