Partitioning data into features/labels and train/test after reading from csv file

Question

I need to read data from a CSV file, partition it first into features and labels, and then into training and testing sets. However, several issues keep cropping up. Below is the code I tried. It fails with the following error, raised on the line ending in "Y: train_y})":

ValueError: could not convert string to float: 'mon'

The code for linear regression:

import pandas as pd
from sklearn.model_selection import train_test_split
import tensorflow as tf
import numpy as np

learning_rate = 0.01
training_epochs = 1000
display_step = 50

data = pd.read_csv('forestfires.csv')
y = data.temp
x = data.drop('temp', axis=1)
train_x, test_x, train_y, test_y = train_test_split(x, y,test_size=0.2)
n_samples = train_x.shape[0]
n_features = train_x.shape[1]

X = tf.placeholder('float', [None, n_features])
Y = tf.placeholder('float', [None, 1])

# Model weights.
W = tf.Variable(np.random.randn(n_features, 1), dtype='float32')
b = tf.Variable(np.random.randn(1), dtype='float32')

# Construct linear model.
prediction = tf.matmul(X, W) + b
loss = tf.reduce_sum(tf.pow(prediction - Y, 2))/(2 * n_samples)
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

# Start training.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(training_epochs):
        for (x, y) in zip(train_x, train_y):
            sess.run(optimizer, feed_dict={X: train_x, Y: train_y})
        # Display logs per epoch step.
        if (epoch + 1) % display_step == 0:
            c = sess.run(loss, feed_dict={X: train_x, Y: train_y})
            print ('Epoch:', '%04d' % (epoch+1), 'cost=','{:.9f}'.format(c), \
                   'W=', sess.run(W), 'b=', sess.run(b))
    print ('Training Done!')
    training_cost = sess.run(loss, feed_dict={X: train_x, Y: train_y})
    print ('Training cost=', training_cost, 'W=', sess.run(W), 'b=', sess.run(b), '\n')
    # Graphic display.
    plt.plot(train_x, train_y, 'ro', label='Original data')
    plt.plot(train_x, sess.run(W) * train_x + sess.run(b), label='Fitted line')
    plt.legend()
    plt.show()

Could anyone help me read the data properly, in a reasonably general way? Snapshot of the data:

Add a snapshot of the data! – Aditya, Aug 18, 2018 at 9:51

Green Falcon · Accepted Answer · 2018-08-18 09:21:10Z


I don't know exactly what your data looks like, but y = data.temp may be a Series of string values, which would need to be cast to float. Try changing it to the following:

y = data.temp.astype(float)
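
Before casting, it can help to check which columns actually hold non-numeric values. The following is only a minimal sketch: the small frame stands in for the CSV, and its column names and values are invented for illustration, not taken from the asker's file.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_csv('forestfires.csv'); values are invented.
data = pd.DataFrame({
    "month": ["mar", "oct", "aug"],
    "day": ["fri", "tue", "mon"],
    "temp": ["8.2", "18.0", "14.6"],  # numbers stored as strings
    "wind": [6.7, 0.9, 1.3],
})

# List every column whose dtype is not numeric; these cannot be fed
# directly into a float placeholder.
string_columns = data.select_dtypes(exclude="number").columns.tolist()
print(string_columns)

# Cast a numeric-looking string column to float, as suggested above.
y = data["temp"].astype(float)
print(y.dtype)
```

Note that astype(float) only works when every value parses as a number; a column holding values like 'mon' would still raise the same ValueError.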


Or maybe they are cats which need to be transformed. – Aditya, Aug 18, 2018 at 9:51
Where did you see cats?! – Green Falcon, Aug 18, 2018 at 10:10
Cats -> categories/strings, maybe. Sorry for the shorthand. – Aditya, Aug 18, 2018 at 17:21


shepan6 · Accepted Answer · 2020-06-20 09:08:38Z

The question comes down to understanding the ValueError you are getting.

I believe this error refers to your month column, which I presume you are using as a feature for this network. Since it is a categorical variable, you will need to convert it to a one-hot encoding (https://machinelearningmastery.com/why-one-hot-encode-data-in-machine-learning/); the model cannot interpret the raw strings, hence the ValueError.
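
As a rough illustration of that encoding step, here is a minimal sketch that one-hot encodes the string columns with pandas before splitting. The small frame and its column names are invented stand-ins for the CSV, and using pd.get_dummies is an assumption of this sketch, not something prescribed above; reshaping the labels to (n, 1) is likewise an assumption made to match the [None, 1] placeholder in the question.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for pd.read_csv('forestfires.csv'); values are invented.
data = pd.DataFrame({
    "month": ["mar", "oct", "aug", "mar", "sep"],
    "day": ["fri", "tue", "mon", "sun", "mon"],
    "wind": [6.7, 0.9, 1.3, 4.0, 5.4],
    "temp": [8.2, 18.0, 14.6, 8.3, 11.4],
})

# Labels as an (n, 1) float array, matching a [None, 1] placeholder.
y = data["temp"].astype("float32").to_numpy().reshape(-1, 1)

# One-hot encode the string columns; numeric columns pass through unchanged.
x = pd.get_dummies(data.drop("temp", axis=1), columns=["month", "day"])
x = x.astype("float32").to_numpy()

# Split features and labels together so rows stay aligned.
train_x, test_x, train_y, test_y = train_test_split(
    x, y, test_size=0.2, random_state=0
)
print(train_x.shape, train_y.shape)
```

Arrays prepared this way contain only floats, so feeding them through feed_dict should no longer trigger the string-to-float conversion error.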
