error: "ValueError: could not convert string to float: ''" in pandas

Hello everyone, for school I need to make a machine learning project in which I have to predict rainfall.

My Excel sheet looks like this:

[image: Excel sheet]

And this is my code:

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics

df = pd.read_excel('Month.xlsx', usecols='W, AJ')
data = pd.read_excel('Month.xlsx')

df

x = df.iloc[:,0]
y = df.iloc[:,1]



model = LinearRegression()

model.fit(x, y)

y_pred = model.predict(x)

# Visualization of the regression line
plt.scatter(x, y,  color='gray')
plt.plot(x, y_pred, color='red', linewidth=2)
plt.xlabel('Humidity')
plt.ylabel('Rainfall')
plt.show()

But I keep getting the error "ValueError: could not convert string to float: ''". The error is raised on the line where I call model.fit(x, y).

I have tried converting the x and y values to float with new code like Xdata = float(x), but then I got a different error: cannot convert the series to <class 'float'>

>Solution :

This means that your df contains some observations that are empty strings, like this: '', which of course cannot be converted to a number.
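To see which rows are affected, one option is to coerce every column to numeric and compare the result with the original. This is only a sketch: the small DataFrame and the column names humidity and rainfall are made up to stand in for the two columns read from Month.xlsx.

```python
import pandas as pd

# Hypothetical stand-in for the two columns read from Month.xlsx
df = pd.DataFrame({
    "humidity": [81, 77, "", 90],
    "rainfall": [2.1, "", 0.0, 5.4],
})

# errors="coerce" turns every value that is not a number into NaN
numeric = df.apply(pd.to_numeric, errors="coerce")

# Cells that held something originally but became NaN after coercion
bad_cells = numeric.isna() & df.notna()

print(bad_cells.sum())                # number of bad cells per column
print(df[bad_cells.any(axis=1)])      # the rows that contain them
```

The rows printed at the end are the ones model.fit would choke on.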

In this case you have to decide which option is best: replace them with a number (such as zero, the column mean, or the median), or simply drop the rows that contain empty strings / None values.
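Both options could look roughly like this. Again a sketch on made-up data (the column names are assumptions, not the real ones from the spreadsheet); which option is appropriate depends on how many values are missing and why.

```python
import pandas as pd

# Hypothetical stand-in for the two columns read from Month.xlsx
df = pd.DataFrame({
    "humidity": [81, 77, "", 90, 85],
    "rainfall": [2.1, 1.0, 0.0, "", 3.2],
})

# Empty strings (and any other non-numeric text) become NaN
clean = df.apply(pd.to_numeric, errors="coerce")

# Option 1: drop every row that has a missing value
dropped = clean.dropna()

# Option 2: replace missing values with the mean of their column
filled = clean.fillna(clean.mean())

print(dropped.shape)
print(filled)
```

After either step all columns are floats, so the "could not convert string to float" error should no longer appear.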

Before fitting an ML algorithm you should always do some data preprocessing (often the most difficult part of a data science project), and that step is entirely missing from your code.

Also, you are not splitting X and y into train and test sets, so you are predicting on the same data you used for training. That is a mistake: the results will look better than they really are and tell you nothing about how the model performs on unseen data.
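A minimal sketch of the split, using train_test_split (which the question already imports) on made-up, already-cleaned data. The column names, test_size=0.2 and random_state=0 are assumptions for illustration. Note that LinearRegression expects a 2D X, so the feature is selected with double brackets instead of df.iloc[:,0].

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Hypothetical, already-cleaned data standing in for Month.xlsx
df = pd.DataFrame({
    "humidity": [60, 65, 70, 72, 75, 80, 82, 85, 90, 95],
    "rainfall": [0.5, 0.9, 1.4, 1.5, 2.0, 2.4, 2.7, 3.1, 3.4, 4.0],
})

X = df[["humidity"]]   # double brackets keep X two-dimensional
y = df["rainfall"]

# Hold back 20% of the rows; the model never sees them during training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)

# Evaluate only on the held-out rows
y_pred = model.predict(X_test)
print(mean_absolute_error(y_test, y_pred))
```

The error measured on X_test is a fairer estimate of how the model would do on new months than an error measured on the training rows.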

I suggest you take a look at some documentation on how to set up a data science project, because your approach is missing several basic steps.
