I am trying to convert a column of dtype object to float in Pandas, but the conversion raises the error below and I cannot fix it. How can I solve this?
dataset: https://drive.google.com/file/d/1fWUG__B-11mV2td-eoqS7eF1eADnsdRs/view
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
d:\Overdose AI\part_a.ipynb Cell 4 in 1
----> 1 df["quantity tons"] = df["quantity tons"].astype(float)
File c:\Users\Admin\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\generic.py:6245, in NDFrame.astype(self, dtype, copy, errors)
6238 results = [
6239 self.iloc[:, i].astype(dtype, copy=copy)
6240 for i in range(len(self.columns))
6241 ]
6243 else:
6244 # else, only a single dtype is given
-> 6245 new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
6246 return self._constructor(new_data).__finalize__(self, method="astype")
6248 # GH 33113: handle empty frame or series
File c:\Users\Admin\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\internals\managers.py:446, in BaseBlockManager.astype(self, dtype, copy, errors)
445 def astype(self: T, dtype, copy: bool = False, errors: str = "raise") -> T:
--> 446 return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
File c:\Users\Admin\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\internals\managers.py:348, in BaseBlockManager.apply(self, f, align_keys, ignore_failures, **kwargs)
346 applied = b.apply(f, **kwargs)
347 else:
--> 348 applied = getattr(b, f)(**kwargs)
349 except (TypeError, NotImplementedError):
...
169 # Explicit copy, or required since NumPy can't view from / to object.
--> 170 return arr.astype(dtype, copy=True)
172 return arr.astype(dtype, copy=copy)
ValueError: could not convert string to float: 'e'
>Solution :
The problem is that the 'quantity tons' column contains the string "e" (line 173088 of the file you provided), which cannot be parsed as a float.
To avoid this issue, I suggest checking whether the column contains any strings before changing its dtype. You can use the following code:
df[df['quantity tons'].apply(lambda x: isinstance(x, str))]
The output shows only the rows where the 'quantity tons' column contains a string.
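If the column was loaded from a CSV file, many of its values may already be strings such as "54.15", in which case the isinstance check can return far more rows than the single bad one. A possible alternative, sketched here on a small made-up DataFrame rather than the real dataset, is to let pd.to_numeric with errors="coerce" attempt the conversion and then look at the values it could not parse. The sample values below are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical miniature of the data: numeric strings plus one stray "e".
df = pd.DataFrame({"quantity tons": ["54.15", "768.02", "e", "202.41"]})

# errors="coerce" turns anything that cannot be parsed into NaN instead of raising.
converted = pd.to_numeric(df["quantity tons"], errors="coerce")

# Values that were present originally but failed to parse are the offenders.
bad_rows = df[converted.isna() & df["quantity tons"].notna()]
print(bad_rows)

# Assign the coerced values; the column is now float64 with NaN where "e" was.
df["quantity tons"] = converted
print(df["quantity tons"].dtype)
```

Whether to keep the resulting NaN, drop the row, or replace it with another value depends on what the "e" was meant to represent in the data.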