Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Converting string variable with double commas into float?

I have some strings in a column which originally uses commas as separators from thousands and from decimals and I need to convert this string into a float, how can I do it?

I firstly tried to replace all the commas for dots:

df['min'] = df['min'].str.replace(',', '.')

and tried to convert into float:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df['min']= df['min'].astype(float) 

but it returned me the following error:

ValueError                                Traceback (most recent call last)
<ipython-input-29-5716d326493c> in <module>
----> 1 df['min']= df['min'].astype(float)
      2 #df['mcom']= df['mcom'].astype(float)
      3 #df['max']= df['max'].astype(float)

~\anaconda3\lib\site-packages\pandas\core\generic.py in astype(self, dtype, copy, errors)
   5544         else:
   5545             # else, only a single dtype is given
-> 5546             new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors,)
   5547             return self._constructor(new_data).__finalize__(self, method="astype")
   5548 

~\anaconda3\lib\site-packages\pandas\core\internals\managers.py in astype(self, dtype, copy, errors)
    593         self, dtype, copy: bool = False, errors: str = "raise"
    594     ) -> "BlockManager":
--> 595         return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
    596 
    597     def convert(

~\anaconda3\lib\site-packages\pandas\core\internals\managers.py in apply(self, f, align_keys, **kwargs)
    404                 applied = b.apply(f, **kwargs)
    405             else:
--> 406                 applied = getattr(b, f)(**kwargs)
    407             result_blocks = _extend_blocks(applied, result_blocks)
    408 

~\anaconda3\lib\site-packages\pandas\core\internals\blocks.py in astype(self, dtype, copy, errors)
    593             vals1d = values.ravel()
    594             try:
--> 595                 values = astype_nansafe(vals1d, dtype, copy=True)
    596             except (ValueError, TypeError):
    597                 # e.g. astype_nansafe can fail on object-dtype of strings

~\anaconda3\lib\site-packages\pandas\core\dtypes\cast.py in astype_nansafe(arr, dtype, copy, skipna)
    993     if copy or is_object_dtype(arr) or is_object_dtype(dtype):
    994         # Explicit copy, or required since NumPy can't view from / to object.
--> 995         return arr.astype(dtype, copy=True)
    996 
    997     return arr.view(dtype)

ValueError: could not convert string to float: '1.199.75'

If it is possible, I would like to remove all dots and commas and then add the dots before the last two characters from the variables before converting into float.

Input:

df['min'].head()
9.50
10.00
3.45
1.095.50
13.25

Expected output:

9.50
10.00
3.45
1095.50
13.25

>Solution :

If you always have 2 decimal digits:

df['min'] = pd.to_numeric(df['min'].str.replace('.', '', regex=False)).div(100)

output (as new column min2 for clarity):

        min     min2
0      9.50     9.50
1     10.00    10.00
2      3.45     3.45
3  1.095.50  1095.50
4     13.25    13.25

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading