I have a pandas DataFrame of strings, but I want floats. The units in the strings differ, though. One column might be mostly in mm with a few values in cm. The next column might be mostly in g with a few values in kg, and so on.
How can I convert each column to floats in a single unit? For example, if I am converting everything to floats in mm, then "4.56 cm" should become 45.6, not 4.56.
>Solution :
Example
A minimal, reproducible example is needed to answer this, so here is one:
import pandas as pd
import numpy as np
data = {'length': ['4.56 cm', '100 mm', '3.5 cm', '10 mm'],
'weight': ['200 g', '0.5 kg', '300 g', '2 kg']}
df = pd.DataFrame(data)
df:
    length  weight
0  4.56 cm   200 g
1   100 mm  0.5 kg
2   3.5 cm   300 g
3    10 mm    2 kg
Code
I have prepared an answer based on the example above. If your situation is different, please create and provide your own clear example.
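# Strip the unit letters and parse what remains as a float
s1 = df['length'].str.replace('[A-Za-z]', '', regex=True).astype('float')
# Rows given in cm are multiplied by 10 to get mm; mm rows are kept as-is
df['length(mm)'] = np.where(df['length'].str.contains('cm'), s1 * 10, s1)
s2 = df['weight'].str.replace('[A-Za-z]', '', regex=True).astype('float')
# Rows given in kg are multiplied by 1000 to get g; g rows are kept as-is
df['weight(g)'] = np.where(df['weight'].str.contains('kg'), s2 * 1000, s2)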
df:
    length  weight  length(mm)  weight(g)
0  4.56 cm   200 g        45.6      200.0
1   100 mm  0.5 kg       100.0      500.0
2   3.5 cm   300 g        35.0      300.0
3    10 mm    2 kg        10.0     2000.0
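If a column can contain more than two units, the same idea can be written with a lookup table instead of one np.where per unit. The following is only a sketch under some assumptions: the helper name to_base_units and the dictionaries TO_MM and TO_G are made up for illustration, and every value is assumed to look like "<number> <unit>".

```python
import pandas as pd

# Hypothetical lookup tables: factor that converts each unit to the target unit.
TO_MM = {'mm': 1.0, 'cm': 10.0, 'm': 1000.0}
TO_G = {'g': 1.0, 'kg': 1000.0}

def to_base_units(col, factors):
    """Split '<number> <unit>' strings and scale each number by its unit's factor."""
    parts = col.str.extract(
        r'^\s*(?P<value>[-+]?\d*\.?\d+)\s*(?P<unit>[A-Za-z]+)\s*$'
    )
    # Units missing from the table map to NaN, so unexpected input
    # shows up as a missing value instead of a silently wrong number.
    return parts['value'].astype(float) * parts['unit'].map(factors)

df = pd.DataFrame({'length': ['4.56 cm', '100 mm', '3.5 cm', '10 mm'],
                   'weight': ['200 g', '0.5 kg', '300 g', '2 kg']})
df['length(mm)'] = to_base_units(df['length'], TO_MM)
df['weight(g)'] = to_base_units(df['weight'], TO_G)
```

Supporting a new unit then only means adding an entry to the relevant dictionary. Rows that do not match the pattern, or that use a unit missing from the dictionary, come out as NaN, so it is worth checking the result with isna() before relying on it.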