I have the following dataframe:
df = pd.DataFrame({'A': ['2.5cm','2.5cm','2.56”','1.38”','2.2”','0.8 in','$18.00','4','2"']})
which looks like:
A
2.5cm
2.5cm
2.56”
1.38”
2.2”
0.8 in
$18.00
4
2"
I want to remove all characters except for the digits and the decimal points.
The output should be:
A
2.5
2.5
2.56
1.38
2.2
0.8
18.00
4
2
Here is what I’ve tried:
df['A'] = df.A.str.replace(r"[a-zA-Z]", '')
df['A'] = df.A.str.replace('\W', '')
but this is stripping out everything including the decimal point.
Any suggestions would be greatly appreciated.
Thank you in advance
>Solution :
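You can use str.extract to pull out only the numeric part (an integer or a decimal). The dot must be escaped as \. so that it matches a literal decimal point, and \d+ is needed so that multi-digit integers such as 12 are captured whole:
df['A'] = df['A'].astype(str).str.extract(r'(\d+(?:\.\d+)?)', expand=False).astype(float)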
Output:
A
0 2.50
1 2.50
2 2.56
3 1.38
4 2.20
5 0.80
6 18.00
7 4.00
8 2.00
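If you would rather keep the values as strings (so that 18.00 stays 18.00, as in the desired output), a sketch closer to the original attempt is to delete everything that is not a digit or a dot, using a negated character class. This assumes each cell holds a single number and no stray dots (something like "approx. 2.5" would keep its extra dot). The name cleaned is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': ['2.5cm', '2.5cm', '2.56”', '1.38”', '2.2”',
                         '0.8 in', '$18.00', '4', '2"']})

# [^\d.] matches any character that is NOT a digit or a literal dot.
# regex=True is passed explicitly because str.replace treats the pattern
# as a literal string by default in pandas 2.0 and later.
cleaned = df['A'].str.replace(r'[^\d.]', '', regex=True)

print(cleaned.tolist())
```

The earlier attempt lost the decimal point because \W matches every non-word character, and the dot is one of them. You can append .astype(float) to cleaned if you need numbers rather than strings.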