I have a column, Diameter, with values such as 44, 42 x 54, and Steel.
I need to define a function that converts the Diameter column to float and handles the non-numeric cases. In particular:
- when the value is 42×54, compute sqrt(42^2 + 54^2)
- when the value is Steel, return NaN
>Solution :
You can use str.extract with a regex to get the numbers, then square them with pow, sum the columns, and get the square root with numpy.sqrt:
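import numpy as np

# extract one or two integers, square them, sum per row, then take the square root
df['Diameter2'] = np.sqrt(
    df['Diameter']
    .str.extract(r'(\d+)(?:\s*x\s*(\d+))?')  # raw string, so \d and \s reach the regex engine untouched
    .astype(float).pow(2)
    .sum(axis=1, min_count=1)  # min_count=1 keeps rows with no match (e.g. Steel) as NaN
)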
output:
  Diameter  Diameter2
0       44  44.000000
1  42 x 54  68.410526
2    Steel        NaN
regex:
(\d+)                # capture a number
(?:\s*x\s*(\d+))?    # optionally capture a second number, if preceded by x with optional spaces
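Since the question asks for a function, here is a minimal sketch that wraps the same approach. The name diameter_to_float and the sample DataFrame are assumptions made for illustration. The pattern is also widened to [x×], on the assumption that the column may contain the multiplication sign written in the question as well as a plain x.

```python
import numpy as np
import pandas as pd

# Same regex as above, but accepting either "x" or "×" as the separator (assumption).
PATTERN = r'(\d+)(?:\s*[x×]\s*(\d+))?'

def diameter_to_float(col: pd.Series) -> pd.Series:
    """Hypothetical helper: '44' -> 44.0, '42 x 54' -> sqrt(42^2 + 54^2), 'Steel' -> NaN."""
    # One column per capture group; unmatched groups and unmatched rows become NaN.
    parts = col.astype(str).str.extract(PATTERN).astype(float)
    # min_count=1 makes the row sum NaN when no number was captured at all.
    return np.sqrt(parts.pow(2).sum(axis=1, min_count=1))

# Sample data invented for this example.
df = pd.DataFrame({'Diameter': ['44', '42 x 54', '42×54', 'Steel']})
df['Diameter2'] = diameter_to_float(df['Diameter'])
print(df)
```

Note that the pattern only captures integers, so a value like 44.5 would need a different number pattern.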
