I want to split a column into two. If a value has a letter (any letter) at the end, that letter should be the value for the second column. Otherwise, the second column should be null.
import pandas as pd
data = pd.DataFrame({"data": ["0.00I", "0.01E", "99.99", "0.14F"]})
desired result:
a b
0 0.00 I
1 0.01 E
2 99.99 None
3 0.14 F
>Solution :
You can use str.extract with the (\d+(?:\.\d+)?)(\D)? regex:
out = data['data'].str.extract(r'(\d+(?:\.\d+)?)(\D)?').set_axis(['a', 'b'], axis=1)
Or, if you want to remove the original 'data' column while adding the new columns in place:
data[['a', 'b']] = data.pop('data').str.extract(r'(\d+(?:\.\d+)?)(\D)?')
output:
a b
0 0.00 I
1 0.01 E
2 99.99 NaN
3 0.14 F
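Note that str.extract returns the captured text, so column a holds strings rather than numbers. If you need numeric values, one option is to convert afterwards. The following is a minimal sketch of that extra step (the pd.to_numeric conversion is an addition and not part of the original approach):

```python
import pandas as pd

data = pd.DataFrame({"data": ["0.00I", "0.01E", "99.99", "0.14F"]})

# Same extraction as above: group 1 is the number, group 2 the optional trailing character.
out = data["data"].str.extract(r"(\d+(?:\.\d+)?)(\D)?").set_axis(["a", "b"], axis=1)

# Assumption: column "a" should be numeric. The captured values are strings,
# so convert them explicitly.
out["a"] = pd.to_numeric(out["a"])

print(out)
print(out.dtypes)
```

Column b is left untouched, so rows without a trailing character keep their missing value.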
(\d+(?:\.\d+)?) # capture a number (with optional decimal)
(\D)? # optionally capture a non-digit
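Since \D matches any non-digit, the pattern above also captures characters such as % or a space, and it is not anchored to the end of the string. If you want to accept only a single ASCII letter at the very end, a stricter variant might look like the sketch below. The anchored pattern, the [A-Za-z] class, and the sample values are assumptions for illustration, not part of the original answer:

```python
import pandas as pd

# Hypothetical sample: "1.5%" ends in a non-letter, "abc" contains no number.
s = pd.Series(["0.00I", "99.99", "1.5%", "abc"])

# Original pattern: any single non-digit after the number is captured.
loose = s.str.extract(r"(\d+(?:\.\d+)?)(\D)?").set_axis(["a", "b"], axis=1)

# Stricter variant: the whole string must be a number optionally followed
# by exactly one ASCII letter; anything else yields NaN in both columns.
strict = s.str.extract(r"^(\d+(?:\.\d+)?)([A-Za-z])?$").set_axis(["a", "b"], axis=1)

print(loose)
print(strict)
```

With the stricter pattern, rows that do not fit the expected shape come back as missing in both columns, which makes malformed input easier to spot than a partial match would.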