I have a pandas dataframe that looks like
Race_ID Date Student_ID Rank
1 1/1/2023 1 100-20
1 1/1/2023 2 40-10
2 11/9/2022 3 100-5
3 17/4/2022 5 100-0
3 17/4/2022 2 4-0
4 1/3/2022 1 10-5
where the data type of Rank is string. I want to extract the numbers in Rank and sum them, so the desire outcome looks like:
Race_ID Date Student_ID Rank New_column
1 1/1/2023 1 100-20 80
1 1/1/2023 2 40-10 30
2 11/9/2022 3 100-5 95
3 17/4/2022 5 100-0 100
3 17/4/2022 2 4-0 4
4 1/3/2022 1 10-5 5
>Solution :
One easy solution is to apply+pd.eval:
df['New_column'] = df['Rank'].apply(pd.eval)
Alternatively, extract both numbers and apply the subtraction:
df['New_column'] = (df['Rank'].str.extract(r'(\d+)-(\d+)')
.astype(int).pipe(lambda x: x[0]-x[1])
)
# or considering the sign and summing
df['New_column'] = (df['Rank'].str.extract(r'(\d+)(-\d+)')
.astype(int).sum(axis=1)
)
Output:
Race_ID Date Student_ID Rank New_column
0 1 1/1/2023 1 100-20 80
1 1 1/1/2023 2 40-10 30
2 2 11/9/2022 3 100-5 95
3 3 17/4/2022 5 100-0 100
4 3 17/4/2022 2 4-0 4
5 4 1/3/2022 1 10-5 5