The code below reproduces the result. I am using pandas 1.4.2 with Python 3.9.
import pandas as pd
df = pd.DataFrame(
{
'half': ['first half', 'first half', 'second half', 'second half'],
'vals': [10, 10, 5, 5]
}
)
first_half_s = df.loc[df['half'] == 'first half']['vals']
second_half_s = df.loc[df['half'] == 'second half']['vals']
diff_s = first_half_s - second_half_s
>>> diff_s = {Series: (4,)} (0, nan) (1, nan) (2, nan) (3, nan)
NB: I managed to get the expected result by converting the Series to NumPy arrays and subtracting element-wise, but I want to understand why pandas behaves this way rather than just find a workaround.
Thank you!
>Solution :
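The two Series have different indices. pandas aligns on index labels before subtracting, so every label that exists in only one of the Series produces NaN.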
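To make the alignment visible, here is a minimal sketch. The Series `a`, `b` and `c` are made up for illustration and are not part of the question's data:

```python
import pandas as pd

a = pd.Series([10, 10], index=[0, 1])
b = pd.Series([5, 5], index=[2, 3])

# Subtraction matches values by index label, not by position.
# The result is indexed by the union of both indexes.
diff = a - b
print(diff.index.tolist())  # [0, 1, 2, 3]
print(diff.isna().all())    # True: no label exists in both Series

# When one label (1) is shared, only that label gets a real value.
c = pd.Series([5, 5], index=[1, 2])
print((a - c).tolist())     # [nan, 5.0, nan]
```

The result becomes float because NaN cannot be stored in an int64 Series.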
One solution is to convert the second (or first) Series to a NumPy array, so the subtraction is done by position:
first_half_s = df.loc[df['half'] == 'first half','vals']
second_half_s = df.loc[df['half'] == 'second half','vals']
print (first_half_s)
0 10
1 10
Name: vals, dtype: int64
print (second_half_s)
2 5
3 5
Name: vals, dtype: int64
diff_s = first_half_s - second_half_s.to_numpy()
print (diff_s)
0 5
1 5
Name: vals, dtype: int64
If both Series are given the same default index, the subtraction also works as expected:
#general solution
diff_s = first_half_s.reset_index(drop=True) - second_half_s.reset_index(drop=True)
#solution for this sample data
#diff_s = first_half_s - second_half_s.reset_index(drop=True)
print (diff_s)
0 5
1 5
Name: vals, dtype: int64
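The two approaches give the same values but may leave different indices on the result. A small sketch, assuming a reordered variant of the sample data (my own, not from the question) where 'first half' sits at index 2 and 3:

```python
import pandas as pd

# Hypothetical variant of the sample data: 'second half' rows come first.
df = pd.DataFrame(
    {
        'half': ['second half', 'second half', 'first half', 'first half'],
        'vals': [5, 5, 10, 10],
    }
)
first_half_s = df.loc[df['half'] == 'first half', 'vals']    # index 2, 3
second_half_s = df.loc[df['half'] == 'second half', 'vals']  # index 0, 1

# to_numpy() on the right side keeps the left Series' index.
via_numpy = first_half_s - second_half_s.to_numpy()
# reset_index(drop=True) on both sides gives a fresh 0..n-1 index.
via_reset = first_half_s.reset_index(drop=True) - second_half_s.reset_index(drop=True)

print(via_numpy.index.tolist())  # [2, 3]
print(via_reset.index.tolist())  # [0, 1]
print(via_numpy.tolist(), via_reset.tolist())  # [5, 5] [5, 5]
```

So use to_numpy() if the original row labels should be kept, and reset_index(drop=True) if they do not matter. Both assume the two Series have the same length and are in matching order.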