Why does subtracting one part of a DataFrame column from another part of the same column return an all-NaN Series the length of the original column?

by MR

April 18, 2023

Code used to reproduce the result below (pandas 1.4.2, Python 3.9):

import pandas as pd

df = pd.DataFrame(
    {
        'half': ['first half', 'first half', 'second half', 'second half'],
        'vals': [10, 10, 5, 5]
    }
)
first_half_s = df.loc[df['half'] == 'first half']['vals']
second_half_s = df.loc[df['half'] == 'second half']['vals']

diff_s = first_half_s - second_half_s

print(diff_s)
0   NaN
1   NaN
2   NaN
3   NaN
Name: vals, dtype: float64

NB: I managed to get the expected result by converting the Series to NumPy arrays and subtracting element-wise, but I am trying to understand why pandas behaves this way rather than just find a way to do the calculation.

Thank you!

Solution:

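The two Series have different indices. Before an arithmetic operation, pandas aligns Series on their index labels: first_half_s has labels 0 and 1, second_half_s has labels 2 and 3, so no label appears in both. The result is indexed by the union of the labels (0, 1, 2, 3), and every value is NaN because each label is missing from one side. The NaN values are also why the dtype changes to float64.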
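A minimal sketch of this alignment rule, using the question's data plus two small hypothetical Series a and b (not from the question) that share exactly one label, so only that label produces a number:

```python
import pandas as pd

# The question's data
df = pd.DataFrame(
    {
        'half': ['first half', 'first half', 'second half', 'second half'],
        'vals': [10, 10, 5, 5],
    }
)
first_half_s = df.loc[df['half'] == 'first half', 'vals']    # labels 0, 1
second_half_s = df.loc[df['half'] == 'second half', 'vals']  # labels 2, 3

# No label is shared between the two halves
shared = first_half_s.index.intersection(second_half_s.index)
print(len(shared))  # 0

# Result index is the union of both indices; every label is missing on one side
diff_s = first_half_s - second_half_s
print(diff_s.index.tolist())  # [0, 1, 2, 3]
print(diff_s.isna().all())    # True

# Hypothetical Series sharing only label 1: labels 0 and 2 become NaN
a = pd.Series([10, 20], index=[0, 1])
b = pd.Series([1, 2], index=[1, 2])
print((a - b).tolist())  # [nan, 19.0, nan]
```

The partial-overlap case shows that pandas is not failing outright: it pairs values by label, and only unmatched labels become NaN.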

One solution is to convert the second (or the first) Series to a NumPy array with to_numpy(), so the values are subtracted by position instead of by label:

first_half_s = df.loc[df['half'] == 'first half','vals']
second_half_s = df.loc[df['half'] == 'second half','vals']


print(first_half_s)
0    10
1    10
Name: vals, dtype: int64

print(second_half_s)
2    5
3    5
Name: vals, dtype: int64

diff_s = first_half_s - second_half_s.to_numpy()
print(diff_s)
0    5
1    5
Name: vals, dtype: int64
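Which Series you convert decides which labels the result keeps, since the remaining Series supplies the index. A short sketch comparing both directions on the question's data (the names keep_first and keep_second are illustrative only):

```python
import pandas as pd

df = pd.DataFrame(
    {
        'half': ['first half', 'first half', 'second half', 'second half'],
        'vals': [10, 10, 5, 5],
    }
)
first_half_s = df.loc[df['half'] == 'first half', 'vals']
second_half_s = df.loc[df['half'] == 'second half', 'vals']

# Convert the right operand: the result keeps first_half_s's labels (0, 1)
keep_first = first_half_s - second_half_s.to_numpy()

# Convert the left operand: pandas handles ndarray - Series,
# so the result keeps second_half_s's labels (2, 3)
keep_second = first_half_s.to_numpy() - second_half_s

print(keep_first.index.tolist())   # [0, 1]
print(keep_second.index.tolist())  # [2, 3]
```

Because the pairing is purely positional, it is only meaningful when both halves have the same number of rows in the intended order.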

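Alternatively, give both Series the same default index with reset_index(drop=True); the labels then match and the subtraction works as expected:

# general solution: reset both indices to 0, 1, ...
diff_s = first_half_s.reset_index(drop=True) - second_half_s.reset_index(drop=True)

# shortcut for this sample data only: first_half_s already has labels 0 and 1
#diff_s = first_half_s - second_half_s.reset_index(drop=True)

print(diff_s)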
0    5
1    5
Name: vals, dtype: int64
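Another way to get matching labels, sketched here as an alternative rather than as part of the answer above, is to number the rows within each half and pivot so each half becomes a column sharing the same index. The helper column pos is a hypothetical name:

```python
import pandas as pd

df = pd.DataFrame(
    {
        'half': ['first half', 'first half', 'second half', 'second half'],
        'vals': [10, 10, 5, 5],
    }
)

# Hypothetical helper column: position of each row within its half (0, 1, 0, 1)
df['pos'] = df.groupby('half').cumcount()

# One column per half, one row per position, so both halves share the index 'pos'
wide = df.pivot(index='pos', columns='half', values='vals')

# Labels now match on both sides, so label alignment pairs the rows as intended
diff_s = wide['first half'] - wide['second half']
print(diff_s.tolist())  # [5, 5]
```

This keeps the label-based arithmetic of pandas but makes the labels mean "position within the half", which is the pairing the question intends.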