Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas: Difference between two series with different length (unaligned index)

Consider two following series:

sri = inp.groupby(inp.index.date)['value'].count()

2009-01-12    7
2009-01-14    3

and

sro = out.groupby(out.index.date)['value'].count()

2009-01-03      1
2009-01-09     14
2009-01-10     61
2009-01-11     93
2009-01-12    106
2009-01-13    123
2009-01-14    130

When we subtract one from another, sro-sri, we have:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

2009-01-03      NaN
2009-01-09      NaN
2009-01-10      NaN
2009-01-11      NaN
2009-01-12     99.0
2009-01-13      NaN
2009-01-14    127.0

However my desired output is:

2009-01-03      1.0
2009-01-04      0.0
2009-01-05      0.0
2009-01-06      0.0
2009-01-07      0.0
2009-01-08      0.0
2009-01-09     14.0
2009-01-10     61.0
2009-01-11     93.0
2009-01-12     99.0
2009-01-13    123.0
2009-01-14    127.0

and we can use the following workaround to generate the same result:

start_date = '2009-01-03'
end_date = '2009-01-15'
df = pd.DataFrame(
    index=pd.date_range(pd.to_datetime(start_date), pd.to_datetime(end_date) - timedelta(days=1), freq='d').date)
df = df.merge(sro.to_frame(), how='outer', left_index=True, right_index=True) \
    .merge(sri.to_frame(), how='outer', left_index=True, right_index=True).fillna(0)
print(df['value_x'] - df['value_y'])

Is there a much more compact solution to generate the same output?

>Solution :

A simple approach for a simple subtraction is to use sub with fillna=0:

sro.sub(sri, fill_value=0).convert_dtypes()

output:

2009-01-03      1
2009-01-09     14
2009-01-10     61
2009-01-11     93
2009-01-12     99
2009-01-13    123
2009-01-14    127

To add the missing indexes:

idx = sro.index.union(sri.index)
(sro.sub(sri, fill_value=0)
    .reindex(pd.date_range(idx.min(), idx.max()).astype(str), fill_value=0)
    .convert_dtypes()
)

output:

2009-01-03      1
2009-01-04      0
2009-01-05      0
2009-01-06      0
2009-01-07      0
2009-01-08      0
2009-01-09     14
2009-01-10     61
2009-01-11     93
2009-01-12     99
2009-01-13    123
2009-01-14    127

used input:

sri = pd.Series({'2009-01-12': 7, '2009-01-14': 3})
sro = pd.Series({'2009-01-03': 1, '2009-01-09': 14, '2009-01-10': 61, '2009-01-11': 93, '2009-01-12': 106, '2009-01-13': 123, '2009-01-14': 130})
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading