Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Conditional merging with pandas

I have a pandas dataframe as below which details additional calls into an area:

CommsDate Area Day0 Incremental Day1 Incremental Day2 Incremental
01/01/24 Sales 43 36 29
01/01/24 Service 85 74 66
02/01/24 Sales 56 42 31
02/01/24 Service 73 62 49
03/01/24 Sales 48 32 24
03/01/24 Service 67 58 46

I am trying to calculate by date how many calls were received, so calls received for Sales on 1st Jan would be Day0_incremental for that date (43), 2nd Jan would be Day0 for 2nd Jan plus Day1 for 1st Jan (36+56) and 3rd Jan would be Day0 for 3rd Jan plus Day1 for 2nd Jan plus Day2 for 1st Jan (48+42+29), resulting in the below dataframe:

CallDate Sales Service
01/01/24 43 85
02/01/24 92 147
03/01/24 119 195
04/01/24 63 107
05/01/24 24 46

I have managed to create a shell of a dataframe for the second table with no values under the area columns, but at a loss as to the next steps:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df['CommsDate'] = pd.to_datetime(df['CommsDate'], format='%d/%m/%Y')
areaUnique = df['Area'].unique().tolist()
from datetime import timedelta
CallDate = pd.date_range(start=min(df['CommsDate']), end=max(df['CommsDate'])+timedelta(days=6), freq='D')

data = {area: [] for area in areaUnique}

dfNew = pd.DataFrame(data)

dfNew['CallDate'] = CallDate

dfNew = dfNew.melt(id_vars=['CallDate'], var_name='Area')

dfNew = dfNew.pivot(index='CallDate', columns='Area', values='value')

dfNew = dfNew.reset_index()

dfNew = dfNew[['CallDate'] + areaUnique]

I have started to write a for loop, but I’ve only got this far:

for i in range(1,len(areaUnique)+1):
    dfNew.columns(i) = 

>Solution :

You can pivot, shift and add:

df['CommsDate'] = pd.to_datetime(df['CommsDate'], dayfirst=True)
tmp = df.pivot(index='CommsDate', columns='Area')

out = (tmp['Day0 Incremental']
       .add(tmp['Day1 Incremental'].shift(freq='1d'), fill_value=0)
       .add(tmp['Day2 Incremental'].shift(freq='2d'), fill_value=0)
       .reset_index().rename_axis(columns=None)
      )

Or, programmatically with functools.reduce using the extracted number from the DayX … string:

from functools import reduce
import re

reg = re.compile(r'Day(\d+)')

df['CommsDate'] = pd.to_datetime(df['CommsDate'], dayfirst=True)
tmp = df.pivot(index='CommsDate', columns='Area')

out = reduce(lambda a,b: a.add(b, fill_value=0),
             (tmp[d].shift(freq=f'{reg.search(d).group(1)}d') for d in
              tmp.columns.get_level_values(0).unique())
            ).reset_index().rename_axis(columns=None)

Output:

   CommsDate  Sales  Service
0 2024-01-01   43.0     85.0
1 2024-01-02   92.0    147.0
2 2024-01-03  119.0    195.0
3 2024-01-04   63.0    107.0
4 2024-01-05   24.0     46.0
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading