Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Fill missing dates in group and convert data to weekly

I have this data frame with lot of missing dates in between

df = pd.DataFrame({'date':['2021-12-1','2021-12-2','2021-12-21','2021-12-1','2021-12-7','2021-12-1','2021-12-5','2021-12-1','2021-12-5'],
                   'id1':['a1','a1','a1','a1','a1','a2','a2','a2','a2'],
                   'id2':['b1','b1','b1','b2','b2','b3','b3','b4','b4'],
                   'value1':[1,5,7,2,9,3,0,1,7],
                   'value2':[6,2,8,1,9,3,0,2,6]})

Which looks like this

         date id1 id2  value1  value2
0   2021-12-1  a1  b1       1       6
1   2021-12-2  a1  b1       5       2
2  2021-12-21  a1  b1       7       8
3   2021-12-1  a1  b2       2       1
4   2021-12-7  a1  b2       9       9
5   2021-12-1  a2  b3       3       3
6   2021-12-5  a2  b3       0       0
7   2021-12-1  a2  b4       1       2
8   2021-12-5  a2  b4       7       6

I want my output to look like this where the frequency is changed from daily to weekly and the week starts from Monday.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

         date id1 id2  value1  value2
0   2021-12-6  a1  b1       6       8
1  2021-12-13  a1  b1       0       0
2  2021-12-20  a1  b1       0       0
3  2021-12-27  a1  b1       7       8
4   2021-12-6  a1  b2       2       1
5  2021-12-13  a1  b2       9       9
6   2021-12-6  a2  b3       3       3
7   2021-12-6  a2  b4       8       8

Firstly I am filling missing dates with zero values and then in the second step converting daily data to weekly data using resample. Here I am using W-Mon which means I starting my week from Monday.

#Filling missing dates values with zero
df['date'] = pd.to_datetime(df['date'])
df = (df.set_index('date')
      .groupby(['id1','id2'])['value1','value2']
      .apply(lambda x: x.asfreq('d', fill_value=0))
      .reset_index()
      [['date','id1','id2','value1','value2']])
#convert to weekly data and set monday as starting day for each week
df = (df.groupby(['id1','id2'])
       .resample('W-Mon', label='right', closed = 'left', on='date')
       .agg({'value1':'sum',"value2":'sum'} )
       .reset_index())

I am not getting the expected output with the above code.

I am getting output like this

  id1 id2       date  value1  value2
0  a1  b1 2021-12-06       6       8
1  a1  b1 2021-12-13       7       8
2  a1  b2 2021-12-13      11      10
3  a2  b3 2021-12-13       3       3
4  a2  b3 2021-12-20       0       0
5  a2  b4 2021-12-20       1       2
6  a2  b4 2021-12-27       7       6

For groups, a2 and b4 is having dates 2021-12-27 even though in original data we don’t have any dates for that week.

>Solution :

The code works using the latest version of pandas.

Update your pandas version.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading