Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Fill missing dates in 2 level of groups in pandas

I want to fill in the missing dates only if a date is missing for the particular group.
Here country + county adds up to become a group
I have this data frame

df = pd.DataFrame({'date': ['2021-01-01','2021-01-02', '2021-01-01','2021-01-03','2021-01-01','2021-01-02', '2021-01-05','2021-01-07'],'country': ['a','a','a','a','b','b','b','b'], 'county': ['c','c','d','d','e','e','f','f'],  'sales': [1,2,1,45,2,341,14,25]})

Which looks like this

         date country county  sales
0  2021-01-01       a      c      1
1  2021-01-02       a      c      2
2  2021-01-01       a      d      1
3  2021-01-03       a      d     45
4  2021-01-01       b      e      2
5  2021-01-02       b      e    341
6  2021-01-05       b      f     14
7  2021-01-07       b      f     25

and I want this as output

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

         date country county  sales
0  2021-01-01       a      c      1
1  2021-01-02       a      c      2
2  2021-01-01       a      d      1
3  2021-01-02       a      d      0
4  2021-01-03       a      d      2
5  2021-01-01       b      e    341
6  2021-01-02       b      e     14
7  2021-01-05       b      f     25
8  2021-01-06       b      f     0
9  2021-01-07       b      f     64

In my example

  1. Country a and County d has one missing date 2021-01-02
  2. Country b and County f has one missing date 2021-01-06

so I have added the same dates and in place of sales added zero

I have gone through this Pandas filling missing dates and values within group but could not able to convert the same for my problem.

>Solution :

Use GroupBy.apply with lambd function with div.DataFrame.asfreq:

df['date'] = pd.to_datetime(df['date'])


df = (df.set_index('date')
        .groupby(['country','county'])['sales']
        .apply(lambda x: x.asfreq('d', fill_value=0))
        .reset_index()
        [['date','country','county','sales']])
print (df)
        date country county  sales
0 2021-01-01       a      c      1
1 2021-01-02       a      c      2
2 2021-01-01       a      d      1
3 2021-01-02       a      d      0
4 2021-01-03       a      d     45
5 2021-01-01       b      e      2
6 2021-01-02       b      e    341
7 2021-01-05       b      f     14
8 2021-01-06       b      f      0
9 2021-01-07       b      f     25
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading