Creating and filling empty dates with zeroes

I have a DataFrame df:

import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/amanaroratc/hello-world/master/x_restock.csv')
df

I want to fill in the missing dates for each Product_ID with restocking_events=0. To start, I created a date-range DataFrame with dfdate=pd.DataFrame({'Date':pd.date_range(simple.Date.min(), simple.Date.max())}), where simple is a master DataFrame whose minimum and maximum dates are '2021-11-13' and '2021-11-30'.
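For reference, one way to finish the dfdate approach from the question is a cross join of the unique Product_IDs with dfdate, followed by a left merge of the observed rows. This is a sketch using merge(how='cross') (pandas 1.2 or later), which is a different technique from the answers below; the toy rows stand in for x_restock.csv and are hypothetical.

```python
import pandas as pd

# Hypothetical toy data with the same columns as x_restock.csv
df = pd.DataFrame({
    'Product_ID': [1004746, 1004746, 976460],
    'Date': pd.to_datetime(['2021-11-16', '2021-11-18', '2021-11-14']),
    'restocking_events': [1, 1, 1],
})

# Date range from the question: '2021-11-13' to '2021-11-30' (18 days)
dfdate = pd.DataFrame({'Date': pd.date_range('2021-11-13', '2021-11-30')})

# Cross join: every unique Product_ID paired with every date
# (merge(how='cross') requires pandas >= 1.2)
products = pd.DataFrame({'Product_ID': df['Product_ID'].unique()})
grid = products.merge(dfdate, how='cross')

# Left-merge the observed rows; dates without a row get NaN, replaced by 0
out = (grid.merge(df, on=['Product_ID', 'Date'], how='left')
           .fillna({'restocking_events': 0})
           .astype({'restocking_events': 'int64'}))
print(out.head())
```

With 2 products and 18 days this produces 36 rows, and only the 3 observed rows keep restocking_events=1.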

Solution:

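First, parse the Date column as datetimes while reading the file:

import pandas as pd

# parse_dates=['Date'] reads the Date column as datetime64 instead of strings
df = pd.read_csv('https://raw.githubusercontent.com/amanaroratc/hello-world/master/x_restock.csv',
                 parse_dates=['Date'])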
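To see what parse_dates changes, here is a small sketch that reads a hypothetical two-row excerpt (same header as x_restock.csv) from memory with and without it. Real datetimes matter later because the filled-in dates come from pd.date_range, which produces Timestamps rather than strings.

```python
import io
import pandas as pd

# Hypothetical two-row excerpt with the same header as x_restock.csv
csv_text = """Product_ID,Date,restocking_events
1004746,2021-11-16,1
976460,2021-11-26,1
"""

raw = pd.read_csv(io.StringIO(csv_text))
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'])

# Without parse_dates the column holds plain strings; with it, datetimes
print(raw['Date'].dtype, parsed['Date'].dtype)

# Date arithmetic only works on the parsed column
print(parsed['Date'].max() - parsed['Date'].min())
```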

The first solution adds the complete range of dates between the minimum and maximum Date for every Product_ID, using DataFrame.reindex with a MultiIndex built by MultiIndex.from_product:

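# every Product_ID paired with every day between the overall min and max Date
mux = pd.MultiIndex.from_product([df['Product_ID'].unique(),
                                  pd.date_range(df.Date.min(), df.Date.max())],
                                 names=['Product_ID','Dates'])

# (Product_ID, Date) pairs missing from df are created with restocking_events=0
df1 = df.set_index(['Product_ID','Date']).reindex(mux, fill_value=0).reset_index()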
print (df1)
      Product_ID      Dates  restocking_events
0        1004746 2021-11-13                  0
1        1004746 2021-11-14                  0
2        1004746 2021-11-15                  0
3        1004746 2021-11-16                  1
4        1004746 2021-11-17                  0
         ...        ...                ...
3379      976460 2021-11-26                  1
3380      976460 2021-11-27                  0
3381      976460 2021-11-28                  0
3382      976460 2021-11-29                  0
3383      976460 2021-11-30                  0

[3384 rows x 3 columns]
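One caveat with reindex: pandas raises an error when reindexing an index that has duplicate labels, so if df can contain the same (Product_ID, Date) pair more than once, those rows need to be collapsed first. This sketch assumes summing duplicates is the right aggregation, which is an assumption about the data, and uses hypothetical toy rows.

```python
import pandas as pd

# Hypothetical toy data; note the duplicated (976460, 2021-11-14) pair
df = pd.DataFrame({
    'Product_ID': [1004746, 976460, 976460],
    'Date': pd.to_datetime(['2021-11-16', '2021-11-14', '2021-11-14']),
    'restocking_events': [1, 1, 1],
})

# Collapse duplicate (Product_ID, Date) pairs so the index is unique
# (summing is an assumption; choose the aggregation that fits the data)
agg = df.groupby(['Product_ID', 'Date'])['restocking_events'].sum()

mux = pd.MultiIndex.from_product(
    [df['Product_ID'].unique(), pd.date_range(df['Date'].min(), df['Date'].max())],
    names=['Product_ID', 'Date'],
)

# Missing pairs are created with 0; reset_index turns the levels back into columns
df1 = agg.reindex(mux, fill_value=0).reset_index()
print(df1)
```

Here the range is 2021-11-14 to 2021-11-16, so 2 products give 6 rows, and the duplicated pair becomes a single row with restocking_events=2.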

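Another option is to build a helper DataFrame of all (Product_ID, Date) pairs with itertools.product and left-merge the original data onto it: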

from itertools import product

dfdate=pd.DataFrame(product(df['Product_ID'].unique(), 
                            pd.date_range(df.Date.min(), df.Date.max())),
                    columns=['Product_ID','Date'])
print (dfdate)
      Product_ID       Date
0        1004746 2021-11-13
1        1004746 2021-11-14
2        1004746 2021-11-15
3        1004746 2021-11-16
4        1004746 2021-11-17
         ...        ...
3379      976460 2021-11-26
3380      976460 2021-11-27
3381      976460 2021-11-28
3382      976460 2021-11-29
3383      976460 2021-11-30

[3384 rows x 2 columns]
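# fillna(downcast=...) is deprecated since pandas 2.1, so cast back to int explicitly
df = dfdate.merge(df, how='left').fillna({'restocking_events': 0}).astype({'restocking_events': int})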
print (df)
      Product_ID       Date  restocking_events
0        1004746 2021-11-13                  0
1        1004746 2021-11-14                  0
2        1004746 2021-11-15                  0
3        1004746 2021-11-16                  1
4        1004746 2021-11-17                  0
         ...        ...                ...
3379      976460 2021-11-26                  1
3380      976460 2021-11-27                  0
3381      976460 2021-11-28                  0
3382      976460 2021-11-29                  0
3383      976460 2021-11-30                  0

[3384 rows x 3 columns]
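If you want to check how many rows the merge actually filled in, DataFrame.merge accepts indicator=True, which adds a _merge column marking rows found in both frames ('both') or only in the helper frame ('left_only'). A minimal sketch on hypothetical toy rows:

```python
import pandas as pd
from itertools import product

# Hypothetical toy data in the same shape as x_restock.csv
df = pd.DataFrame({
    'Product_ID': [1004746, 976460],
    'Date': pd.to_datetime(['2021-11-16', '2021-11-26']),
    'restocking_events': [1, 1],
})

dfdate = pd.DataFrame(product(df['Product_ID'].unique(),
                              pd.date_range(df['Date'].min(), df['Date'].max())),
                      columns=['Product_ID', 'Date'])

# indicator=True adds '_merge': 'both' = row existed in df, 'left_only' = filled-in date
merged = dfdate.merge(df, on=['Product_ID', 'Date'], how='left', indicator=True)
filled = merged['_merge'] == 'left_only'

merged['restocking_events'] = merged['restocking_events'].fillna(0).astype('int64')
print(int(filled.sum()), 'rows were added with restocking_events=0')
```

The range 2021-11-16 to 2021-11-26 is 11 days, so 2 products give 22 rows, of which 20 are filled in.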

Alternatively, if each Product_ID only needs consecutive dates between its own first and last Date (rather than the global range), use Series.asfreq per group:

df2 = (df.set_index('Date')
         .groupby('Product_ID')['restocking_events']
         .apply(lambda x: x.asfreq('D', fill_value=0))
         .reset_index())
print (df2)
      Product_ID       Date  restocking_events
0         112714 2021-11-15                  1
1         112714 2021-11-16                  1
2         112714 2021-11-17                  0
3         112714 2021-11-18                  1
4         112714 2021-11-19                  0
         ...        ...                ...
2209     3630918 2021-11-25                  0
2210     3630918 2021-11-26                  0
2211     3630918 2021-11-27                  0
2212     3630918 2021-11-28                  0
2213     3630918 2021-11-29                  1

[2214 rows x 3 columns]
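The difference from the first two solutions is the span being filled: asfreq only fills gaps inside each product's own date range, which is why this result has fewer rows. A sketch on hypothetical toy rows, where product 1 spans 2021-11-15 to 2021-11-17 and product 2 spans 2021-11-20 to 2021-11-21:

```python
import pandas as pd

# Hypothetical toy data with two products covering different date ranges
df = pd.DataFrame({
    'Product_ID': [1, 1, 2, 2],
    'Date': pd.to_datetime(['2021-11-15', '2021-11-17', '2021-11-20', '2021-11-21']),
    'restocking_events': [1, 1, 1, 0],
})

df2 = (df.set_index('Date')
         .groupby('Product_ID')['restocking_events']
         .apply(lambda s: s.asfreq('D', fill_value=0))
         .reset_index())

# Each product is filled only between its own first and last date
print(df2.groupby('Product_ID')['Date'].agg(['min', 'max', 'size']))
```

Product 1 gains one row (2021-11-16 with 0), product 2 gains none, and no dates outside each product's range are added.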