Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Fill in missing dates for a pandas dataframe with multiple series

I have a dataframe that contains multiple time series, like so:

Date Item Category
2021-01-01 gouda cheese
2021-01-02 gouda cheese
2021-01-04 gouda cheese
2021-01-05 gouda cheese
2021-02-01 lettuce produce
2021-02-02 lettuce produce
2021-02-03 lettuce produce
2021-02-05 lettuce produce

I’d like to add rows for the missing dates (ex. 2021-01-03 for gouda, 2021-02-04 for lettuce). Note that these series do not necessarily start and end on the same date.

What is the best way to do this in pandas? I’d also like fill the new rows with the values in the "item" and "category" column for that series.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

Group by Item and Category, then generate a time series from the min to the max date:

result = (
    df.groupby(["Item", "Category"])["Date"]
    .apply(lambda s: pd.date_range(s.min(), s.max()))
    .explode()
    .reset_index()
)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading