I have this input data with the month interval defined by values in 2 columns as such:
Customer ID Order quantity Order valid from Order valid to
Customer 1 20 Aug-21 Dec-21
Customer 2 50 Oct-21 Jan-22
I would like to create a new dataframe where each of the month in the interval is a new row by itself while retaining data on other columns. The output should look like this:
Customer Month Order quantity
Customer 1 Aug-21 20
Customer 1 Sep-21 20
Customer 1 Oct-21 20
Customer 1 Nov-21 20
Customer 1 Dec-21 20
Customer 2 Oct-21 50
Customer 2 Nov-21 50
Customer 2 Dec-21 50
Customer 2 Jan-22 50
Super grateful for any help!
>Solution :
df = df.assign(Month=df.apply(lambda x: pd.date_range(x['Order valid from'], x['Order valid to'], freq='M'), axis=1)).explode('Month').drop(['Order valid from', 'Order valid to'], axis=1).reset_index(drop=True)
Output:
>>> df
Customer ID Order quantity Month
0 Customer 1 20 2021-08-31
1 Customer 1 20 2021-09-30
2 Customer 1 20 2021-10-31
3 Customer 1 20 2021-11-30
4 Customer 2 50 2021-10-31
5 Customer 2 50 2021-11-30
6 Customer 2 50 2021-12-31