Suppose that I have a dataframe which can be created using code below
df = pd.DataFrame(data = {'date':['2021-01-01', '2021-01-02', '2021-01-05','2021-01-02', '2021-01-03', '2021-01-05'],
'product':['A', 'A', 'A', 'B', 'B', 'B'],
'price':[10, 20, 30, 40, 50, 60]
}
)
df['date'] = pd.to_datetime(df['date'])
I want to create an empty dataframe let’s say main_df which will contain all dates between df.date.min() and df.date.max() for each product and on days where values in nan I want to ffill and bfill for remaning. The resulting dataframe would be as below:
+------------+---------+-------+
| date | product | price |
+------------+---------+-------+
| 2021-01-01 | A | 10 |
| 2021-01-02 | A | 20 |
| 2021-01-03 | A | 20 |
| 2021-01-04 | A | 20 |
| 2021-01-05 | A | 30 |
| 2021-01-01 | B | 40 |
| 2021-01-02 | B | 40 |
| 2021-01-03 | B | 50 |
| 2021-01-04 | B | 50 |
| 2021-01-05 | B | 60 |
+------------+---------+-------+
>Solution :
First
make pivot table, upsampling by asfreq and fill null
df.pivot_table('price', 'date', 'product').asfreq('D').ffill().bfill()
output:
product A B
date
2021-01-01 10.0 40.0
2021-01-02 20.0 40.0
2021-01-03 20.0 50.0
2021-01-04 20.0 50.0
2021-01-05 30.0 60.0
Second
stack result and so on (include full code)
(df.pivot_table('price', 'date', 'product').asfreq('D').ffill().bfill()
.stack().reset_index().rename(columns={0:'price'})
.sort_values('product').reset_index(drop=True))
output:
date product price
0 2021-01-01 A 10.0
1 2021-01-02 A 20.0
2 2021-01-03 A 20.0
3 2021-01-04 A 20.0
4 2021-01-05 A 30.0
5 2021-01-01 B 40.0
6 2021-01-02 B 40.0
7 2021-01-03 B 50.0
8 2021-01-04 B 50.0
9 2021-01-05 B 60.0