I have a df which has 4 columns. I want to stick the rows of each id based on the date and the start_time and end_time.
I want to change the 15 interval between two times and change them to 1. And then stick them for each ids.
For example, for id 1, '2022-07-21 12:15:00 13:00:00' I want to create 1,1,1,0 for 12-13 and all of the other values should be zero.
I provide a df as follow and you can see that. For example for id=1 and 2. We dont have anything for 2022-07-20, so I put all
the values as zero. And same for the other ids and date. Here is the df:
df = pd.DataFrame()
df['id'] = [1, 1, 2, 2, 3]
df['date'] = ['2022-07-21','2022-07-22','2022-07-21','2022-07-22', '2022-07-20' ]
df['time_start'] = ['12:15:00','12:45:00','12:45:00','12:00:00','12:30:00' ]
df['end_start'] = ['13:00:00','13:00:00','13:00:00','13:00:00','13:00:00' ]
id date time_start end_start
1 2022-07-21 12:15:00 13:00:00
1 2022-07-22 12:45:00 13:00:00
2 2022-07-21 12:45:00 13:00:00
2 2022-07-22 12:00:00 13:00:00
3 2022-07-20 12:30:00 13:00:00
and here is the df that I want:
df = pd.DataFrame()
df['id'] = [1, 2, 3]
df['d112:00:00'] = [ 0,0,0]
df['d112:15:00'] = [0,0,0]
df['d112:30:00'] = [0,0,1]
df['d112:45:00'] = [0,0, 1]
df['d113:00:00'] = [0,0,0]
df['d212:00:00'] = [ 0,0,0]
df['d212:15:00'] = [1,0,0]
df['d212:30:00'] = [ 1,0,0]
df['d212:45:00'] = [ 1,1,0]
df['d213:00:00'] = [ 0,0,0]
df['d312:00:00'] = [ 0,1,0]
df['d312:15:00'] = [ 0,1,0]
df['d312:30:00'] = [ 0,1,0]
df['d312:45:00'] = [ 1,1,0]
df['d313:00:00'] = [ 0,0,0]
id d112:00:00 d112:15:00 d112:30:00 d112:45:00 d113:00:00 d212:00:00 d212:15:00 d212:30:00 d212:45:00 d213:00:00 d312:00:00 d312:15:00 d312:30:00 d312:45:00 d313:00:00
1 0 0 0 0 0 0 1 1 1 0 0 0 0 1 0
2 0 0 0 0 0 0 0 0 1 0 1 1 1 1 0
3 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0
Could you please help me with that? Thanks
>Solution :
You can rework the input, compute a crosstab, reindex with the desired range:
step = pd.Timedelta('15min')
start = pd.to_datetime(df['date']+' '+df['time_start'])
end = pd.to_datetime(df['date']+' '+df['end_start'])
idx = df.index.repeat(end.sub(start).div(step))
s = start.loc[idx]
s += s.groupby(level=0).cumcount().mul(step)
cols = [d for date in sorted(df['date'].unique())
for d in pd.date_range(date+' 12:00', date+' 13:00', freq=step)]
out = (pd.crosstab(df.loc[idx, 'id'].values, s)
.reindex(columns=cols, fill_value=0)
.rename_axis(columns=None, index='id').reset_index()
)
Output:
id 2022-07-20 12:00:00 2022-07-20 12:15:00 2022-07-20 12:30:00 2022-07-20 12:45:00 2022-07-20 13:00:00 2022-07-21 12:00:00 2022-07-21 12:15:00 2022-07-21 12:30:00 2022-07-21 12:45:00 2022-07-21 13:00:00 2022-07-22 12:00:00 2022-07-22 12:15:00 2022-07-22 12:30:00 2022-07-22 12:45:00 2022-07-22 13:00:00
0 1 0 0 0 0 0 0 1 1 1 0 0 0 0 1 0
1 2 0 0 0 0 0 0 0 0 1 0 1 1 1 1 0
2 3 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0