Change the dataframe based on the date and the values between two window

July 21, 2023

I have a df which has 4 columns. I want to stick the rows of each id based on the date and the start_time and end_time.

I want to change the 15 interval between two times and change them to 1. And then stick them for each ids.
For example, for id 1, '2022-07-21 12:15:00 13:00:00' I want to create 1,1,1,0 for 12-13 and all of the other values should be zero.

I provide a df as follow and you can see that. For example for id=1 and 2. We dont have anything for 2022-07-20, so I put all
the values as zero. And same for the other ids and date. Here is the df:

df = pd.DataFrame()

df['id'] = [1, 1, 2, 2, 3]

df['date'] = ['2022-07-21','2022-07-22','2022-07-21','2022-07-22', '2022-07-20' ]

df['time_start'] = ['12:15:00','12:45:00','12:45:00','12:00:00','12:30:00' ]

df['end_start'] = ['13:00:00','13:00:00','13:00:00','13:00:00','13:00:00' ]
    

   id      date     time_start  end_start
    1   2022-07-21  12:15:00    13:00:00
    1   2022-07-22  12:45:00    13:00:00
    2   2022-07-21  12:45:00    13:00:00
    2   2022-07-22  12:00:00    13:00:00
    3   2022-07-20  12:30:00    13:00:00

and here is the df that I want:

df = pd.DataFrame()

df['id'] = [1, 2, 3]

df['d112:00:00'] = [ 0,0,0]
df['d112:15:00'] = [0,0,0]
df['d112:30:00'] = [0,0,1] 
df['d112:45:00'] = [0,0, 1]
df['d113:00:00'] = [0,0,0]
df['d212:00:00'] = [ 0,0,0]
df['d212:15:00'] = [1,0,0]
df['d212:30:00'] = [ 1,0,0]
df['d212:45:00'] = [ 1,1,0]
df['d213:00:00'] = [ 0,0,0]  
df['d312:00:00'] = [ 0,1,0]
df['d312:15:00'] = [ 0,1,0]
df['d312:30:00'] = [ 0,1,0]
df['d312:45:00'] = [ 1,1,0]
df['d313:00:00'] = [ 0,0,0]



id  d112:00:00  d112:15:00  d112:30:00  d112:45:00  d113:00:00  d212:00:00  d212:15:00  d212:30:00  d212:45:00  d213:00:00  d312:00:00  d312:15:00  d312:30:00  d312:45:00  d313:00:00
    1   0   0   0   0   0   0   1   1   1   0   0   0   0   1   0
    2   0   0   0   0   0   0   0   0   1   0   1   1   1   1   0
    3   0   0   1   1   0   0   0   0   0   0   0   0   0   0   0

Could you please help me with that? Thanks

>Solution :

You can rework the input, compute a crosstab, reindex with the desired range:

step = pd.Timedelta('15min')

start = pd.to_datetime(df['date']+' '+df['time_start'])
end   = pd.to_datetime(df['date']+' '+df['end_start'])
idx = df.index.repeat(end.sub(start).div(step))
s = start.loc[idx]
s += s.groupby(level=0).cumcount().mul(step)

cols = [d for date in sorted(df['date'].unique())
        for d in pd.date_range(date+' 12:00', date+' 13:00', freq=step)]

out = (pd.crosstab(df.loc[idx, 'id'].values, s)
         .reindex(columns=cols, fill_value=0)
         .rename_axis(columns=None, index='id').reset_index()
      )

Output:

   id  2022-07-20 12:00:00  2022-07-20 12:15:00  2022-07-20 12:30:00  2022-07-20 12:45:00  2022-07-20 13:00:00  2022-07-21 12:00:00  2022-07-21 12:15:00  2022-07-21 12:30:00  2022-07-21 12:45:00  2022-07-21 13:00:00  2022-07-22 12:00:00  2022-07-22 12:15:00  2022-07-22 12:30:00  2022-07-22 12:45:00  2022-07-22 13:00:00
0   1                    0                    0                    0                    0                    0                    0                    1                    1                    1                    0                    0                    0                    0                    1                    0
1   2                    0                    0                    0                    0                    0                    0                    0                    0                    1                    0                    1                    1                    1                    1                    0
2   3                    0                    0                    1                    1                    0                    0                    0                    0                    0                    0                    0                    0                    0                    0                    0