How can I calculate the mean of the values in one DataFrame whose datetime falls inside the time intervals defined in another DataFrame, grouped by name?
df1
import pandas as pd

df1 = (pd.DataFrame({'name': ['a', 'a', 'a', 'a', 'b'],
                     'time_start': ['2000-01-01 00:01:12',
                                    '2000-01-01 00:02:12',
                                    '2000-01-01 00:03:12',
                                    '2000-01-01 00:04:12',
                                    '2000-01-01 00:05:12'],
                     'time_end': ['2000-01-01 00:01:18',
                                  '2000-01-01 00:02:22',
                                  '2000-01-01 00:03:24',
                                  '2000-01-01 00:04:40',
                                  '2000-01-01 00:05:14']})
       .assign(time_start=lambda x: pd.to_datetime(x['time_start']),
               time_end=lambda x: pd.to_datetime(x['time_end'])))
df2
df2 = (pd.DataFrame({'name': ['a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'b'],
                     'time': ['2000-01-01 00:01:12',
                              '2000-01-01 00:01:13',
                              '2000-01-01 00:01:14',
                              '2000-01-01 00:02:13',
                              '2000-01-01 00:02:20',
                              '2000-01-01 00:03:12',
                              '2000-01-01 00:04:12',
                              '2000-01-01 00:04:30',
                              '2000-01-01 00:05:12'],
                     'values': [10, 20, 30, 40, 50, 60, 70, 80, 90]})
       .assign(time=lambda x: pd.to_datetime(x['time'])))
The output should be:
  name          time_start            time_end  values
0    a 2000-01-01 00:01:12 2000-01-01 00:01:18      20
1    a 2000-01-01 00:02:12 2000-01-01 00:02:22      45
2    a 2000-01-01 00:03:12 2000-01-01 00:03:24      60
3    a 2000-01-01 00:04:12 2000-01-01 00:04:40      75
4    b 2000-01-01 00:05:12 2000-01-01 00:05:14      90
>Solution :
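You can merge both DataFrames on name first, then keep only the rows where time falls between time_start and time_end by using Series.between with boolean indexing, and finally aggregate the mean per interval.
Series.between is inclusive on both ends by default, and because every interval is compared against every row with the same name, the solution also works when the intervals overlap: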
df = df1.merge(df2, on='name')
df = (df[df['time'].between(df['time_start'], df['time_end'])]
        .groupby(['name', 'time_start', 'time_end'], as_index=False)['values']
        .mean())
print(df)
  name          time_start            time_end  values
0    a 2000-01-01 00:01:12 2000-01-01 00:01:18    20.0
1    a 2000-01-01 00:02:12 2000-01-01 00:02:22    45.0
2    a 2000-01-01 00:03:12 2000-01-01 00:03:24    60.0
3    a 2000-01-01 00:04:12 2000-01-01 00:04:40    75.0
4    b 2000-01-01 00:05:12 2000-01-01 00:05:14    90.0
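One thing to be aware of: filtering before the groupby drops any interval that contains no rows from df2. If you would rather keep those intervals with NaN, a possible variation is to merge the means back onto the intervals with a left join. The sketch below is only an illustration under that assumption; the helper name mean_in_intervals and the small intervals/samples frames are made up for the example and are not part of the question.

```python
import pandas as pd


def mean_in_intervals(intervals, samples):
    """Mean of samples['values'] per (name, time_start, time_end) interval.

    Assumption: intervals without any matching sample should be kept
    with NaN instead of being dropped.
    """
    keys = ['name', 'time_start', 'time_end']
    # Pair every interval with every sample that has the same name.
    merged = intervals.merge(samples, on='name')
    # Keep the pairs whose time lies inside the interval (both ends inclusive).
    inside = merged['time'].between(merged['time_start'], merged['time_end'])
    means = merged[inside].groupby(keys, as_index=False)['values'].mean()
    # Left join back so that intervals with no samples survive as NaN.
    return intervals[keys].drop_duplicates().merge(means, on=keys, how='left')


# Hypothetical data: the second 'a' interval and the 'b' interval get no samples.
intervals = pd.DataFrame({
    'name': ['a', 'a', 'b'],
    'time_start': pd.to_datetime(['2000-01-01 00:01:12',
                                  '2000-01-01 00:02:12',
                                  '2000-01-01 00:05:12']),
    'time_end': pd.to_datetime(['2000-01-01 00:01:18',
                                '2000-01-01 00:02:22',
                                '2000-01-01 00:05:14']),
})
samples = pd.DataFrame({
    'name': ['a', 'a', 'b'],
    'time': pd.to_datetime(['2000-01-01 00:01:12',
                            '2000-01-01 00:01:14',
                            '2000-01-01 00:06:00']),
    'values': [10, 30, 90],
})

result = mean_in_intervals(intervals, samples)
print(result)
```

Also note that the merge on name builds every interval/sample pair per name before filtering, so memory use grows with the product of the two group sizes. That is fine for small data like the example, but may need a different approach for very large frames.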