I have a dataframe which looks like this:
"2023-09-07 13:22" type1 12.7
"2023-09-07 14:07" type2 101.1
And separately a dataframe with reg spaced timeseries for each type:
type1 type2
2023-09-07 08:00 1 2
2023-09-07 08:15 3 4
2023-09-07 08:30 5 6
...
2023-09-07 13:15 7 8
2023-09-07 13:30 9 10
2023-09-07 13:45 11 12
2023-09-07 14:00 13 14
2023-09-07 14:15 15 16
2023-09-07 14:30 17 18
...
I’d like to attach (as a row) to each line in the first dataframe 2 (or N) values from the second dataframe starting from the first timestamp after given.
So in this case, the answer would be
"2023-09-07 13:22" type1 12.7 9 11
"2023-09-07 14:07" type2 101.1 16 18
I could loop over the rows in the first dataframe, and each time find a slice in the second dataframe, but that’s pretty slow. Was wondering if there’s a better solution. Seems like a pretty common task.
Thank you.
Code to generate input dataframes:
df1 = pd.DataFrame(columns = ["date", "type", "val"])
df1.loc[0] = [pd.to_datetime("2023-09-07 13:22:00"), "type1", 12.1]
df1.loc[1] = [pd.to_datetime("2023-09-07 14:07:00"), "type2", 101.1]
df1 = df1.set_index("date")
df2 = pd.DataFrame()
df2["date"] = pd.to_datetime(["2023-09-07 08:00", "2023-09-07 08:15","2023-09-07 08:30", "2023-09-07 13:15","2023-09-07 13:30", "2023-09-07 13:45","2023-09-07 14:00", "2023-09-07 14:15","2023-09-07 14:30"])
df2["type1"] = [1,3,5,7,9,11,13,15,17]
df2["type2"] = [2,4,6,8,10,12,14,16,18]
>Solution :
You can try pd.merge_asof + slicing as next step:
Input dataframes (sorted by index):
df1
type value
time
2023-09-07 13:22:00 type1 12.7
2023-09-07 14:07:00 type2 101.1
df2
type1 type2
time
2023-09-07 08:00:00 1 2
2023-09-07 08:15:00 3 4
2023-09-07 08:30:00 5 6
2023-09-07 13:15:00 7 8
2023-09-07 13:30:00 9 10
2023-09-07 13:45:00 11 12
2023-09-07 14:00:00 13 14
2023-09-07 14:15:00 15 16
2023-09-07 14:30:00 17 18
N = 2
df2["time_tmp"] = df2.index
tmp = pd.merge_asof(df1, df2, left_index=True, right_index=True, direction="forward")
df1[list(range(N))] = tmp.apply(
lambda x: df2.loc[x["time_tmp"] :, x["type"]][:N].values,
axis=1,
result_type="expand",
)
print(df1)
Prints:
type value 0 1
time
2023-09-07 13:22:00 type1 12.7 9 11
2023-09-07 14:07:00 type2 101.1 16 18
EDIT: "Safer" version, using np.pad (if there fewer values in row than N):
def fn(row):
vals = df2.loc[row["time_tmp"] :, row["type"]][:N].values
if len(vals) < N:
vals = np.pad(
vals, mode="constant", pad_width=(0, N - len(vals)), constant_values=-1
)
return vals
df2["time_tmp"] = df2.index
tmp = pd.merge_asof(df1, df2, left_index=True, right_index=True, direction="forward")
df1[list(range(N))] = tmp.apply(
fn,
axis=1,
result_type="expand",
)
print(df1)