Attaching timeseries to a dataframe

March 6, 2024

I have a dataframe which looks like this:

"2023-09-07 13:22" type1 12.7
"2023-09-07 14:07" type2 101.1

And separately a dataframe with reg spaced timeseries for each type:

                   type1     type2
2023-09-07 08:00       1         2
2023-09-07 08:15       3         4
2023-09-07 08:30       5         6
...
2023-09-07 13:15       7         8
2023-09-07 13:30       9        10      
2023-09-07 13:45      11        12
2023-09-07 14:00      13        14
2023-09-07 14:15      15        16
2023-09-07 14:30      17        18
...

I’d like to attach (as a row) to each line in the first dataframe 2 (or N) values from the second dataframe starting from the first timestamp after given.

So in this case, the answer would be

"2023-09-07 13:22" type1 12.7    9 11
"2023-09-07 14:07" type2 101.1  16 18

I could loop over the rows in the first dataframe, and each time find a slice in the second dataframe, but that’s pretty slow. Was wondering if there’s a better solution. Seems like a pretty common task.

Thank you.

Code to generate input dataframes:

df1 = pd.DataFrame(columns = ["date", "type", "val"])
df1.loc[0] = [pd.to_datetime("2023-09-07 13:22:00"), "type1", 12.1]
df1.loc[1] = [pd.to_datetime("2023-09-07 14:07:00"), "type2", 101.1]
df1 = df1.set_index("date")

df2 = pd.DataFrame()
df2["date"] = pd.to_datetime(["2023-09-07 08:00", "2023-09-07 08:15","2023-09-07 08:30", "2023-09-07 13:15","2023-09-07 13:30", "2023-09-07 13:45","2023-09-07 14:00", "2023-09-07 14:15","2023-09-07 14:30"])
df2["type1"] = [1,3,5,7,9,11,13,15,17]
df2["type2"] = [2,4,6,8,10,12,14,16,18]

>Solution :

You can try pd.merge_asof + slicing as next step:

Input dataframes (sorted by index):

df1

                      type  value
time                             
2023-09-07 13:22:00  type1   12.7
2023-09-07 14:07:00  type2  101.1

df2

                     type1  type2
time                             
2023-09-07 08:00:00      1      2
2023-09-07 08:15:00      3      4
2023-09-07 08:30:00      5      6
2023-09-07 13:15:00      7      8
2023-09-07 13:30:00      9     10
2023-09-07 13:45:00     11     12
2023-09-07 14:00:00     13     14
2023-09-07 14:15:00     15     16
2023-09-07 14:30:00     17     18

N = 2

df2["time_tmp"] = df2.index
tmp = pd.merge_asof(df1, df2, left_index=True, right_index=True, direction="forward")
df1[list(range(N))] = tmp.apply(
    lambda x: df2.loc[x["time_tmp"] :, x["type"]][:N].values,
    axis=1,
    result_type="expand",
)
print(df1)

Prints:

                      type  value   0   1
time                                     
2023-09-07 13:22:00  type1   12.7   9  11
2023-09-07 14:07:00  type2  101.1  16  18

EDIT: "Safer" version, using np.pad (if there fewer values in row than N):

def fn(row):
    vals = df2.loc[row["time_tmp"] :, row["type"]][:N].values
    if len(vals) < N:
        vals = np.pad(
            vals, mode="constant", pad_width=(0, N - len(vals)), constant_values=-1
        )
    return vals


df2["time_tmp"] = df2.index
tmp = pd.merge_asof(df1, df2, left_index=True, right_index=True, direction="forward")
df1[list(range(N))] = tmp.apply(
    fn,
    axis=1,
    result_type="expand",
)
print(df1)