Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Attaching timeseries to a dataframe

I have a dataframe which looks like this:

"2023-09-07 13:22" type1 12.7
"2023-09-07 14:07" type2 101.1 

And separately a dataframe with reg spaced timeseries for each type:

                   type1     type2
2023-09-07 08:00       1         2
2023-09-07 08:15       3         4
2023-09-07 08:30       5         6
...
2023-09-07 13:15       7         8
2023-09-07 13:30       9        10      
2023-09-07 13:45      11        12
2023-09-07 14:00      13        14
2023-09-07 14:15      15        16
2023-09-07 14:30      17        18
...

I’d like to attach (as a row) to each line in the first dataframe 2 (or N) values from the second dataframe starting from the first timestamp after given.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

So in this case, the answer would be

"2023-09-07 13:22" type1 12.7    9 11
"2023-09-07 14:07" type2 101.1  16 18

I could loop over the rows in the first dataframe, and each time find a slice in the second dataframe, but that’s pretty slow. Was wondering if there’s a better solution. Seems like a pretty common task.

Thank you.

Code to generate input dataframes:

df1 = pd.DataFrame(columns = ["date", "type", "val"])
df1.loc[0] = [pd.to_datetime("2023-09-07 13:22:00"), "type1", 12.1]
df1.loc[1] = [pd.to_datetime("2023-09-07 14:07:00"), "type2", 101.1]
df1 = df1.set_index("date")
df2 = pd.DataFrame()
df2["date"] = pd.to_datetime(["2023-09-07 08:00", "2023-09-07 08:15","2023-09-07 08:30", "2023-09-07 13:15","2023-09-07 13:30", "2023-09-07 13:45","2023-09-07 14:00", "2023-09-07 14:15","2023-09-07 14:30"])
df2["type1"] = [1,3,5,7,9,11,13,15,17]
df2["type2"] = [2,4,6,8,10,12,14,16,18]

>Solution :

You can try pd.merge_asof + slicing as next step:

Input dataframes (sorted by index):

df1

                      type  value
time                             
2023-09-07 13:22:00  type1   12.7
2023-09-07 14:07:00  type2  101.1

df2

                     type1  type2
time                             
2023-09-07 08:00:00      1      2
2023-09-07 08:15:00      3      4
2023-09-07 08:30:00      5      6
2023-09-07 13:15:00      7      8
2023-09-07 13:30:00      9     10
2023-09-07 13:45:00     11     12
2023-09-07 14:00:00     13     14
2023-09-07 14:15:00     15     16
2023-09-07 14:30:00     17     18

N = 2

df2["time_tmp"] = df2.index
tmp = pd.merge_asof(df1, df2, left_index=True, right_index=True, direction="forward")
df1[list(range(N))] = tmp.apply(
    lambda x: df2.loc[x["time_tmp"] :, x["type"]][:N].values,
    axis=1,
    result_type="expand",
)
print(df1)

Prints:

                      type  value   0   1
time                                     
2023-09-07 13:22:00  type1   12.7   9  11
2023-09-07 14:07:00  type2  101.1  16  18

EDIT: "Safer" version, using np.pad (if there fewer values in row than N):

def fn(row):
    vals = df2.loc[row["time_tmp"] :, row["type"]][:N].values
    if len(vals) < N:
        vals = np.pad(
            vals, mode="constant", pad_width=(0, N - len(vals)), constant_values=-1
        )
    return vals


df2["time_tmp"] = df2.index
tmp = pd.merge_asof(df1, df2, left_index=True, right_index=True, direction="forward")
df1[list(range(N))] = tmp.apply(
    fn,
    axis=1,
    result_type="expand",
)
print(df1)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading