Merging multiple time series

Today I tried to merge multiple time series corresponding to clinical recordings (heart rate, arterial pressure, …) in order to run a tsfresh analysis. Some of them share the same time step and others have a different one, like this:

df1 = pd.read_csv("PATH", delimiter='\t', header=None, index_col=0)
                     Values
Date                       
06/03/2021 17:22      30
06/03/2021 17:23      30
06/03/2021 17:24      30
06/03/2021 17:25      30
06/03/2021 17:26      30
...                   ..

df2 = pd.read_csv("PATH", delimiter='\t', header=None, index_col=0)
                     Values
Date                                            
06/03/2021 17:22      8,7                    
06/03/2021 17:23      8,6                    
06/03/2021 17:24      8,7                    
06/03/2021 17:25      8,6                    
06/03/2021 17:26      8,7                    
...                    ...

df3 = pd.read_csv("PATH", delimiter='\t', header=None, index_col=0)
                    Values
Date                      
06/03/2021 17:00      1
06/03/2021 18:00      1
06/03/2021 19:00      1
06/03/2021 20:00      3
06/03/2021 21:00      3

The code I wrote to merge them generates a new dataframe like this:

new = pd.concat([df1,df2,df3], ignore_index=True, axis=1)
                     Values1    Values2                Values3
Date                                                     
06/03/2021 17:22      30.0       8,7                     NaN
06/03/2021 17:23      30.0       8,6                     NaN
06/03/2021 17:24      30.0       8,7                     NaN
06/03/2021 17:25      30.0       8,6                     NaN
06/03/2021 17:26      30.0       8,7                     NaN
...
06/03/2021 18:00      32       9,5                       1

As we can see, only one value of df3 was inserted into the new dataframe new, at the timestamp common to the three dataframes, and the other rows were filled with NaN.
But I think it would be better to replace the NaN values with the value for the corresponding hour. For example: all the rows for 06/03/2021 17:** would be equal to 1 in Values3, all the rows for 06/03/2021 20:** would be 3, and so on, like this:

                     Values1    Values2                Values3
Date                                                     
06/03/2021 17:22      30.0       8,7                     1
06/03/2021 17:23      30.0       8,6                     1
06/03/2021 17:24      30.0       8,7                     1
06/03/2021 17:25      30.0       8,6                     1
06/03/2021 17:26      30.0       8,7                     1
...
06/03/2021 18:00      32       9,5                       1
06/03/2021 18:01      32       9,5                       1
06/03/2021 18:02      32       9,5                       1
...
06/03/2021 20:00      32       9,5                       3
06/03/2021 20:01      32       9,5                       3
06/03/2021 20:02      32       9,5                       3
...

Thank you, everybody, for your help!

Solution:

After the concat, group the dataframe by its datetime index floored to the hour, then forward-fill and backward-fill the values within each group:

# convert to datetime index (if not already converted)
# new.index = pd.to_datetime(new.index)

new = new.groupby(new.index.floor('H'), group_keys=False).apply(lambda s: s.ffill().bfill())
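To make the effect of the per-hour fill easier to see, here is a minimal, self-contained sketch on made-up data. The frames, column names (Values1, Values3) and timestamps are hypothetical stand-ins for the files in the question, not the real recordings. It uses the lowercase "h" alias, which recent pandas versions prefer over "H".

```python
import pandas as pd

# Hypothetical minute-level series (stands in for df1 in the question)
minute_idx = pd.to_datetime(
    ["2021-03-06 17:22", "2021-03-06 17:23", "2021-03-06 18:01", "2021-03-06 20:02"]
)
df1 = pd.DataFrame({"Values1": [30, 30, 32, 32]}, index=minute_idx)

# Hypothetical hourly series (stands in for df3 in the question)
hour_idx = pd.to_datetime(["2021-03-06 17:00", "2021-03-06 18:00", "2021-03-06 20:00"])
df3 = pd.DataFrame({"Values3": [1, 1, 3]}, index=hour_idx)

# concat aligns on the index (outer join), so each hourly value
# initially appears only on its own exact-hour row
new = pd.concat([df1, df3], axis=1).sort_index()

# Fill inside each hour bucket only, so a value never leaks into another hour
filled = new.groupby(new.index.floor("h"), group_keys=False).apply(
    lambda g: g.ffill().bfill()
)

# Optionally keep only the original minute-level timestamps
print(filled.loc[df1.index])
```

Grouping before filling is what keeps the 17:xx rows from picking up the 18:00 value. Note that the exact-hour rows from the hourly series stay in the result (with Values1 back-filled from the first minute of that hour) unless you select the original index as in the last line.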
