I have two data frames, one with more frequent entries than the other. I would like to concatenate them and fill the gaps in the less frequent column with its most recent entry. If that most recent entry was itself NaN, the gap should stay NaN.
Example:
import numpy as np
import pandas as pd

df = pd.DataFrame(data=[4.5, 4.6, 5.7, 5.7, 6.7, 4, 9.0],
                  index=list(map(pd.to_datetime, ['00:00', '00:30', '01:00', '01:30', '02:00', '02:30', '03:00'])),
                  columns=['frequent data'])
# np.nan rather than np.NaN: the np.NaN alias was removed in NumPy 2.0
df2 = pd.DataFrame(data=[4.5, np.nan, 5.7, np.nan],
                   index=list(map(pd.to_datetime, ['00:00', '01:00', '02:00', '03:00'])),
                   columns=['data'])
df2
                     frequent data  data
2022-01-15 00:00:00            4.5   4.5
2022-01-15 01:00:00            5.7   NaN
2022-01-15 02:00:00            6.7   5.7
2022-01-15 03:00:00            9.0   NaN
new_df = pd.concat((df, df2), axis=1)
new_df
                     frequent data  data
2022-01-15 00:00:00            4.5   4.5
2022-01-15 00:30:00            4.6   NaN
2022-01-15 01:00:00            5.7   NaN
2022-01-15 01:30:00            5.7   NaN
2022-01-15 02:00:00            6.7   5.7
2022-01-15 02:30:00            4.0   NaN
2022-01-15 03:00:00            9.0   NaN
I would like to end up with this data frame:
                     frequent data  data
2022-01-15 00:00:00            4.5   4.5
2022-01-15 00:30:00            4.6   4.5
2022-01-15 01:00:00            5.7   NaN
2022-01-15 01:30:00            5.7   NaN
2022-01-15 02:00:00            6.7   5.7
2022-01-15 02:30:00            4.0   5.7
2022-01-15 03:00:00            9.0   NaN
Is there an easy way to do this, or do I need to write my own function?
>Solution :
IIUC:
df2 = df2.reindex(df.index).groupby(lambda x: x.floor('H')).ffill()
new_df = pd.concat([df, df2], axis=1)
print(new_df)
# Output
                     frequent data  data
2022-01-15 00:00:00            4.5   4.5
2022-01-15 00:30:00            4.6   4.5
2022-01-15 01:00:00            5.7   NaN
2022-01-15 01:30:00            5.7   NaN
2022-01-15 02:00:00            6.7   5.7
2022-01-15 02:30:00            4.0   5.7
2022-01-15 03:00:00            9.0   NaN
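Why this works: flooring each timestamp to the hour puts every half-hour row in the same group as the hourly row before it, so the forward fill can never carry a value across an hour boundary. An hourly NaN therefore stays NaN for its whole hour. Below is a minimal self-contained sketch of the same idea. It assumes a fixed date and builds the index with `pd.date_range` for reproducibility, and it floors with `'60min'` instead of `'H'` only because recent pandas releases warn about the uppercase alias. The names `idx` and `filled` are illustrative.

```python
import numpy as np
import pandas as pd

# Assumption: a fixed date so the example is reproducible
idx = pd.date_range('2022-01-15 00:00', periods=7, freq='30min')

df = pd.DataFrame({'frequent data': [4.5, 4.6, 5.7, 5.7, 6.7, 4.0, 9.0]}, index=idx)
df2 = pd.DataFrame({'data': [4.5, np.nan, 5.7, np.nan]}, index=idx[::2])  # hourly rows

# 1. Reindex the sparse frame onto the dense index (new rows become NaN).
# 2. Group rows by the hour they fall in.
# 3. Forward-fill inside each hour only, so 01:30 cannot inherit from 00:00.
filled = (
    df2.reindex(df.index)
       .groupby(lambda ts: ts.floor('60min'))
       .ffill()
)

new_df = pd.concat([df, filled], axis=1)
print(new_df)
```

Note that the grouping key assumes the sparse frame is sampled exactly on the hour. For a different spacing, the floor frequency would have to match it.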
You can also forward-fill after the concat:
new_df = pd.concat([df, df2], axis=1).groupby(lambda x: x.floor('H')).ffill()
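A possible alternative, not part of the answer above, is to let `reindex` do the filling with `method='ffill'`. Filling during a reindex compares only the old and new index labels, so each new row copies whatever the previous original row held, including NaN. This sketch assumes both indexes are sorted (which `reindex` with a fill method requires) and again uses a fixed date. `aligned` is an illustrative name.

```python
import numpy as np
import pandas as pd

# Assumption: a fixed date so the example is reproducible
idx = pd.date_range('2022-01-15 00:00', periods=7, freq='30min')

df = pd.DataFrame({'frequent data': [4.5, 4.6, 5.7, 5.7, 6.7, 4.0, 9.0]}, index=idx)
df2 = pd.DataFrame({'data': [4.5, np.nan, 5.7, np.nan]}, index=idx[::2])  # hourly rows

# method='ffill' copies the value of the previous *label* in df2's index.
# It does not look at whether that value is NaN, so an hourly NaN is
# propagated to the following half-hour row instead of being skipped.
aligned = df2.reindex(df.index, method='ffill')

new_df = df.join(aligned)
print(new_df)
```

Unlike the `groupby` version, this does not depend on the sparse data sitting on hour boundaries. It does fill every trailing row after the last sparse entry, so `limit=` may be worth adding if that is not wanted.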