I have a dataframe like the one below (just showing the first few rows for the example).
I want to resample the dataframe using the maximum of the ‘Depth’ column for every 24h, and selecting the rows associated with that maximum depth value.
Time Depth Temperature
0 20:15:45 08-Aug-2022 15.5 15.15
1 20:15:50 08-Aug-2022 19.0 13.15
2 20:15:55 08-Aug-2022 24.5 10.85
3 20:16:00 08-Aug-2022 30.0 8.00
4 20:16:05 08-Aug-2022 35.5 7.25
When using this code, it sample the max for each column:
tag_data2 = tag_data.resample('24h').max('Depth')
And gives me this output:
Depth Temperature
Time
2022-08-08 35.5 15.15
but I would like the output to be:
Depth Temperature
Time
2022-08-08 35.5 7.25
>Solution :
You should use resample.idxmax:
out = tag_data.loc[tag_data.resample('24h')['Depth'].idxmax()]
Output:
Depth Temperature
Time
2022-08-08 20:16:05 35.5 7.25
And if you want to update the index:
idx = tag_data.resample('24h')['Depth'].idxmax()
out = tag_data.loc[idx].set_axis(idx.index)
Output:
Depth Temperature
Time
2022-08-08 35.5 7.25
Or:
out = (tag_data
.resample('24h')['Depth'].idxmax().to_frame(name='max_time')
.merge(tag_data, left_on='max_time', right_index=True)
)
Output:
max_time Depth Temperature
Time
2022-08-08 2022-08-08 20:16:05 35.5 7.25