I am using the expanding method on my dataset, and I know how to use it, for example:
```python
data["someColumn"].expanding().mean()
```
The challenge is that my dataset is a time series, and I need to "restart" the expanding window whenever a new day starts. That is, on the first row of a new day the expanding window should treat that row as the only available data, on the second row it should use the first two rows of that day, and so on until the day ends.
How can I achieve it?
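To make the requirement concrete, here is a small sketch on hypothetical hourly data that crosses midnight (the same values used in the answer below). A plain expanding mean never resets, so the first row of the new day still averages in the rows from the previous day:

```python
import pandas as pd

# Hypothetical hourly sample that crosses midnight
s = pd.Series(
    [1, 2, 4, 0, 4],
    index=pd.to_datetime(
        ["2023-12-11 21:00:00", "2023-12-11 22:00:00", "2023-12-11 23:00:00",
         "2023-12-12 00:00:00", "2023-12-12 01:00:00"]
    ),
)

# Plain expanding mean: the window keeps growing across the day boundary,
# so the 2023-12-12 00:00 row gives (1 + 2 + 4 + 0) / 4 = 1.75 instead of 0.0
plain = s.expanding().mean()
print(plain)
```

With the per-day restart asked for here, the 00:00 row should give 0.0 and the 01:00 row 2.0.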
Solution:
Assuming your timestamps are your DataFrame's index, you can add a "day" column containing the date of each timestamp and then call df.groupby("day").expanding().mean():
```python
import pandas as pd

# Sample hourly data that crosses midnight, indexed by timestamp
df = pd.DataFrame(
    {"B": [1, 2, 4, 0, 4]},
    index=pd.to_datetime(
        ["2023-12-11 21:00:00", "2023-12-11 22:00:00", "2023-12-11 23:00:00",
         "2023-12-12 00:00:00", "2023-12-12 01:00:00"]
    ),
)

# Calendar day of each row, used as the grouping key
df["day"] = df.index.to_series().dt.strftime("%Y-%m-%d")

# Expanding mean that restarts within each day
df.groupby("day").expanding().mean()
# Returns:
#                                       B
# day
# 2023-12-11 2023-12-11 21:00:00  1.000000
#            2023-12-11 22:00:00  1.500000
#            2023-12-11 23:00:00  2.333333
# 2023-12-12 2023-12-12 00:00:00  0.000000
#            2023-12-12 01:00:00  2.000000
```
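If you would rather keep the result aligned with the original index (for example, to store it as a column next to B) and skip the helper column, one possible variant is to group by the normalized timestamps and use transform. This is a sketch on the same sample data; the column name B_daily_mean is hypothetical:

```python
import pandas as pd

df = pd.DataFrame(
    {"B": [1, 2, 4, 0, 4]},
    index=pd.to_datetime(
        ["2023-12-11 21:00:00", "2023-12-11 22:00:00", "2023-12-11 23:00:00",
         "2023-12-12 00:00:00", "2023-12-12 01:00:00"]
    ),
)

# normalize() sets every timestamp to midnight, so rows from the same
# calendar day share the same grouping key
day_key = df.index.normalize()

# transform() returns a Series aligned with df's original index, so the
# per-day expanding mean can be stored directly as a regular column
df["B_daily_mean"] = df.groupby(day_key)["B"].transform(
    lambda s: s.expanding().mean()
)
print(df)
```

Because transform keeps the original row order, no MultiIndex is created. If B has no missing values, g.cumsum() / (g.cumcount() + 1) with g = df.groupby(df.index.normalize())["B"] gives the same numbers without a Python-level lambda.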