Pandas: how do I restart expanding function with each new day in my time-series?

December 11, 2023

I am using the expanding method on my dataset. I know how to use it, for example:

data["someColumn"].expanding().mean()

The challenge is that my dataset contains a time series, and I need to "restart" the expanding method whenever a new day starts. That is, when a new day starts, expanding should treat the first row of that day as the only available data, the second row as the second data point, and so on until the day ends.

How can I achieve it?

Solution:

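Assuming your time series is used as your DataFrame's index, you can create a new column holding the date of each index entry and then call df.groupby("day").expanding().mean(). Because the expanding window is computed separately within each group, it restarts at the first row of every day: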

import pandas as pd


df = pd.DataFrame(
    {"B": [1, 2, 4, 0, 4]},
    index=pd.to_datetime(
        ["2023-12-11 21:00:00", "2023-12-11 22:00:00", "2023-12-11 23:00:00",
         "2023-12-12 00:00:00", "2023-12-12 01:00:00"]
    )
)

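# Build a per-day grouping key (a "YYYY-MM-DD" string) from the DatetimeIndex.
df["day"] = df.index.to_series().dt.strftime("%Y-%m-%d")

# Group by day so the expanding window restarts on the first row of each day.
df.groupby("day").expanding().mean()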
# Returns:
#
#                                        B
# day                                     
# 2023-12-11 2023-12-11 21:00:00  1.000000
#            2023-12-11 22:00:00  1.500000
#            2023-12-11 23:00:00  2.333333
# 2023-12-12 2023-12-12 00:00:00  0.000000
#            2023-12-12 01:00:00  2.000000
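
If you would rather keep the original index and skip the helper column, one possible variation is to group directly on the dates derived from the index and then drop the extra group level from the result. This is only a sketch under the same assumption (a DatetimeIndex with unique timestamps); the names days, daily_mean and B_daily_expanding_mean are illustrative and not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame(
    {"B": [1, 2, 4, 0, 4]},
    index=pd.to_datetime(
        ["2023-12-11 21:00:00", "2023-12-11 22:00:00", "2023-12-11 23:00:00",
         "2023-12-12 00:00:00", "2023-12-12 01:00:00"]
    ),
)

# Midnight-normalized timestamps act as the per-day grouping key,
# so no helper column has to be added to df.
days = df.index.normalize()

# The expanding mean is computed within each day. The result carries a
# two-level index (day, original timestamp); droplevel(0) removes the day
# level so the Series lines up with df again.
daily_mean = df.groupby(days)["B"].expanding().mean().droplevel(0)

# Assignment aligns on the (unique) timestamp index.
df["B_daily_expanding_mean"] = daily_mean
print(df)
```

The new column holds the same values as the grouped output above (1.0, 1.5, 2.333333, 0.0, 2.0), but indexed by the original timestamps, which makes it easier to compare against the other columns row by row.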