I’ve been racking my brain trying to figure out the best way to do this. I want to find the rolling sum of the previous 30 days but my ‘day’ column is not in datetime format.
Sample data
df = pd.DataFrame({'client': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B'],
'day': [319, 323, 336, 352, 379, 424, 461, 486, 496, 499, 303, 334, 346, 373, 374, 395, 401, 408, 458, 492],
'foo': [5.0, 2.0, np.nan, np.nan, np.nan, np.nan, np.nan, 7.0, np.nan, np.nan, 8.0, 7.0, 22.0, np.nan, 13.0, np.nan, np.nan, 5.0, 11.0, np.nan]}
>>> df
client day foo
0 A 319 5.0
1 A 323 2.0
2 A 336 NaN
3 A 352 NaN
4 A 379 NaN
5 A 424 NaN
6 A 461 NaN
7 A 486 7.0
8 A 496 NaN
9 A 499 NaN
10 B 303 8.0
11 B 334 7.0
12 B 346 22.0
13 B 373 NaN
14 B 374 13.0
15 B 395 NaN
16 B 401 NaN
17 B 408 5.0
18 B 458 11.0
19 B 492 NaN
I want a new column showing the rolling sum of ‘foo’ every 30 days.
So far I’ve tried:
df['foo_30day'] = df.groupby('client').rolling(30, on='day', min_periods=1)['foo'].sum().values
But it looks like it’s taking the rolling sum of the last 30 rows.
I was also thinking of maybe changing the ‘day’ column to a datetime format, then using rolling('30D') but I’m not sure how or even if that’s the best approach. I’ve also tried to use a groupby reindex to stretch the ‘day’ column and doing a simple rolling(30) but it’s not working for me.
Any advice would be greatly appreciated.
>Solution :
You can convert the days to datetime then use window='30D'. The days will be converted to datetime, with a date starting from the first possible date in Pandas (1970-01-01).
You can optionally specify the starting date with an origin attribute in to_datetime, but for your example that wouldn’t matter.
df = df.sort_values("day").reset_index(drop=True)
df["day_datetime"] = pd.to_datetime(df["day"], unit="D")
df["foo_30day"] = (
df.groupby("client")
.rolling("30D", on="day_datetime", min_periods=1)["foo"]
.sum()
.values
)
client day foo day_datetime foo_30day
0 B 303 8.0 1970-10-31 5.0
1 A 319 5.0 1970-11-16 7.0
2 A 323 2.0 1970-11-20 7.0
3 B 334 7.0 1970-12-01 2.0
4 A 336 NaN 1970-12-03 NaN
5 B 346 22.0 1970-12-13 NaN
6 A 352 NaN 1970-12-19 NaN
7 B 373 NaN 1971-01-09 7.0
8 B 374 13.0 1971-01-10 7.0
9 A 379 NaN 1971-01-15 7.0
10 B 395 NaN 1971-01-31 8.0
11 B 401 NaN 1971-02-06 7.0
12 B 408 5.0 1971-02-13 29.0
13 A 424 NaN 1971-03-01 22.0
14 B 458 11.0 1971-04-04 35.0
15 A 461 NaN 1971-04-07 13.0
16 A 486 7.0 1971-05-02 13.0
17 B 492 NaN 1971-05-08 5.0
18 A 496 NaN 1971-05-12 11.0
19 A 499 NaN 1971-05-15 NaN