Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How do I get the first element in a rolling window on Pandas data frame

I have the following data structure:
enter image description here

and what I’d like to do is for each location, date to determine what was the spending exactly n days ago.

I am trying with a rolling window here but for some reason, when I create the window, I don’t know how to get the first element in it:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

dummy_data["spending_3_days_ago"] = (
        dummy_data.sort_values("date")
        .groupby("location")["spendings"]
        .transform(lambda x: x.rolling(3, min_periods=1).??first??())
    ) 

I can call all type of aggregate functions on the window such as: sum, mean etc. but I can not get the first.

>Solution :

Don’t use a rolling but rather a groupby.shift:

dummy_data["spending_3_days_ago"] = (
        dummy_data.sort_values("date")
        .groupby("location")["spendings"]
        .shift(3)
    )

Output:

   location        date  spendings  spending_3_days_ago
0      1122  2017-12-01     451.12                  NaN
1      1122  2017-12-02     542.43                  NaN
2      1122  2017-12-03     512.23                  NaN
3      1122  2017-12-04     821.23               451.12
4      2233  2017-12-01     892.31                  NaN
5      2233  2017-12-02     303.32                  NaN
6      2233  2017-12-03     673.32                  NaN
7      2233  2017-12-04    1238.32               892.31

Or, if you want to simulate min_periods=1, backpropagate the values:

dummy_data["spending_3_days_ago"] = (
        dummy_data.sort_values("date")
        .groupby("location")["spendings"]
        .apply(lambda g: g.shift(3).bfill())
    )

Output:

   location        date  spendings  spending_3_days_ago
0      1122  2017-12-01     451.12               451.12
1      1122  2017-12-02     542.43               451.12
2      1122  2017-12-03     512.23               451.12
3      1122  2017-12-04     821.23               451.12
4      2233  2017-12-01     892.31               892.31
5      2233  2017-12-02     303.32               892.31
6      2233  2017-12-03     673.32               892.31
7      2233  2017-12-04    1238.32               892.31
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading