pandas : sum with lagged param

June 11, 2024

in sql request, i have a chance to sum with lagged param on the fly, like:

    CASE
       WHEN dt_rep.fcr = 0::numeric THEN 0::numeric
                    ELSE (dt_rep.fcr - lag(dt_rep.fcr) OVER (PARTITION BY dt_rep.place_id ORDER BY dt_rep.bucket)) / dt_rep.fcr
    END AS cr

here, I order the table by bucket (it is timeframe) and sorted them by place_id (to let group by place_id for lagging).
Then, i can simply filtered the table with values of cr is null what gives me required structure.

in pandas, I can shift and assign new column like:

data_fin['fcr_b'] = data_fin.groupby(['place_id'])['fcr'].shift(1)

then I need to combine columns ‘fcr_b’ and ‘fcr’ to new one ‘cr’ and filter dataframe where fcr_b is NaN.

Is there more accurate approache to get new column ‘cr’ something like:

data_fin[cr] = data_fin['fcr'].apply(lambda x: x + lag(x).groupby('place_id'))

<= of cause, it is not working example!

>Solution :

Something like this?

df = pd.DataFrame({"A": [1, 1, 1, 2, 2, 2, 3, 3, 3], 
                   "B": ["a", "b", "c", "d", "e", "f", "g", "h", "i"], 
                   "C": ['text', 'text', 'text', 'text', 'text', 'text', 'text', 'text', 'text']})

df['D'] = df.groupby('A', as_index=False)['B'].shift(1).add(df['C'], fill_value=0)
df