Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

pandas : sum with lagged param

in sql request, i have a chance to sum with lagged param on the fly, like:

    CASE
       WHEN dt_rep.fcr = 0::numeric THEN 0::numeric
                    ELSE (dt_rep.fcr - lag(dt_rep.fcr) OVER (PARTITION BY dt_rep.place_id ORDER BY dt_rep.bucket)) / dt_rep.fcr
    END AS cr

here, I order the table by bucket (it is timeframe) and sorted them by place_id (to let group by place_id for lagging).
Then, i can simply filtered the table with values of cr is null what gives me required structure.

in pandas, I can shift and assign new column like:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

data_fin['fcr_b'] = data_fin.groupby(['place_id'])['fcr'].shift(1)

then I need to combine columns ‘fcr_b’ and ‘fcr’ to new one ‘cr’ and filter dataframe where fcr_b is NaN.

Is there more accurate approache to get new column ‘cr’ something like:

data_fin[cr] = data_fin['fcr'].apply(lambda x: x + lag(x).groupby('place_id'))

<= of cause, it is not working example!

>Solution :

Something like this?

df = pd.DataFrame({"A": [1, 1, 1, 2, 2, 2, 3, 3, 3], 
                   "B": ["a", "b", "c", "d", "e", "f", "g", "h", "i"], 
                   "C": ['text', 'text', 'text', 'text', 'text', 'text', 'text', 'text', 'text']})

df['D'] = df.groupby('A', as_index=False)['B'].shift(1).add(df['C'], fill_value=0)
df
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading