Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to create a pandas lagged variable column based on a condition

I have a data frame where each row summarises the monthly activity of forum members, including the number of posts, the number of replies, the number of hateful posts and the number of hateful replies they receive.

I wish to create a lagged column (n-1) for the number of hateful replies that each user receives. My data frame looks something like this:

post_month user_name thread_posts hate_posts replies hateful_replies
2018-01 !ronman 11 4 19 6
2021-06 $$$ 43 11 138 39
2021-07 $$$ 51 16 169 57
2020-06 $toneman 111 37 1 1
2020-07 $toneman 141 33 23 9
2020-08 $toneman 65 14 0 0
2021-08 Farnham 13 0 4 0

However, doing:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df['hateful_replies-1']=df['hateful_replies'].shift(1)

Does not result in the desired outcome as it doesn’t consider user_name. I wish to shift rows conditional on user_name so that for each username, each month, the previous months hateful_replies value is added to the new column. In instances where there is only one month (one row) per username I would populate the lagged column as a zero, and same for the first month that each username appears since there is no previous month’s value to lag.

Hoping to achieve something like this:

post_month user_name thread_posts hate_posts replies hateful_replies lagged_hateful_replies
2018-01 !ronman 11 4 19 6 0
2021-06 $$$ 43 11 138 39 0
2021-07 $$$ 51 16 169 57 39
2020-06 $toneman 111 37 1 1 0
2020-07 $toneman 141 33 23 9 1
2020-08 $toneman 65 14 0 0 9
2021-08 Farnham 13 0 4 0 0

>Solution :

Using pd.shift will work fine in your case. Remember to group by user_name and set fill_value to 0.

df['lagged_hateful_replies'] = df.groupby('user_name')['hateful_replies'].shift(1, fill_value=0)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading