Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Create new variable based on change in another variable in Python

import pandas as pd

df = {'Date': ["2011-10-19", 
              "2013-01-14", 
              "2014-05-27",
              "2014-06-23",
              "2014-08-12",
              "2014-09-22",
              "2014-09-22",
               "2014-09-22"
             ], 'Status': ["Pending", 
                           "Pending", 
                           "Complete", 
                           "Pending",
                          "Complete",
                           "Pending", 
                           "Pending", 
                           "Pending"],
             'Group': ["a",
                       "a",
                       "a",
                       "a", 
                       "b",
                       "b",
                       "b",
                       "b"]}
df = pd.DataFrame(data=df)
df

I would like to create another variable based on the change in Status over time for each group such that they are considered a "completer" the next row after they have Status = "Complete"

For example I would like to create the "completer" column in the df2 table:

df2 = {'Date': ["2011-10-19", 
              "2013-01-14", 
              "2014-05-27",
              "2014-06-23",
              "2014-08-12",
              "2014-09-22",
              "2014-09-22",
               "2014-09-22"
             ], 'Status': ["Pending", 
                           "Pending", 
                           "Complete", 
                           "Pending",
                          "Complete",
                           "Pending", 
                           "Pending", 
                           "Pending"],
             'Group': ["a",
                       "a",
                       "a",
                       "a", 
                       "b",
                       "b",
                       "b",
                       "b"],
             'Completer': ["Non-Completer",
                          "Non-Completer",
                          "Non-Completer",
                          "Completer",
                          "Non-Completer",
                          "Completer",
                          "Completer",
                          "Completer"]}
df2 = pd.DataFrame(data=df2)
df2

Thanks!

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

I was able to solve it in two steps.

Firstly I created a column which marks the first date that each group has "Completed", and added 1 to it so that we update values starting from the next row.

Secondly, since True / False translates to 1 / 0 respectively I used a groupby on Group and cummax, which would update all following rows per group to True.

Then lastly I just used replace and dropped the helping column.

df['first_date_per_group'] = df.index.isin(df.loc[df['Status'].eq('Complete')]['Group'].index + 1)

df = df.assign(Completer=df.groupby('Group')['first_date_per_group'].cummax()).replace(
    {True:'Completer',False:'Non-Completer'}).drop('first_date_per_group',axis=1)

prints:

         Date    Status Group      Completer
0  2011-10-19   Pending     a  Non-Completer
1  2013-01-14   Pending     a  Non-Completer
2  2014-05-27  Complete     a  Non-Completer
3  2014-06-23   Pending     a      Completer
4  2014-08-12  Complete     b  Non-Completer
5  2014-09-22   Pending     b      Completer
6  2014-09-22   Pending     b      Completer
7  2014-09-22   Pending     b      Completer
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading