Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

pandas groupby and do categorical ordering to drop duplicates

I have a dataframe like as below

df = pd.DataFrame({
        "Name": ["Tim", "Tim", "Tim", "Tim", "Tim",'Jack','Jack','Jack'],
        "Status": ["A1", "E1", "B3", "D4", "C90","A1","C90","B3"]
})

The actual order of my status variable is B3 < A1 < D4 < C90 < E1.

So the last value is E1 and 1st value is B3.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I would like to do the below

a) groupby Name

a) order the values based on categorical ordering shown above

c) Keep only the last value (after dropping duplicates based on Name column)

So, I tried the below

df["Status"] = df["Status"].astype("category")
df["Status"] = df["Status"].cat.set_categories(["B3", "A1", "D4", "C90", "E90"], ordered=True)
df = df.sort_values(['Status'])
df_cleaned = df.drop_duplicates(['Status'],keep='last')

But this results in incorrect output.

I expect my output to be like as below (one row for each Name and their latest/last Status value)

Name   Status
Tim     E1
Jack    C90

>Solution :

Add existing categories to list and sorting with remove duplicates by Name column:

df["Status"] = pd.Categorical(df["Status"], 
                              categories=["B3", "A1", "D4", "C90", "E90","E1"], 
                              ordered=True)

df_cleaned = (df.sort_values(['Status'])
                .drop_duplicates(['Name'],keep='last')

print (df_cleaned)
   Name Status
6  Jack    C90
1   Tim     E1

If possible some values are not in list of categories also remove missing values:

df_cleaned = (df.dropna(subset=['Status'])
                .sort_values(['Status'])
                .drop_duplicates(['Name'],keep='last')
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading