Home pandas groupby and do categorical ordering to drop duplicates

Questions

pandas groupby and do categorical ordering to drop duplicates

January 2, 2023

I have a dataframe like as below

df = pd.DataFrame({
        "Name": ["Tim", "Tim", "Tim", "Tim", "Tim",'Jack','Jack','Jack'],
        "Status": ["A1", "E1", "B3", "D4", "C90","A1","C90","B3"]
})

The actual order of my status variable is B3 < A1 < D4 < C90 < E1.

So the last value is E1 and 1st value is B3.

I would like to do the below

a) groupby Name

a) order the values based on categorical ordering shown above

c) Keep only the last value (after dropping duplicates based on Name column)

So, I tried the below

df["Status"] = df["Status"].astype("category")
df["Status"] = df["Status"].cat.set_categories(["B3", "A1", "D4", "C90", "E90"], ordered=True)
df = df.sort_values(['Status'])
df_cleaned = df.drop_duplicates(['Status'],keep='last')

But this results in incorrect output.

I expect my output to be like as below (one row for each Name and their latest/last Status value)

Name   Status
Tim     E1
Jack    C90

>Solution :

Add existing categories to list and sorting with remove duplicates by Name column:

df["Status"] = pd.Categorical(df["Status"], 
                              categories=["B3", "A1", "D4", "C90", "E90","E1"], 
                              ordered=True)

df_cleaned = (df.sort_values(['Status'])
                .drop_duplicates(['Name'],keep='last')

print (df_cleaned)
   Name Status
6  Jack    C90
1   Tim     E1

If possible some values are not in list of categories also remove missing values:

df_cleaned = (df.dropna(subset=['Status'])
                .sort_values(['Status'])
                .drop_duplicates(['Name'],keep='last')

group-by

byMR

Published January 02, 2023

Add a comment

Next.js, interacting with a database through API routes vs doing it directly

byMR

January 2, 2023

Questions

Dax Studio – The syntax for Calculate is incorrect

byMR

January 2, 2023

Questions

Python pandas df index error when column empty

byMR

January 2, 2023

Questions

Changing a <p> element's innerText

byMR

January 2, 2023

Questions

Firebase cloud messaging notif wont stack together

byMR

January 2, 2023

Questions

React render props using map

byMR

January 2, 2023

pandas groupby and do categorical ordering to drop duplicates

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Like this:

Leave a ReplyCancel reply

Read more

Next.js, interacting with a database through API routes vs doing it directly

Dax Studio – The syntax for Calculate is incorrect

Python pandas df index error when column empty

Changing a <p> element's innerText

Firebase cloud messaging notif wont stack together

React render props using map

Keep Up to Date with the Most Important News

pandas groupby and do categorical ordering to drop duplicates

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Share this:

Like this:

Leave a ReplyCancel reply

Keep Up to Date with the Most Important News

Read more

Next.js, interacting with a database through API routes vs doing it directly

Dax Studio – The syntax for Calculate is incorrect

Python pandas df index error when column empty

Changing a <p> element's innerText

Firebase cloud messaging notif wont stack together

React render props using map

Discover more from Dev solutions