Given a dataframe like this:
| Client | Priority |
|---|---|
| Client 1 | Low |
| Client 1 | Medium |
| Client 2 | Low |
| Client 2 | High |
| Client 3 | Low |
I want to group by client and select the highest priority (Low, Medium, High) for each client. It should look something like this:
| Client | Priority |
|---|---|
| Client 1 | Medium |
| Client 2 | High |
| Client 3 | Low |
>Solution :
convert priority into a categorical column before grouping:
dtype = pd.CategoricalDtype(['High', 'Medium', 'Low'], ordered = True)
df['Priority'] = df.Priority.astype(dtype)
df.sort_values(['Client', 'Priority']).groupby('Client', sort = False).head(1)
Client Priority
1 Client 1 Medium
3 Client 2 High
4 Client 3 Low
if you do not want to convert to category, you can map the positions instead:
(df.assign(temp = df.Priority.map({'Low':3,'Medium':2, 'High':1}))
.sort_values(['Client', 'temp'])
.drop(columns='temp')
.groupby('Client')
.head(1)
)
Client Priority
1 Client 1 Medium
3 Client 2 High
4 Client 3 Low