Home Filter column for multiple values but only select the last one for one criteria

Questions

Filter column for multiple values but only select the last one for one criteria

September 30, 2022

I have a dataframe similar to this one

df = pd.DataFrame({'date':[20220101,20220102,20220103,20220101,20220102,20220101], 'id':[1,1,1,2,2,3], 'value':[11,22,33,44,55,66], 'categorie':['a','a','c','a','c','c']})

       date  id  value categorie
   20220101   1     11         a
   20220102   1     22         a
   20220103   1     33         c
   20220101   2     44         a
   20220102   2     55         c
   20220101   3     66         c

I would now like to slice the df based on multiple values from column ‘categorie’ and am currently using

df = df[df['categorie'].isin(['a','c'])]

In addition to that I would like to be able to only get the [-1] row back for categorie ‘a’

    date  id  value categorie
20220102   1     22         a
20220103   1     33         c
20220101   2     44         a
20220102   2     55         c
20220101   3     66         c

instead of

    date  id  value categorie
20220101   1     11         a
20220102   1     22         a 
20220103   1     33         c
20220101   2     44         a
20220102   2     55         c
20220101   3     66         c

I think the closest would be to think about it as a groupby max value on id and categorie but I am curious if there is a more pythonic way.

>Solution :

‘a’ and ‘c’ are the only categories in your data, if you just need the latest then drop the duplicates

# drop duplicates and keep the last
df.drop_duplicates(subset=['id','categorie'], keep='last')

# select the categories of 'a' and 'c' and drop the duplicates from among them
(df.loc[df['categorie'].isin(['a','c'])]
 .drop_duplicates(subset=['id','categorie'], keep='last'))

    date       id   value   categorie
1   20220102    1      22   a
2   20220103    1      33   c
3   20220101    2      44   a
4   20220102    2      55   c
5   20220101    3      66   c