Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Filter column for multiple values but only select the last one for one criteria

I have a dataframe similar to this one

df = pd.DataFrame({'date':[20220101,20220102,20220103,20220101,20220102,20220101], 'id':[1,1,1,2,2,3], 'value':[11,22,33,44,55,66], 'categorie':['a','a','c','a','c','c']})

       date  id  value categorie
   20220101   1     11         a
   20220102   1     22         a
   20220103   1     33         c
   20220101   2     44         a
   20220102   2     55         c
   20220101   3     66         c

I would now like to slice the df based on multiple values from column ‘categorie’ and am currently using

df = df[df['categorie'].isin(['a','c'])]

In addition to that I would like to be able to only get the [-1] row back for categorie ‘a’

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

    date  id  value categorie
20220102   1     22         a
20220103   1     33         c
20220101   2     44         a
20220102   2     55         c
20220101   3     66         c

instead of

    date  id  value categorie
20220101   1     11         a
20220102   1     22         a 
20220103   1     33         c
20220101   2     44         a
20220102   2     55         c
20220101   3     66         c

I think the closest would be to think about it as a groupby max value on id and categorie but I am curious if there is a more pythonic way.

>Solution :

‘a’ and ‘c’ are the only categories in your data, if you just need the latest then drop the duplicates

# drop duplicates and keep the last
df.drop_duplicates(subset=['id','categorie'], keep='last')

or

# select the categories of 'a' and 'c' and drop the duplicates from among them
(df.loc[df['categorie'].isin(['a','c'])]
 .drop_duplicates(subset=['id','categorie'], keep='last'))
    date       id   value   categorie
1   20220102    1      22   a
2   20220103    1      33   c
3   20220101    2      44   a
4   20220102    2      55   c
5   20220101    3      66   c
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading