Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Drop rows entries if a column values are all equal in grouped subset

I have this example df:

info = {'name': ['Jason', 'Jason', 'Jason', 'Jason','Molly', 'Molly', 'Molly', 'Molly','Nicky', 'Nicky', 'Nicky', 'Nicky'], 
'city': ['Las Vegas', 'New York', 'Dallas', 'Los Angeles','Las Vegas', 'New York', 'Dallas', 'Los Angeles','Las Vegas', 'New York', 'Dallas', 'Los Angeles'],
'Visits' :[2,2,2,2,1,3,4,1,2,8,2,8]}
df = pd.DataFrame(data=info)
df

gives:

    name    city        Visits
0   Jason   Las Vegas     2
1   Jason   New York      2
2   Jason   Dallas        2
3   Jason   Los Angeles   2
4   Molly   Las Vegas     1
5   Molly   New York      3
6   Molly   Dallas        4
7   Molly   Los Angeles   1
8   Nicky   Las Vegas     2
9   Nicky   New York      8
10  Nicky   Dallas        2
11  Nicky   Los Angeles   8

I want to drop entries from names if all values under Visits are equivalent which is true in case of the df['name']== jason. I used drop_duplicates of two subsets name and Visits but the output drops also other duplicated values under other names.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df.drop_duplicates(['name','Visits'], keep=False)

This gives:

    name    city     Visits
5   Molly   New York    3
6   Molly   Dallas      4

output should be:

4   Molly   Las Vegas     1
5   Molly   New York      3
6   Molly   Dallas        4
7   Molly   Los Angeles   1
8   Nicky   Las Vegas     2
9   Nicky   New York      8
10  Nicky   Dallas        2
11  Nicky   Los Angeles   8

What is the best approach to achieve this?

>Solution :

Use nunique:

df = df[df.groupby('name')['Visits'].transform('nunique').ne(1)]

Or

df = df.groupby('name').filter(lambda x: x['Visits'].nunique() != 1)

Output:

>>> df
     name         city  Visits
4   Molly    Las Vegas       1
5   Molly     New York       3
6   Molly       Dallas       4
7   Molly  Los Angeles       1
8   Nicky    Las Vegas       2
9   Nicky     New York       8
10  Nicky       Dallas       2
11  Nicky  Los Angeles       8
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading