I have this example df:
info = {'name': ['Jason', 'Jason', 'Jason', 'Jason','Molly', 'Molly', 'Molly', 'Molly','Nicky', 'Nicky', 'Nicky', 'Nicky'],
'city': ['Las Vegas', 'New York', 'Dallas', 'Los Angeles','Las Vegas', 'New York', 'Dallas', 'Los Angeles','Las Vegas', 'New York', 'Dallas', 'Los Angeles'],
'Visits' :[2,2,2,2,1,3,4,1,2,8,2,8]}
df = pd.DataFrame(data=info)
df
gives:
name city Visits
0 Jason Las Vegas 2
1 Jason New York 2
2 Jason Dallas 2
3 Jason Los Angeles 2
4 Molly Las Vegas 1
5 Molly New York 3
6 Molly Dallas 4
7 Molly Los Angeles 1
8 Nicky Las Vegas 2
9 Nicky New York 8
10 Nicky Dallas 2
11 Nicky Los Angeles 8
I want to drop entries from names if all values under Visits are equivalent which is true in case of the df['name']== jason. I used drop_duplicates of two subsets name and Visits but the output drops also other duplicated values under other names.
df.drop_duplicates(['name','Visits'], keep=False)
This gives:
name city Visits
5 Molly New York 3
6 Molly Dallas 4
output should be:
4 Molly Las Vegas 1
5 Molly New York 3
6 Molly Dallas 4
7 Molly Los Angeles 1
8 Nicky Las Vegas 2
9 Nicky New York 8
10 Nicky Dallas 2
11 Nicky Los Angeles 8
What is the best approach to achieve this?
>Solution :
Use nunique:
df = df[df.groupby('name')['Visits'].transform('nunique').ne(1)]
Or
df = df.groupby('name').filter(lambda x: x['Visits'].nunique() != 1)
Output:
>>> df
name city Visits
4 Molly Las Vegas 1
5 Molly New York 3
6 Molly Dallas 4
7 Molly Los Angeles 1
8 Nicky Las Vegas 2
9 Nicky New York 8
10 Nicky Dallas 2
11 Nicky Los Angeles 8