I am trying write a code that can read from a dataframe that has thousands of rows like:
name value
abc123 wrd
abc123 wrd
abc123 wrd
abc987 wrd
abc987 wrd
abc987 bbs
.. ..
I want the code to show me the name that has only the value wrd, for example in this case after the code works it should only show the result as abc123, since abc987 has both wrd and bss as value.
What I did so far is:
import pandas as pd
result = df[df['value'].isin(['wrd', 'bbs'])]
this results in both abc123 and abc987
result = df[df['value'].isin(['wrd'])]
this results in again both abc123 and abc987
What should I use instead to get name of the ones that only has the value as wrd?
>Solution :
Fitler out group if not equal wrd with test not equal scalar by Series.ne and not matched names by Series.isin:
result = df[~df['name'].isin(df.loc[df['value'].ne('wrd'), 'name'])]
print (result)
name value
0 abc123 wrd
1 abc123 wrd
2 abc123 wrd
Details:
print (df.loc[df['value'].ne('wrd'), 'name'])
5 abc987
Name: name, dtype: object()
print (~df['name'].isin(df.loc[df['value'].ne('wrd'), 'name']))
0 True
1 True
2 True
3 False
4 False
5 False
Name: name, dtype: bool()
Or use GroupBy.transform with GroupBy.all for test if match all values:
result = df[df['value'].eq('wrd').groupby(df['name']).transform('all')]
print (result)
name value
0 abc123 wrd
1 abc123 wrd
2 abc123 wrd
print (df['value'].eq('wrd').groupby(df['name']).transform('all'))
0 True
1 True
2 True
3 False
4 False
5 False
Name: name, dtype: bool