Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How can I filter a Pandas DataFrame based on whether all aggregated values in a column are True?

I have the following data

data = [
    [1, True],
    [1, True],
    [1, True],
    [1, True],
    [2, True],
    [2, False],
    [2, True],
    [3, True],
    [3, True],
    [3, True],
    [3, True],
    [4, True],
    [4, True],
    [4, False],
    [5, True],
    [5, True],
    [5, True],    
    [5, True],
]

df = pd.DataFrame(data, columns=['ids', 'accept'])

And I would like to filter out all rows whose IDs that have at least one False value in the accept column. So my result should look like (note the missing 2 and 4 IDs):

   ids accept
0   1   True
1   1   True
2   1   True
3   1   True
4   3   True
5   3   True
6   3   True
7   3   True
8   5   True
9   5   True
10  5   True
11  5   True

I was able to get a list of the IDs for which all values in the accept column are True using the groupby() and all() methods:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

# Use the groupby() method to group the DataFrame by the 'ids' column
grouped = df.groupby('ids')

# Use the all() method to check whether all values in the 'accept' column are True for each group
accept_all_true = grouped['accept'].all()

But I am stuck at this point. How can I apply this grouping to my original data frame?

>Solution :

If performance is important you need avoid DataFrameGroupBy.filter because bad performance in larger DataFrame:

First idea is filter ids for at least one False and then fiter ouput this ids:

df1 = df[~df['ids'].isin(df.loc[~df['accept'],'ids'])]
print (df1)
    ids  accept
0     1    True
1     1    True
2     1    True
3     1    True
7     3    True
8     3    True
9     3    True
10    3    True
14    5    True
15    5    True
16    5    True
17    5    True

Or you can use GroupBy.transform for test if all Trues per groups, performance should be slowier:

df1 = df[df.groupby('ids')['accept'].transform('all').eq(df['accept'])]
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading