
In Pandas, how do I retrieve the original rows of each group that remains after aggregation and filtering?

Given:

import pandas as pd

df = pd.DataFrame(
    {
        'a': ['A', 'A', 'B', 'B', 'B', 'C'],
        'b': [True, True, True, False, False, True]
    }
)

print(df)

groups = df.groupby('a')  # "A", "B", "C"
agg_groups = groups.agg({'b':lambda x: all(x)}) # "A": True, "B": False, "C": True
agg_df = agg_groups.reset_index()
filtered_df = agg_df[agg_df["b"]]  # "A": True, "C": True

print(filtered_df)


# Now I want to get back the original df's rows, but only the remaining ones after group filtering


Current output:

   a      b
0  A   True
1  A   True
2  B   True
3  B  False
4  B  False
5  C   True
   a     b
0  A  True
2  C  True

Required (the last frame is the desired result):


   a      b
0  A   True
1  A   True
5  C   True

Solution:

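Use GroupBy.transform('all') to compute each group's result and broadcast it back to every row of that group. This produces a boolean mask with the same length and index as the original DataFrame, so it can be used directly for boolean indexing: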

# Broadcast each group's all() result back to its rows, then filter
df1 = df[df.groupby('a')['b'].transform('all')]

# alternative with a custom function
#f = lambda x: x.all()
#df1 = df[df.groupby('a')['b'].transform(f)]
print(df1)
   a     b
0  A  True
1  A  True
5  C  True
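To see why this works, here is a small sketch using the same df as in the question. It compares the aggregated result (one value per group) with the transformed mask (one value per row); the names per_group and mask are only illustrative:

```python
import pandas as pd

df = pd.DataFrame(
    {
        'a': ['A', 'A', 'B', 'B', 'B', 'C'],
        'b': [True, True, True, False, False, True]
    }
)

# all() as an aggregation returns one value per group (3 values)
per_group = df.groupby('a')['b'].all()
# transform('all') broadcasts that value back to every row (6 values)
mask = df.groupby('a')['b'].transform('all')
print(len(per_group), len(mask))  # 3 6
print(mask.tolist())  # [True, True, False, False, False, True]

# mask shares df's index, so it can be used directly for boolean indexing
df1 = df[mask]
print(df1.index.tolist())  # [0, 1, 5]
```

Because the mask keeps the original index, the filtered frame keeps the original row labels 0, 1 and 5.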

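If you want to filter after aggregating instead, the aggregation returns a boolean Series indexed by the group keys. Select the keys whose value is True, then keep the rows whose value in the original column a is among those keys: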

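# One boolean per group, indexed by the values of column a
ids = df.groupby('a')['b'].all()

# Keep the rows whose key is among the groups where ids is True
df1 = df[df.a.isin(ids.index[ids])]
print(df1)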
   a     b
0  A  True
1  A  True
5  C  True
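The intermediate values can be inspected step by step. This sketch reuses the question's df; keep is a hypothetical name for the surviving group keys:

```python
import pandas as pd

df = pd.DataFrame(
    {
        'a': ['A', 'A', 'B', 'B', 'B', 'C'],
        'b': [True, True, True, False, False, True]
    }
)

# One boolean per group, indexed by the group key ('A', 'B', 'C')
ids = df.groupby('a')['b'].all()
print(ids.tolist())  # [True, False, True]

# Boolean-index the group keys to get only the groups that passed
keep = ids.index[ids]
print(list(keep))  # ['A', 'C']

# Map back to the original rows: keep rows whose key is in the surviving set
df1 = df[df['a'].isin(keep)]
print(df1.index.tolist())  # [0, 1, 5]
```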

Your own solution works the same way: before reset_index, the group keys are the index of agg_groups, so filter that index with column b:

groups = df.groupby('a')
agg_groups = groups.agg({'b': lambda x: all(x)})

# The index holds the group keys; column b marks which groups passed
df1 = df[df.a.isin(agg_groups.index[agg_groups['b']])]
print(df1)
   a     b
0  A  True
1  A  True
5  C  True
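Two further variants, not part of the original answer, are sketched below under the same df. The first keeps the question's reset_index step and reads the surviving keys from column a of filtered_df. The second uses DataFrameGroupBy.filter, which keeps every row of each group for which the function returns True. The names via_isin and via_filter are only illustrative:

```python
import pandas as pd

df = pd.DataFrame(
    {
        'a': ['A', 'A', 'B', 'B', 'B', 'C'],
        'b': [True, True, True, False, False, True]
    }
)

# The question's pipeline: aggregate, move the key back into a column, filter
agg_df = df.groupby('a').agg({'b': lambda x: all(x)}).reset_index()
filtered_df = agg_df[agg_df['b']]

# Variant 1: after reset_index the surviving keys live in filtered_df['a']
via_isin = df[df['a'].isin(filtered_df['a'])]

# Variant 2: GroupBy.filter keeps whole groups that satisfy the predicate
# and preserves the original index
via_filter = df.groupby('a').filter(lambda g: g['b'].all())

print(via_isin.equals(via_filter))  # True
print(via_filter)
```

Note that filtered_df's own index (0 and 2) labels rows of agg_df, not of df, which is why the keys in column a, rather than that index, are used to go back to the original rows.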