Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to select first N number of groups based on values of a column conditionally and groupby two columns?

This is a follow up to this post

This is my DataFrame:

df = pd.DataFrame(
    {
        'a': [10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 10, 22],
        'b': [1, 1, 1, -1, -1, -1, -1, 2, 2, 2, 2, -1, -1, -1, -1],
        'c': [25, 25, 25, 45, 45, 45, 45, 65, 65, 65, 65, 40, 40, 30, 30],
        'main': ['x', 'x', 'x', 'x', 'x', 'x', 'x', 'y', 'y', 'y', 'y', 'y', 'y', 'y', 'y']
    }
)

Expected output: Groupby main AND c:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

    a  b   c main
0   10  1  25    x
1   15  1  25    x
2   20  1  25    x
3   25 -1  45    x
4   30 -1  45    x
5   35 -1  45    x
6   40 -1  45    x
11  65 -1  40    y
12  70 -1  40    y
13  10 -1  30    y
14  22 -1  30    y

The process is as follows: Note that groupby is done by TWO columns:

So for each main:

a) Selecting the group that all of the b values is 1. In my data and this df there is only one group with this condition.

b) Selecting first two groups (from top of df) that all of their b values are -1.

Note that there is a possibility in my data that there are no groups that has a or b condition. If that is the case, returning whatever matches the criteria is fine. For example the output could be only one group or no groups at all.

The groups that I want are shown below:

enter image description here

This is my attempt based on this answer but it appears that something else must change:

# identify groups with all 1
m1 = df['b'].eq(1).groupby(df['c', 'main']).transform('all')
# identify groups with all -1
m2 = df['b'].eq(-1).groupby(df['c', 'main']).transform('all')
# keep rows of first 2 groups with all -1
m3 = df[['c', 'main']].isin(df.loc[m2, ['c', 'main']].unique()[:2])

# select m1 OR m3
out = df[m1 | m3]

>Solution :

You can update the previous code to get the first 2 unique "c" per main:

groups = [df['c'], df['main']]
# identify groups with all 1
m1 = df['b'].eq(1).groupby(groups).transform('all')
# identify groups with all -1
m2 = df['b'].eq(-1).groupby(groups).transform('all')
# keep rows of first 2 groups with all -1, per main
keep = set.union(*df.loc[m2, ['c', 'main']].groupby('main')['c']
                    .agg(lambda x: set(x.unique()[:2])))
# {25}
m3 = df['c'].isin(keep)

# select m1 OR m3
out = df[m1 | m3]

Or using a merge, but this won’t necessarily keep the original order of the rows:

groups = [df['c'], df['main']]
# identify groups with all 1
m1 = df['b'].eq(1).groupby(groups).transform('all')
# identify groups with all -1
m2 = df['b'].eq(-1).groupby(groups).transform('all')
# keep rows of first 2 groups with all -1, per main
ref = df.loc[m2, ['c', 'main']].drop_duplicates().groupby('main').head(2)

out = pd.concat([df[m1], df.merge(ref)], ignore_index=True)

Output:

     a  b   c main
0   10  1  25    x
1   15  1  25    x
2   20  1  25    x
3   25 -1  45    x
4   30 -1  45    x
5   35 -1  45    x
6   40 -1  45    x
11  65 -1  40    y
12  70 -1  40    y
13  10 -1  30    y
14  22 -1  30    y
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading