Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Find mode for each group in pandas DataFrame

I have a dataframe with the following 3 columns: property, value, count

Pairs of (property, value) are unique.

I need to add a new column to this dataframe, that will check if the value in count is the most frequent within its property column value. So, for example:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

We have a dataframe

property,value,count
p1,v1,20
p1,v2,50
p1,v3,50
p2,v4,10
p2,v5,20

And as a result we need to get:

property,value,count,is_mode
p1,v1,20,False
p1,v2,50,True
p1,v3,50,True
p2,v4,10,True
p2,v5,20,False 
  • in case when there are multiple modes I need to use the minimum one for calculating the is_mode value. So here, for p2 there are modes – 10, 20, but since I need the minimum one, for 10 is_mode will be True and for 20 it will be False

I’ve found numerous solutions from stackoverflow already, but these didn’t lead me to any good result.
For some reason

I tried to split the task into first getting the mode for each property and then somehow compare the values to the minimum mode for this group:

new_df = df.groupby(["property"])["count"].agg(pd.Series.mode)

same if I try to attach it to the current dataframe:

properties["mode"] = properties.groupby(["property"])["count"].transform(pd.Series.mode)

these doesn’t crush however they produce weird results with lots of Nans. For example, I have:

property,value,count
p1,v1,200
p1,v2,60
p1,v3,60

and expect to get:

property,value,count,mode
p1,v1,200,60
p1,v2,60,60
p1,v3,60,60

but in my case I get:

property,value,count,mode
p1,v1,200,NaN
p1,v2,60,NaN
p1,v3,60,NaN

>Solution :

Assuming you want the smallest mode of the counts per group, use a custom function and groupby.transform.

For each group, get the mode (one or more values), then the min, flag values that are equal to it:

df['is_mode'] = (df.groupby('property')['count']
                   .transform(lambda s: s.eq(s.mode().min()))
                )

Output:

  property value  count  is_mode
0       p1    v1     20    False
1       p1    v2     50     True
2       p1    v3     50     True
3       p2    v4     10     True
4       p2    v5     20    False
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading