Python – pandas, group by and max count

For each value in column cluster-1, I need the most frequent value (max count) in column cluster-2.

Input data:

[Image: input data]


Output data:

[Image: output]

I use the command df.groupby(['cluster-1', 'cluster-2'])['cluster-2'].count(). This gives me the number of occurrences of each cluster-2 value within each cluster-1 group. I need advice on how to proceed from there, thanks.

Solution:

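Use SeriesGroupBy.value_counts, which by default sorts the counts in descending order within each group, so the first row of every cluster-1 group is its most frequent cluster-2 value. Convert the resulting MultiIndex to a DataFrame with MultiIndex.to_frame, then keep only the first row per cluster-1 with DataFrame.drop_duplicates: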

df1 = (df.groupby(['cluster-1'])['cluster-2']
         .value_counts()
         .index
         .to_frame(index=False)
         .drop_duplicates('cluster-1'))
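
As a minimal sketch, here is the same chain run on a small made-up DataFrame. The original input was only shown as an image, so the values below are assumptions chosen for illustration, not the asker's data.

```python
import pandas as pd

# Hypothetical sample data (not from the original question).
df = pd.DataFrame({
    "cluster-1": [0, 0, 0, 1, 1, 1, 1],
    "cluster-2": ["a", "b", "b", "c", "c", "c", "a"],
})

df1 = (df.groupby(["cluster-1"])["cluster-2"]
         .value_counts()              # counts per (cluster-1, cluster-2), largest first in each group
         .index                       # MultiIndex of (cluster-1, cluster-2) pairs in that order
         .to_frame(index=False)       # turn the pairs into two ordinary columns
         .drop_duplicates("cluster-1"))  # keep the first (most frequent) row per cluster-1

print(df1)
```

In this sample, "b" is the most frequent cluster-2 value for cluster-1 == 0 and "c" for cluster-1 == 1, so df1 holds exactly those two rows. If two cluster-2 values tie within a group, this sketch does not control which of them is kept.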