Better way to show duplicates in Pandas

dups_df = df.pivot_table(columns=['DstAddr'], aggfunc='size')
print(dups_df)

I am using this code block to show the duplicates, but I would like to see the output ordered by count (most frequent value first), and ideally with a better visualization. How can I do this?

>Solution :

You can use the duplicated method, as shown below:

print(df[df.duplicated(subset='DstAddr')])

You can see the whole documentation at https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.duplicated.html
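
One detail worth knowing: by default duplicated uses keep='first', so the first occurrence of each value is not flagged and only the repeats are returned. The sketch below contrasts that with keep=False, which flags every row whose DstAddr appears more than once. The sample data and the Bytes column are made up for illustration; only the DstAddr column name comes from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column name DstAddr comes from the question.
df = pd.DataFrame({
    "DstAddr": ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3", "10.0.0.1", "10.0.0.2"],
    "Bytes": [120, 80, 300, 40, 60, 500],
})

# Default keep='first': the first occurrence of each address is NOT flagged,
# so only the second and later occurrences are returned.
repeats_only = df[df.duplicated(subset="DstAddr")]

# keep=False flags every row whose DstAddr appears more than once.
all_dups = df[df.duplicated(subset="DstAddr", keep=False)]

print(repeats_only)
print(all_dups.sort_values("DstAddr"))
```

Which variant is appropriate depends on whether you want to see only the extra copies or every row that takes part in a duplicate group.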

Another way is the value_counts method, which returns the counts sorted from most to least frequent by default:

print(df.value_counts(subset='DstAddr', ascending=False))

Documentation at https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.value_counts.html
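
Since the question is about duplicates specifically, it may help to drop the values that occur only once. This is a minimal sketch on made-up sample data, using value_counts on the column (a Series) and a boolean filter; the names counts and dup_counts are illustrative.

```python
import pandas as pd

# Hypothetical sample data; only the column name DstAddr comes from the question.
df = pd.DataFrame({
    "DstAddr": ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3", "10.0.0.1", "10.0.0.2"],
})

# value_counts sorts in descending order of frequency by default.
counts = df["DstAddr"].value_counts()

# Keep only the addresses that appear more than once.
dup_counts = counts[counts > 1]

print(dup_counts)
```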

To visualize this, you can call value_counts and chain the plot method:

df.value_counts(subset='DstAddr', ascending=False).plot()

Documentation at https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.html
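
Note that plot() with no arguments draws a line chart, which is usually not the clearest choice for per-value counts. A horizontal bar chart tends to read better when the labels are addresses. The following is a sketch under assumptions: it uses made-up sample data, requires matplotlib to be installed, and writes to a hypothetical file name, dstaddr_counts.png.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data; only the column name DstAddr comes from the question.
df = pd.DataFrame({
    "DstAddr": ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3", "10.0.0.1", "10.0.0.2"],
})

counts = df["DstAddr"].value_counts()  # most frequent first

# Horizontal bars, one per address; invert the y-axis so the
# most frequent address ends up at the top of the chart.
ax = counts.plot(kind="barh")
ax.invert_yaxis()
ax.set_xlabel("Occurrences")
ax.set_ylabel("DstAddr")

plt.tight_layout()
plt.savefig("dstaddr_counts.png")  # hypothetical output file name
```

In an interactive session, plt.show() can replace the savefig call.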
