Dropping rows that fall below a certain percentage threshold of the total rows/sum [Python]

I am having trouble filtering out the crimes (the "OffenseDescription" column) that make up less than 5% of the total rows in the dataframe. Either a specific or a general solution would help, so that I can reproduce it and adjust the requirements as needed.

What I've tried so far crashes the kernel and appears to run indefinitely.

I’m also doing this in VS Code, via a Jupyter Notebook.

This is the code I’ve attempted so far:

  tot = crime.OffenseDescription.sum()  # find sum of column

  # calculate percentage, filter as per condition
  crime[crime.groupby(['OffenseDescription']).transform(lambda x: (x.div(tot) * 100) < 0.05)]

[Screenshot of .head() of the dataframe I am using]

TIA

>Solution :

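Use Series.value_counts with normalize=True to get each value's share of the total rows. Then map those shares back onto the column and, with boolean indexing, keep only the rows whose share is greater than or equal to 0.05, which drops the groups below 5%: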

percentage = crime.OffenseDescription.value_counts(normalize=True) 

crime[crime['OffenseDescription'].map(percentage) >= 0.05]
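
As a minimal, self-contained sketch of the approach above: the sample data here is made up for illustration (only the OffenseDescription column name comes from the question), and the names filtered, share and filtered_alt are hypothetical. The second variant assumes the column has no missing values, and should give the same result by dividing each row's group size by the total row count.

```python
import pandas as pd

# Made-up stand-in for the `crime` dataframe; only the column name
# OffenseDescription is taken from the question.
crime = pd.DataFrame({
    "OffenseDescription": ["THEFT"] * 60 + ["ASSAULT"] * 37 + ["ARSON"] * 3,
})

# Share of all rows held by each offense (fractions, not percentages).
percentage = crime["OffenseDescription"].value_counts(normalize=True)

# Keep only rows whose offense accounts for at least 5% of all rows.
filtered = crime[crime["OffenseDescription"].map(percentage) >= 0.05]

# Alternative without the intermediate Series:
# per-row group size divided by the total number of rows.
share = (
    crime.groupby("OffenseDescription")["OffenseDescription"].transform("size")
    / len(crime)
)
filtered_alt = crime[share >= 0.05]

print(filtered["OffenseDescription"].unique())
```

Note that the threshold is written as a fraction (0.05), because normalize=True returns fractions rather than percentages. In the attempt from the question the values are multiplied by 100 and then compared with 0.05, which would be a 0.05% cut-off instead of 5%.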