
Dropping a column if more than half of its values are the same – Python

I have a pandas DataFrame that looks like this:
[image of the DataFrame]

I want to delete every column in which more than half of the values are the same, but I don't know how to do this.

I tried using pandas.Series.value_counts, but with no luck.
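For reference, value_counts on a single column already returns what the check needs: by default the counts are sorted from most to least frequent, so .iloc[0] is the number of occurrences of the most common value, and normalize=True expresses it as a fraction of the rows. A minimal sketch on a made-up column (the question's real data was only shown as an image, so the values here are hypothetical):

```python
import pandas as pd

# Hypothetical example column, not the asker's data
s = pd.Series(["a", "a", "a", "b", "c"])

counts = s.value_counts()                # counts, most frequent first
top_count = counts.iloc[0]               # occurrences of the most common value
top_share = s.value_counts(normalize=True).iloc[0]  # same, as a fraction of rows

print(top_count, top_share)
print(top_share > 0.5)                   # True: "a" fills more than half the rows
```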


Solution:

You can iterate over the columns, count the occurrences of each value with value_counts as you tried, and check whether the most common value makes up more than 50% of the column's rows.

n = len(df)
cols_to_drop = []
for col in df.columns:
    # Occurrences of the most common value in this column
    # (dropna=False also counts NaN, and avoids an error on all-NaN columns)
    max_occ = df[col].value_counts(dropna=False).iloc[0]
    if 2 * max_occ > n:  # the value fills more than half of the rows
        cols_to_drop.append(col)
df = df.drop(columns=cols_to_drop)
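The same check can also be written without an explicit loop by computing, for every column at once, the share of rows taken by its most frequent value and keeping only the columns at or below the cutoff. A minimal sketch; the function name drop_mostly_constant, its threshold parameter, and the choice to count NaN as a value are assumptions for illustration, not part of the answer above:

```python
import pandas as pd


def drop_mostly_constant(df: pd.DataFrame, threshold: float = 0.5) -> pd.DataFrame:
    """Hypothetical helper: return df without the columns whose most common
    value (NaN included) fills more than `threshold` of the rows."""
    if len(df) == 0:
        return df.copy()  # nothing to count on an empty frame
    # One number per column: fraction of rows held by its most frequent value
    top_share = df.apply(lambda s: s.value_counts(normalize=True, dropna=False).iloc[0])
    # Keep a column only if no single value exceeds the threshold
    return df.loc[:, top_share <= threshold]


df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "flag": [0, 0, 0, 1],            # 75% zeros, so it is dropped
    "color": ["r", "r", "g", "b"],   # exactly 50% "r", so it is kept
})
print(drop_mostly_constant(df).columns.tolist())
```

Note that a column where a value fills exactly half of the rows is kept, matching the "more than half" wording of the question; use `<` instead of `<=` if exactly half should also be dropped.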