Dropping column if more than half of the values are same – Python

byMR

April 5, 2022

I have a pandas DataFrame df.

I want to delete any column in which more than half of the values are the same, and I don't know how to do this.

I tried using pandas.Series.value_counts, but with no luck.
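For reference, here is a small sketch of what Series.value_counts returns, using a made-up toy column. The counts come back sorted from most to least frequent, so .iloc[0] is the number of occurrences of the most common value. Passing normalize=True gives fractions instead of counts, which maps directly onto the "more than half" test.

```python
import pandas as pd

# A toy column, made up for illustration
s = pd.Series(["a", "a", "a", "b", "c"])

counts = s.value_counts()  # counts per value, most frequent first
print(counts.iloc[0])  # 3: occurrences of the most common value

shares = s.value_counts(normalize=True)  # fractions of the column instead of counts
print(shares.iloc[0])  # 0.6: "a" fills 3 of 5 rows, i.e. more than half
```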

Solution:

You can iterate over the columns, count the occurrences of each value with value_counts as you tried, and check whether the most common value accounts for more than 50% of the column's rows.

n = len(df)
cols_to_drop = []
for e in df.columns:
    # Occurrences of the most common value in column e (value_counts sorts descending)
    max_occ = df[e].value_counts().iloc[0]
    if 2 * max_occ > n:  # more than half of the rows share that value
        cols_to_drop.append(e)
df = df.drop(cols_to_drop, axis=1)
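The same check can also be written without an explicit loop. Below is a minimal sketch wrapped in a hypothetical helper, drop_majority_columns, with a threshold parameter that is an assumption (0.5 reproduces the "more than half" rule above). It also assumes NaN should count as a value, so it calls value_counts(dropna=False); this additionally avoids an IndexError on a column that is entirely NaN, which the loop above would hit because value_counts drops NaN by default.

```python
import pandas as pd


def drop_majority_columns(df: pd.DataFrame, threshold: float = 0.5) -> pd.DataFrame:
    """Hypothetical helper: drop every column whose most common value
    fills strictly more than `threshold` of the rows.

    NaN is counted as a value (dropna=False), so an all-NaN column is dropped too.
    """
    n = len(df)
    if n == 0:
        return df  # nothing to count in an empty frame
    # Share of rows taken by the most frequent value, one number per column
    top_share = df.apply(lambda s: s.value_counts(dropna=False).iloc[0] / n)
    # Keep only the columns where no single value exceeds the threshold
    return df.loc[:, top_share <= threshold]


# Made-up example data
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "flag": [0, 0, 0, 1],          # 0 fills 3 of 4 rows -> dropped
    "city": ["x", "x", "y", "y"],  # exactly half -> kept
})
print(drop_majority_columns(df).columns.tolist())  # ['id', 'city']
```

Because the comparison is top_share <= threshold, a column where the top value fills exactly half of the rows is kept, matching the 2 * max_occ > n condition in the loop version.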
