How do I remove outliers from a column in a dataframe?

The solutions I found online only show removing outliers from the entire dataframe, not just a specific column. So I’m having trouble figuring out how to perform outlier removal on a single column.

I tried writing a function; the code is shown below.

def find_outlier(df, column):
    # Find first and third quartile
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    
    # Find interquartile range
    IQR = q3 - q1
    
    # Find lower and upper bound
    lower_bound = q1 - 1.5 * IQR
    upper_bound = q3 + 1.5 * IQR
    
    # Remove outliers
    df[column] = df[column][df[column] > lower_bound]
    df[column] = df[column][df[column] < upper_bound]
    
    return df

But when I ran the code, it said "Columns must be same length as key".

The code I used to call the function is shown below.

df['no_of_trainings'] = find_outlier(df, 'no_of_trainings')

Any help is appreciated.

Solution:

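The error comes from the call rather than from the quartile logic: find_outlier returns a whole DataFrame, and assigning a DataFrame with more than one column to the single column df['no_of_trainings'] raises "Columns must be same length as key". Assign the result back to df instead. Inside the function, each comparison produces a boolean Series aligned on the index, so you can use it to filter the rows of the DataFrame: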

    df = df[df[column] > lower_bound]
    df = df[df[column] < upper_bound]
    return df

Or, more concisely:

    ...
    return df[(df[column] > lower_bound) & (df[column] < upper_bound)]
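
To make the fix concrete, here is a minimal end-to-end sketch of the approach above. The function name remove_outliers and the sample data are made up for illustration, and the "age" column is an assumption added only to show that other columns are left untouched.

```python
import pandas as pd


def remove_outliers(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return the rows of df whose value in `column` lies strictly
    inside the 1.5 * IQR fences. Only `column` is used for the test;
    all other columns are carried along unchanged."""
    # First and third quartile of the chosen column
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)

    # Interquartile range and the usual 1.5 * IQR fences
    iqr = q3 - q1
    lower_bound = q1 - 1.5 * iqr
    upper_bound = q3 + 1.5 * iqr

    # Boolean mask aligned on the index; NaN values compare as False,
    # so rows with a missing value in `column` are dropped as well
    mask = (df[column] > lower_bound) & (df[column] < upper_bound)
    return df[mask].copy()


# Hypothetical sample data: 100 is the only outlier in no_of_trainings
df = pd.DataFrame({
    "no_of_trainings": [1, 2, 2, 3, 3, 3, 4, 4, 100],
    "age": [25, 31, 40, 28, 35, 52, 47, 33, 29],
})

# Assign the result to a DataFrame, not to a single column
cleaned = remove_outliers(df, "no_of_trainings")
print(cleaned)
```

With a function shaped like this, the call from the question becomes df = remove_outliers(df, 'no_of_trainings'), which replaces the whole DataFrame with the filtered one instead of trying to store a DataFrame in one column.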