Home Removing outliers in a df containing mixed dtype

Questions

Removing outliers in a df containing mixed dtype

April 28, 2022

I am working on a pandas DataFrame containing numerical columns as well as string columns (dtype is object), and would like to remove the rows containing outliers with respect to the distributions within a column. In other words, detect the outliers in each column and drop the corresponding rows.

I have found two solutions to this, but neither takes into account that my df does not contain only numbers, hence they both result in errors (when encountering strings, I assume).

Way 1:

from scipy import stats
df[(np.abs(stats.zscore(df)) < 3).all(axis=1)]

returns TypeError: unsupported operand type(s) for /: 'str' and 'int'.
This is why I guess the error arises from the df having mixed dtypes.

Way 2:

for col in df.columns:
    lower = df[col].quantile(0.05)
    upper = df[col].quantile(0.95)
    df = df[col].clip(lower=lower, upper=upper)

returns KeyError with this traceback:

File omissis, in Class.remove_outliers(self, df)
    423 def remove_outliers(self, df):
    424     for col in df.columns:
--> 425         lower = df[col].quantile(0.05)
    426         upper = df[col].quantile(0.95)
    427         df = df[col].clip(lower=lower, upper=upper)

File omissis, in Series.__getitem__(self, key)
    955     return self._values[key]
    957 elif key_is_scalar:
--> 958     return self._get_value(key)
    960 if is_hashable(key):
    961     # Otherwise index.get_value will raise InvalidIndexError
    962     try:
    963         # For labels that don't resolve as scalars like tuples and frozensets

File omissis, in Series._get_value(self, label, takeable)
   1066     return self._values[label]
   1068 # Similar to Index.get_value, but we do not fall back to positional
-> 1069 loc = self.index.get_loc(label)
   1070 return self.index._get_values_for_loc(self, loc, label)

File omissis, in RangeIndex.get_loc(self, key, method, tolerance)
    387             raise KeyError(key) from err
    388     self._check_indexing_error(key)
--> 389     raise KeyError(key)
    390 return super().get_loc(key, method=method, tolerance=tolerance)

KeyError: 'colname'

How would you solve this?

EDIT: the idea is to skip the non numeric columns, to ignore them.

>Solution :

I would break the problem into stages:

Firstly, identify (numeric) columns you want to do the outlier removal.
Reference

newdf = df.select_dtypes(include=np.number)

Now perform whatever filtering/outlier removal you want on the rows of newdf. Afterwards, newdf should contain only rows you wish to retain.

Then keep only the rows of df those index are in newdf. Reference

df = df[df.index.isin(newdf.index)]

dtype

byMR

Published April 28, 2022

Add a comment

How to delete the items from confirmation popup when we click on button

byMR

April 28, 2022

Questions

Pygame character movement

byMR

April 28, 2022

Questions

How to change the color of the certain element when there are multiple element with the same class?

byMR

April 28, 2022

Questions

Selenium not finding elements on specific website

byMR

April 28, 2022

Questions

Why I have already added a .dll to unity, but still can't use it/

byMR

April 28, 2022

Questions

How do i solve this error : "error":"Bad Request" in imgur api flutter

byMR

April 28, 2022

Removing outliers in a df containing mixed dtype

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Like this:

Leave a ReplyCancel reply

Read more

How to delete the items from confirmation popup when we click on button

Pygame character movement

How to change the color of the certain element when there are multiple element with the same class?

Selenium not finding elements on specific website

Why I have already added a .dll to unity, but still can't use it/

How do i solve this error : "error":"Bad Request" in imgur api flutter

Keep Up to Date with the Most Important News

Removing outliers in a df containing mixed dtype

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Share this:

Like this:

Leave a ReplyCancel reply

Keep Up to Date with the Most Important News

Read more

How to delete the items from confirmation popup when we click on button

Pygame character movement

How to change the color of the certain element when there are multiple element with the same class?

Selenium not finding elements on specific website

Why I have already added a .dll to unity, but still can't use it/

How do i solve this error : "error":"Bad Request" in imgur api flutter

Discover more from Dev solutions