Getting strange output when using group by apply with np.select function

January 7, 2025

I am working with a Timeseries data wherein I am trying to perform outlier detection using IQR method.

Sample Data:

import pandas as pd
import numpy as np

df = pd.DataFrame({'datecol' : pd.date_range('2024-1-1', '2024-12-31'),
                   'val' : np.random.random.randin(low = 100, high = 5000, size = 8366})

my function:

def is_outlier(x):
    iqr = x.quantile(.75) - x.quantile(.25)
    outlier = (x <= x.quantile(.25) - 1.5*iqr) | (x >= x.quantile(.75) + 1.5*iqr)
    return np.select([outlier], [1], 0)

df.groupby(df['datecol'].dt.weekday)['val'].apply(is_outlier)

to which the output is something like below:

0    [1,1,0,0,....
1    [1,0,0,0,....
2    [1,1,0,0,....
3    [1,0,1,0,....
4    [1,1,0,0,....
5    [1,1,0,0,....
6    [1,0,0,1,....

I am expecting a single series as output which I can add back to the original dataframe as a flag column.

Can someone please help me with this

>Solution :

You should use groupby.transform, not apply:

df['flag'] = df.groupby(df['datecol'].dt.weekday)['val'].transform(is_outlier)

Alternatively, explicitly return a Series and use group_keys=False:

def is_outlier(x):
    iqr = x.quantile(.75) - x.quantile(.25)
    outlier = (x <= x.quantile(.25) - 1.5*iqr) | (x >= x.quantile(.75) + 1.5*iqr)
    return pd.Series(np.where(outlier, 1, 0), index=x.index)

df['flag'] = (df.groupby(df['datecol'].dt.weekday, group_keys=False)
              ['val'].apply(is_outlier)
              )

Note that with a single condition, np.where should be preferred to np.select.

You could also use a vectorial approach with groupby.quantile:

wd = df['datecol'].dt.weekday
g = df.groupby(wd)['val']
q25 = g.quantile(.25)
q75 = g.quantile(.75)
iqr = wd.map(q75-q25)
df['flag'] = 1 - df['val'].between(wd.map(q25) - 1.5*iqr, wd.map(q75) + 1.5*iqr)

Output:

       datecol   val  flag
0   2024-01-01  3193     0
1   2024-01-02  1044     0
2   2024-01-03  2963     0
3   2024-01-04  4448     0
4   2024-01-05  1286     0
..         ...   ...   ...
361 2024-12-27  1531     0
362 2024-12-28  4565     0
363 2024-12-29  3396     0
364 2024-12-30  1870     0
365 2024-12-31  3818     0