Home pandas – groupby a column, apply a function to create a new column – giving incorrect results

Questions

pandas – groupby a column, apply a function to create a new column – giving incorrect results

December 8, 2021

I have data that looks like this:

   Thing   quarter    num_col1    num_col2
    aaa    2010Q1      1.3         99.76
    bbb    2010Q1      11.3        109.76
    ccc    2010Q1      91.3        119.76
   .....
   .....
    aaa    2019Q4      21.3        119.76
    bbb    2019Q4      41.3        299.76
    ccc    2019Q4      201.3       199.76

I need to group by Thing column, and calculate moving-average for columns num_col1 and num_col2 for all the quarters.

Here’s what I have tried till now:

## define moving-average function
N = 2
def pandas_rolling(x):
    return pd.Series.rolling(x, window=N).mean()

## now group-by and calculate moving averages
things_groupby = df.groupby(by=['Thing'])
## below lines are giving incorrect values
df.loc[:,'num_col1_SMA'] = (things_groupby['num_col1'].apply(pandas_rolling)).values
df.loc[:,'num_col2_SMA'] = (things_groupby['num_col2'].apply(pandas_rolling)).values

But, when I do for one of the unique thing from Thing column manually like shown below, it gives expected results.

pandas_rolling(df.loc[df.loc[:,'Topic']=='aaa'].loc[:,'num_col1']).values

what am I doing wrong with calculating the moving-average for individual group and then populating them in the dataframe? How do I do this properly?

>Solution :

You can remove values:

df['num_col1_SMA'] = things_groupby['num_col1'].apply(pandas_rolling)
df['num_col2_SMA'] = things_groupby['num_col2'].apply(pandas_rolling)

Or:

df[['num_col1_SMA', 'num_col2_SMA']] = (things_groupby[['num_col1','num_col2']]
                                               .apply(pandas_rolling))

If possible without groupby.apply is necessary remove first level of MultiIndex:

df[['num_col1_SMA', 'num_col2_SMA']] = (things_groupby[['num_col1','num_col2']]
                                               .rolling(window=N)
                                               .mean()
                                               .droplevel(0))