Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

pandas – groupby a column, apply a function to create a new column – giving incorrect results

I have data that looks like this:

   Thing   quarter    num_col1    num_col2
    aaa    2010Q1      1.3         99.76
    bbb    2010Q1      11.3        109.76
    ccc    2010Q1      91.3        119.76
   .....
   .....
    aaa    2019Q4      21.3        119.76
    bbb    2019Q4      41.3        299.76
    ccc    2019Q4      201.3       199.76

I need to group by Thing column, and calculate moving-average for columns num_col1 and num_col2 for all the quarters.

Here’s what I have tried till now:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

## define moving-average function
N = 2
def pandas_rolling(x):
    return pd.Series.rolling(x, window=N).mean()

## now group-by and calculate moving averages
things_groupby = df.groupby(by=['Thing'])
## below lines are giving incorrect values
df.loc[:,'num_col1_SMA'] = (things_groupby['num_col1'].apply(pandas_rolling)).values
df.loc[:,'num_col2_SMA'] = (things_groupby['num_col2'].apply(pandas_rolling)).values

But, when I do for one of the unique thing from Thing column manually like shown below, it gives expected results.

pandas_rolling(df.loc[df.loc[:,'Topic']=='aaa'].loc[:,'num_col1']).values

what am I doing wrong with calculating the moving-average for individual group and then populating them in the dataframe? How do I do this properly?

>Solution :

You can remove values:

df['num_col1_SMA'] = things_groupby['num_col1'].apply(pandas_rolling)
df['num_col2_SMA'] = things_groupby['num_col2'].apply(pandas_rolling)

Or:

df[['num_col1_SMA', 'num_col2_SMA']] = (things_groupby[['num_col1','num_col2']]
                                               .apply(pandas_rolling))

If possible without groupby.apply is necessary remove first level of MultiIndex:

df[['num_col1_SMA', 'num_col2_SMA']] = (things_groupby[['num_col1','num_col2']]
                                               .rolling(window=N)
                                               .mean()
                                               .droplevel(0))
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading