I want to average only the last 5 rows of different groups contained in one column in a dataframe and make a new dataframe containing two columns

by MR

December 14, 2021

I have a dataframe with four columns. The column ‘Intensity’ contains 3 groups (0, 50, 100). I would like to average only the last 2 values of the column ‘Value’ within each of the 3 groups of ‘Intensity’. Then I would like to make a new dataframe with the columns ‘Replication’, ‘Regime’, ‘Intensity’, ‘Value_mean’ and ‘Value_sd’, the last two being the calculated average and standard deviation.

Replication   Regime   Intensity   Value
1             Ctrl     0           2
1             Ctrl     0           3
1             Ctrl     0           4
1             Ctrl     0           5
1             Ctrl     0           6
1             Ctrl     0           7
1             Ctrl     50          1
1             Ctrl     50          2
1             Ctrl     50          2
1             Ctrl     50          4
1             Ctrl     50          6
1             Ctrl     50          6
1             Ctrl     100         2
1             Ctrl     100         1
1             Ctrl     100         0
2             Ctrl     100         3
2             Ctrl     0           7
2             Ctrl     0           3
2             Ctrl     0           6
2             Ctrl     0           2
2             Ctrl     0           1
2             Ctrl     0           5
2             Ctrl     50          12
2             Ctrl     50          22
2             Ctrl     50          52
2             Ctrl     50          22
2             Ctrl     50          2
2             Ctrl     50          2
2             Ctrl     100         22
2             Ctrl     100         2
2             Ctrl     100         25

So far I have used the apply function, but I get a Series rather than a DataFrame:

df2 = df1.groupby(['Regime', 'Intensity']).apply(
    lambda x: x.tail(3).mean(axis=0, level=0))

and I get

                                 Intensity      A
               Regime Intensity
               Ctrl      0          0               -0.87
                        50         50               2.08
                       100        100               4.84

Solution:

Use GroupBy.tail in the first step to keep the last rows of each group, then create the new columns with GroupBy.transform:

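# Keep the last 3 rows of each (Regime, Intensity) group; copy() so new columns can be added safely
df2 = df1.groupby(['Regime', 'Intensity']).tail(3).copy()

# Broadcast the mean and standard deviation of Value within each Regime back onto every kept row
df2['mean_val'] = df2.groupby('Regime')['Value'].transform('mean')
df2['std_val'] = df2.groupby('Regime')['Value'].transform('std')
print(df2)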
    Replication Regime  Intensity  Value  mean_val    std_val
19            2   Ctrl          0      2  9.222222  10.425663
20            2   Ctrl          0      1  9.222222  10.425663
21            2   Ctrl          0      5  9.222222  10.425663
25            2   Ctrl         50     22  9.222222  10.425663
26            2   Ctrl         50      2  9.222222  10.425663
27            2   Ctrl         50      2  9.222222  10.425663
28            2   Ctrl        100     22  9.222222  10.425663
29            2   Ctrl        100      2  9.222222  10.425663
30            2   Ctrl        100     25  9.222222  10.425663
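
Note that the output above still has one row per retained observation, and because the transform groups by 'Regime' alone, every row carries the same mean and standard deviation. If the goal is one row per group with 'Value_mean' and 'Value_sd', as described in the question, one possible sketch is to aggregate instead of transform. The snippet below rebuilds the sample data from the question so it runs on its own. It assumes that 'Replication' should be part of the grouping keys (the question lists it among the output columns) and that the last 3 rows are wanted, as in the code above; the names `rows`, `keys`, `N_LAST`, `last_rows` and `summary` are made up for illustration.

```python
import pandas as pd

# Sample data transcribed from the question, in the original row order.
rows = (
    [(1, "Ctrl", 0, v) for v in (2, 3, 4, 5, 6, 7)]
    + [(1, "Ctrl", 50, v) for v in (1, 2, 2, 4, 6, 6)]
    + [(1, "Ctrl", 100, v) for v in (2, 1, 0)]
    + [(2, "Ctrl", 100, v) for v in (3,)]
    + [(2, "Ctrl", 0, v) for v in (7, 3, 6, 2, 1, 5)]
    + [(2, "Ctrl", 50, v) for v in (12, 22, 52, 22, 2, 2)]
    + [(2, "Ctrl", 100, v) for v in (22, 2, 25)]
)
df1 = pd.DataFrame(rows, columns=["Replication", "Regime", "Intensity", "Value"])

# Assumption: keep the last 3 rows per group; change to 2 or 5 as needed.
N_LAST = 3
# Assumption: Replication is a grouping key alongside Regime and Intensity.
keys = ["Replication", "Regime", "Intensity"]

# Step 1: keep only the last N_LAST rows of every group.
last_rows = df1.groupby(keys).tail(N_LAST)

# Step 2: collapse each group to a single row using named aggregation.
summary = (
    last_rows.groupby(keys)
    .agg(Value_mean=("Value", "mean"), Value_sd=("Value", "std"))
    .reset_index()
)
print(summary)
```

Here 'std' is the sample standard deviation (pandas uses ddof=1 by default). To pool both replications together, as the first step of the answer does, drop 'Replication' from `keys`.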