I want to average only the last 5 rows of each group in one column of a DataFrame and build a new DataFrame with two new columns (mean and standard deviation)

I have a DataFrame with four columns. The column 'Intensity' contains 3 groups (0, 50, 100). I would like to average only the last 2 values of the column 'Value' within each of the 3 'Intensity' groups. Then I would like to build a new DataFrame with the columns 'Replication', 'Regime', 'Intensity', 'Value_mean' and 'Value_sd', the last two being the calculated mean and standard deviation.

Replication   Regime   Intensity   Value
1             Ctrl     0           2
1             Ctrl     0           3
1             Ctrl     0           4
1             Ctrl     0           5
1             Ctrl     0           6
1             Ctrl     0           7
1             Ctrl     50          1
1             Ctrl     50          2
1             Ctrl     50          2
1             Ctrl     50          4
1             Ctrl     50          6
1             Ctrl     50          6
1             Ctrl     100         2
1             Ctrl     100         1
1             Ctrl     100         0
2             Ctrl     100         3
2             Ctrl     0           7
2             Ctrl     0           3
2             Ctrl     0           6
2             Ctrl     0           2
2             Ctrl     0           1
2             Ctrl     0           5
2             Ctrl     50          12
2             Ctrl     50          22
2             Ctrl     50          52
2             Ctrl     50          22
2             Ctrl     50          2
2             Ctrl     50          2
2             Ctrl     100         22
2             Ctrl     100         2
2             Ctrl     100         25
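To make the problem reproducible, the table can be loaded into the `df1` used in the question's code. This is a minimal sketch, assuming pandas is installed and that Regime is "Ctrl" on every row, as shown in the table:

```python
import pandas as pd

# Rows transcribed from the table, in the same order (31 rows).
replication = [1] * 15 + [2] * 16
intensity = ([0] * 6 + [50] * 6 + [100] * 3          # Replication 1
             + [100] + [0] * 6 + [50] * 6 + [100] * 3)  # Replication 2
values = [2, 3, 4, 5, 6, 7, 1, 2, 2, 4, 6, 6, 2, 1, 0,
          3, 7, 3, 6, 2, 1, 5, 12, 22, 52, 22, 2, 2, 22, 2, 25]

df1 = pd.DataFrame({
    "Replication": replication,
    "Regime": "Ctrl",  # a scalar is broadcast to every row
    "Intensity": intensity,
    "Value": values,
})
print(df1.shape)  # (31, 4)
```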

So far I have used the apply function, but I get a Series instead of a DataFrame:

df2 = df1.groupby(['Regime', 'Intensity']).apply(
    lambda x: x.tail(3).mean(axis=0, level=0))
 

and I get


                  Intensity      A
Regime Intensity
Ctrl   0                  0  -0.87
       50                50   2.08
       100              100   4.84
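The `level=` keyword of DataFrame reductions such as `mean` was deprecated in pandas 1.3 and removed in 2.0, so the `mean(axis=0, level=0)` call in the question fails on current pandas. A sketch that avoids `apply` entirely, assuming that "groups" means each (Regime, Intensity) pair (as in the question's `groupby`) and that the last 3 rows are wanted (as in its `tail(3)`): take the tail of every group, then collapse each group with named aggregation, which returns an ordinary DataFrame with the requested `Value_mean` and `Value_sd` columns.

```python
import pandas as pd

# Same data as the question's table (Regime is "Ctrl" on every row).
df1 = pd.DataFrame({
    "Replication": [1] * 15 + [2] * 16,
    "Regime": "Ctrl",
    "Intensity": [0] * 6 + [50] * 6 + [100] * 4 + [0] * 6 + [50] * 6 + [100] * 3,
    "Value": [2, 3, 4, 5, 6, 7, 1, 2, 2, 4, 6, 6, 2, 1, 0,
              3, 7, 3, 6, 2, 1, 5, 12, 22, 52, 22, 2, 2, 22, 2, 25],
})

# Assumption: one group per (Regime, Intensity) and N = 3 trailing rows.
keys = ["Regime", "Intensity"]
last3 = df1.groupby(keys).tail(3)  # keeps the last 3 rows of each group

# Named aggregation: one output row per group, keys restored as columns.
summary = (last3.groupby(keys)
                .agg(Value_mean=("Value", "mean"),
                     Value_sd=("Value", "std"))  # sample std (ddof=1)
                .reset_index())
print(summary)
```

If each replication should be averaged separately and kept as a column, add "Replication" to `keys`.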
  

Solution:

Use GroupBy.tail in the first step, then create the new columns with GroupBy.transform:

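import pandas as pd

# Keep the last 3 rows of every (Regime, Intensity) group.
df2 = df1.groupby(['Regime', 'Intensity']).tail(3).copy()

# transform broadcasts each Regime's statistic back onto that Regime's rows.
df2['mean_val'] = df2.groupby('Regime')['Value'].transform('mean')
df2['std_val'] = df2.groupby('Regime')['Value'].transform('std')
print(df2)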
    Replication Regime  Intensity  Value  mean_val    std_val
19            2   Ctrl          0      2  9.222222  10.425663
20            2   Ctrl          0      1  9.222222  10.425663
21            2   Ctrl          0      5  9.222222  10.425663
25            2   Ctrl         50     22  9.222222  10.425663
26            2   Ctrl         50      2  9.222222  10.425663
27            2   Ctrl         50      2  9.222222  10.425663
28            2   Ctrl        100     22  9.222222  10.425663
29            2   Ctrl        100      2  9.222222  10.425663
30            2   Ctrl        100     25  9.222222  10.425663
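In the answer, `transform` groups by 'Regime' only, so every row receives one mean and standard deviation computed across all three intensities (9.222222 above). If each row should instead carry the statistics of its own Intensity group, the transform can use the same keys as the tail. A sketch, assuming that per-Intensity statistics are what is wanted:

```python
import pandas as pd

# Same data as the question's table (Regime is "Ctrl" on every row).
df1 = pd.DataFrame({
    "Replication": [1] * 15 + [2] * 16,
    "Regime": "Ctrl",
    "Intensity": [0] * 6 + [50] * 6 + [100] * 4 + [0] * 6 + [50] * 6 + [100] * 3,
    "Value": [2, 3, 4, 5, 6, 7, 1, 2, 2, 4, 6, 6, 2, 1, 0,
              3, 7, 3, 6, 2, 1, 5, 12, 22, 52, 22, 2, 2, 22, 2, 25],
})

keys = ["Regime", "Intensity"]
df2 = df1.groupby(keys).tail(3).copy()

# Assumption: statistics per (Regime, Intensity), not per Regime.
# transform returns one value per row, aligned with df2's index.
grouped = df2.groupby(keys)["Value"]
df2["mean_val"] = grouped.transform("mean")
df2["std_val"] = grouped.transform("std")
print(df2)
```

The row selection is identical to the answer's; only the grouping used for `mean_val` and `std_val` changes.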