Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas Dataframe Multiindex – Calculate Mean and add additional column to each level of the index

Given the following dataframe:

Year     2024          2023          2022
Header   N Result SD   N Result SD   N Result SD
Vendor
A        5 20     3    5 22     4    1 21     3
B        4 25     2    4 25     3    4 26     5
C        9 22     3    9 27     1    3 23     3   
D        3 23     5    3 16     2    5 13     4 
E        5 27     2    5 21     3    3 19     5

I would like to calculate for each year the mean value of the results column and then create a column, where the relative deviation to the mean is displayed (e.g. Results Value / mean-value * 100). The N and SD column were just included for completeness and is not needed for the calculation.

Year     2024                    2023                      2022
Header   N Result SD Deviation   N Result SD Deviation     N Result SD Deviation
Vendor
A        5 20     3  85.5        5 22     4  99.1          1 21     3  ..
B        4 25     2  106         4 25     3  113           4 26     5  ..
C        9 22     3  ..          9 27     1  ..            3 23     3  .. 
D        3 23     5  ..          3 16     2  ..            5 13     4  ..
E        5 27     2  ..          5 21     3  ..            3 19     5  ..

How what i be able to achieve that?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Thanks a lot in advance!

>Solution :

Use DataFrame.xs for select Result labels in MultiIndex, divide by mean and append to original in concat, last for correct position add DataFrame.sort_index with parameter sort_remaining=False:

df1 = df.xs('Result', axis=1, level=1, drop_level=False)

out = (pd.concat([df, 
                 df1.div(df1.mean()).mul(100)
                    .rename(columns={'Result':'Deviation'})], axis=1)
         .sort_index(axis=1, ascending=False, level=0, sort_remaining=False))
print (out)
  2024                       2023                       2022            \
     N Result SD   Deviation    N Result SD   Deviation    N Result SD   
A    5     20  3   85.470085    5     22  4   99.099099    1     21  3   
B    4     25  2  106.837607    4     25  3  112.612613    4     26  5   
C    9     22  3   94.017094    9     27  1  121.621622    3     23  3   
D    3     23  5   98.290598    3     16  2   72.072072    5     13  4   
E    5     27  2  115.384615    5     21  3   94.594595    3     19  5   

               
    Deviation  
A  102.941176  
B  127.450980  
C  112.745098  
D   63.725490  
E   93.137255  

Another loop idea:

for x in df.columns.levels[0]:
    df[(x, 'Deviation')] = df[(x, 'Result')].div(df[(x, 'Result')].mean()).mul(100)

out = df.sort_index(axis=1, ascending=False, level=0, sort_remaining=False)
print (out)
  2024                       2023                       2022            \
     N Result SD   Deviation    N Result SD   Deviation    N Result SD   
A    5     20  3   85.470085    5     22  4   99.099099    1     21  3   
B    4     25  2  106.837607    4     25  3  112.612613    4     26  5   
C    9     22  3   94.017094    9     27  1  121.621622    3     23  3   
D    3     23  5   98.290598    3     16  2   72.072072    5     13  4   
E    5     27  2  115.384615    5     21  3   94.594595    3     19  5   

               
    Deviation  
A  102.941176  
B  127.450980  
C  112.745098  
D   63.725490  
E   93.137255  
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading