Home Pandas Dataframe Multiindex – Calculate Mean and add additional column to each level of the index

Questions

Pandas Dataframe Multiindex – Calculate Mean and add additional column to each level of the index

December 12, 2024

Given the following dataframe:

Year     2024          2023          2022
Header   N Result SD   N Result SD   N Result SD
Vendor
A        5 20     3    5 22     4    1 21     3
B        4 25     2    4 25     3    4 26     5
C        9 22     3    9 27     1    3 23     3   
D        3 23     5    3 16     2    5 13     4 
E        5 27     2    5 21     3    3 19     5

I would like to calculate for each year the mean value of the results column and then create a column, where the relative deviation to the mean is displayed (e.g. Results Value / mean-value * 100). The N and SD column were just included for completeness and is not needed for the calculation.

Year     2024                    2023                      2022
Header   N Result SD Deviation   N Result SD Deviation     N Result SD Deviation
Vendor
A        5 20     3  85.5        5 22     4  99.1          1 21     3  ..
B        4 25     2  106         4 25     3  113           4 26     5  ..
C        9 22     3  ..          9 27     1  ..            3 23     3  .. 
D        3 23     5  ..          3 16     2  ..            5 13     4  ..
E        5 27     2  ..          5 21     3  ..            3 19     5  ..

How what i be able to achieve that?

Thanks a lot in advance!

>Solution :

Use DataFrame.xs for select Result labels in MultiIndex, divide by mean and append to original in concat, last for correct position add DataFrame.sort_index with parameter sort_remaining=False:

df1 = df.xs('Result', axis=1, level=1, drop_level=False)

out = (pd.concat([df, 
                 df1.div(df1.mean()).mul(100)
                    .rename(columns={'Result':'Deviation'})], axis=1)
         .sort_index(axis=1, ascending=False, level=0, sort_remaining=False))
print (out)
  2024                       2023                       2022            \
     N Result SD   Deviation    N Result SD   Deviation    N Result SD   
A    5     20  3   85.470085    5     22  4   99.099099    1     21  3   
B    4     25  2  106.837607    4     25  3  112.612613    4     26  5   
C    9     22  3   94.017094    9     27  1  121.621622    3     23  3   
D    3     23  5   98.290598    3     16  2   72.072072    5     13  4   
E    5     27  2  115.384615    5     21  3   94.594595    3     19  5   

               
    Deviation  
A  102.941176  
B  127.450980  
C  112.745098  
D   63.725490  
E   93.137255

Another loop idea:

for x in df.columns.levels[0]:
    df[(x, 'Deviation')] = df[(x, 'Result')].div(df[(x, 'Result')].mean()).mul(100)

out = df.sort_index(axis=1, ascending=False, level=0, sort_remaining=False)
print (out)
  2024                       2023                       2022            \
     N Result SD   Deviation    N Result SD   Deviation    N Result SD   
A    5     20  3   85.470085    5     22  4   99.099099    1     21  3   
B    4     25  2  106.837607    4     25  3  112.612613    4     26  5   
C    9     22  3   94.017094    9     27  1  121.621622    3     23  3   
D    3     23  5   98.290598    3     16  2   72.072072    5     13  4   
E    5     27  2  115.384615    5     21  3   94.594595    3     19  5   

               
    Deviation  
A  102.941176  
B  127.450980  
C  112.745098  
D   63.725490  
E   93.137255