Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Calculating column means based on values in other columns in pandas

I have a pandas dataframe laid out like the following:

[Name_Date]          [Var_A]     [Var_1]    [Var_2]   ...
FooBar_09/2021          9           1          9
FooBar_09/2021          5           2          8
FooBar_09/2021          3           5          6
BarFoo_03/2020          8           3          2      
BarFoo_03/2020          3           4          4      ...
BarFoo_03/2020          4           3          6
BarBar_04/2017          3           1          5
BarBar_04/2017          7           1          3
BarBar_04/2017          1           3          1      ...

I’d like to create a new dataframe with unique values from [Name_Date], and the mean values from [Var_A] based on the groups in [Name_Date]. I’ve gotten this far with the following line:

df_mean = df.groupby('Name_Date', as_index=False)['Var_A'].mean()

What I’d like to do is then expand on this by calculating the mean of columns [Var1] and [Var2], and dividing them by the mean of [Var_A]. I am sure I could do this calculation one by one in a similar fashion to the line above, however I have about a dozen of these [Var] columns so I’m looking for a more expiditious way to do this if anyone can make any suggestions. The end result I’m trying to achieve can be seen below:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

[Name_Date]        [Var_A_mean]   [mean Var_A / mean Var_1]   [mean Var_A / mean Var_2]
FooBar_09/2021         5.6               0.47                     1.3   
BarFoo_03/2020          5                0.66                     0.8
BarBar_04/2017         3.6               0.46                     0.83

Thanks for the help.

>Solution :

Use groupby to compute the mean for all columns then div on index axis:

df_mean = df.groupby('Name_Date').mean()
df_mean.update(df_mean.iloc[:, 1:].div(df_mean['Var_A'], axis=0))
print(df_mean)

# Output:
                   Var_A     Var_1     Var_2
Name_Date                                   
BarBar_04/2017  3.666667  0.454545  0.818182
BarFoo_03/2020  5.000000  0.666667  0.800000
FooBar_09/2021  5.666667  0.470588  1.352941
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading