Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

list mean values of all possible column combinations

I have a dataset that looks like this:

Value         Type       country   mean
-1.975767     Weather    Brazil
-0.540979     Fruits     China
-2.359127     Fruits     China
-2.815604     Corona     China
-0.712323     Weather    UK
-0.929755     Weather    Brazil

I want to calculate an overall mean value for all different combinations of Type & Country. For example:

the mean of Weather, Brazil would be = (-1.975767 -0.929755) / 2

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Then I want to add these combinations to another table:

df2 = pd.DataFrame()
country    type       mean     count
Brazil     Weather             2
Brazil     Corona              
China      Corona              1
China      Fruits              2

I can calculate the mean like this:

print(df.groupby(["type", "country"])["value"].mean())

but how do I save these values in a new df in the desired format?

Edit:
This works

df_new = df.groupby(["type", "country"], as_index = False)["value"].mean()

but then if I try to add a count in the same way:

df_new = df.groupby(["type", "country"], as_index = False).count()

it transposes all the cols instead of adding a count col after the mean col

>Solution :

You can use as_index parameter in your groupby:

df_new = df.groupby(["type", "country"], as_index = False)["value"].mean()

Then the result is a standard dataframe:

    type    country value
0   Corona  China   -2.815604
1   Fruits  China   -1.450053
2   Weather Brazil  -1.452761
3   Weather UK  -0.712323

EDIT: How can we add another column with the count? You can simply append a new column with the result of a new groupby like this:

# original answer
df_new = df.groupby(["type", "country"], as_index = False)["value"].mean().rename(columns={'value':'mean'})

# Add count also
df_new['count'] = df.groupby(["type", "country"])["value"].count().tolist()
df_new

Output:

type    country mean    count
0   Corona  China   -2.815604   1
1   Fruits  China   -1.450053   2
2   Weather Brazil  -1.452761   2
3   Weather UK  -0.712323   1
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading