How to combine different rows and get mean value for columns in a dataframe

I have the following df, which contains two types of information. The first is the characteristics of each item (some are strings, others are integers). The second is the emission values of that item (stored as floats).

Charact. 1  Charact. 2  Charact. 3  Emission 1  Emission 2
1998        AB          C           1           2
1998        AB          C           3           4
2000        AB          C           1           2
2001        DE          F           1           2
2001        DE          F           3           4

I would like to combine the rows that share the same 3 characteristics and take the mean of the 2 emission values, to get the following df:

Charact. 1  Charact. 2  Charact. 3  Emission 1  Emission 2
1998        AB          C           2           3
2000        AB          C           1           2
2001        DE          F           2           3

I tried the following line of code, but it gives me an error:

df.groupby(['Charact. 1', 'Charact. 2', 'Charact. 3'], as_index=False).agg({'Emission 1': 'mean', 'Emission 2': 'mean',})

The specific error is: ValueError: Length of values (10345) does not match length of index (10687600)

>Solution:

This worked for me:

import pandas as pd

df = pd.DataFrame({'c1': [1998, 1998, 2000, 2001, 2001],
                   'c2': ['AB', 'AB', 'AB', 'DE', 'DE'],
                   'c3': ['C', 'C', 'C', 'F', 'F'],
                   'e1': [1, 3, 1, 1, 3],
                   'e2': [2, 4, 2, 2, 4]})
# Group on the three characteristic columns and average the remaining ones.
print(df.groupby(['c1', 'c2', 'c3'], as_index=False).mean())

# Output:
#      c1  c2 c3  e1  e2
# 0  1998  AB  C   2   3
# 1  2000  AB  C   1   2
# 2  2001  DE  F   2   3
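
In case the shortened column names hide something, here is the same idea as a small sketch that keeps the column names from your question and selects the two emission columns explicitly before taking the mean. The float values are my assumption, based on your remark that the emissions are stored as floats.

```python
import pandas as pd

# Sample data from the question, keeping its original column names.
# Assumption: emissions are floats, as the question describes.
df = pd.DataFrame({
    'Charact. 1': [1998, 1998, 2000, 2001, 2001],
    'Charact. 2': ['AB', 'AB', 'AB', 'DE', 'DE'],
    'Charact. 3': ['C', 'C', 'C', 'F', 'F'],
    'Emission 1': [1.0, 3.0, 1.0, 1.0, 3.0],
    'Emission 2': [2.0, 4.0, 2.0, 2.0, 4.0],
})

keys = ['Charact. 1', 'Charact. 2', 'Charact. 3']
values = ['Emission 1', 'Emission 2']

# Restrict the aggregation to the emission columns, then average per group.
result = df.groupby(keys, as_index=False)[values].mean()
print(result)
```

Selecting the value columns explicitly means any other column that happens to be in the real DataFrame is left out of the mean instead of being aggregated by accident.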

Edit: This also worked for me, so I'm not sure where exactly the problem lies in your code. Perhaps the DataFrame is structured differently from what your question implies?

import pandas as pd

df = pd.DataFrame({'c1': [1998, 1998, 2000, 2001, 2001],
                   'c2': ['AB', 'AB', 'AB', 'DE', 'DE'],
                   'c3': ['C', 'C', 'C', 'F', 'F'],
                   'e1': [1, 3, 1, 1, 3],
                   'e2': [2, 4, 2, 2, 4]})
# Same grouping, but naming the aggregation for each emission column.
print(df.groupby(['c1', 'c2', 'c3'], as_index=False).agg({'e1': 'mean', 'e2': 'mean'}))

# Output:
#      c1  c2 c3  e1  e2
# 0  1998  AB  C   2   3
# 1  2000  AB  C   1   2
# 2  2001  DE  F   2   3
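
Since I can't see your real DataFrame, the following is only a guess at how to narrow it down. It checks two structural differences that might matter: duplicated column labels and categorical key columns. With categorical keys, groupby can consider every combination of categories rather than only the combinations present in the data (controlled by the `observed` parameter), which is one way the number of groups could end up far larger than expected. Converting the keys to `category` here is purely my assumption for illustration; nothing in your question confirms it.

```python
import pandas as pd

df = pd.DataFrame({'c1': [1998, 1998, 2000, 2001, 2001],
                   'c2': ['AB', 'AB', 'AB', 'DE', 'DE'],
                   'c3': ['C', 'C', 'C', 'F', 'F'],
                   'e1': [1, 3, 1, 1, 3],
                   'e2': [2, 4, 2, 2, 4]})
keys = ['c1', 'c2', 'c3']

# Assumption for illustration only: the real key columns are categorical.
df = df.astype({k: 'category' for k in keys})

# Structural checks that might reveal a difference from the sample data.
duplicate_columns = df.columns[df.columns.duplicated()].tolist()
categorical_keys = [k for k in keys
                    if isinstance(df[k].dtype, pd.CategoricalDtype)]
print('duplicated column labels:', duplicate_columns)
print('categorical key columns:', categorical_keys)

# observed=True keeps only the key combinations that occur in the data.
result = df.groupby(keys, as_index=False, observed=True).mean()
print(result)
```

If `categorical_keys` comes back empty on your data, this is probably not the cause, and comparing `df.dtypes` and `df.columns` with the sample above would be the next thing I'd look at.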