mean of a column after group by returns nan

I have this:

df = name   year   salary   d
     a      1990   3        5
     b      1992   90       1
     c      1990   234      3
     ...

I am trying to group my data frame based on year, and then get the average of the salaries in that year. Then my goal is to assign it to a new column. This is what I do:

df['averageSalaryPerYear'] = df.groupby('year')['salary'].mean()

The expression df.groupby('year')['salary'].mean() itself gives the correct results: when I print it, I get a column of numbers in scientific notation. However, when I assign it to df['averageSalaryPerYear'], every value turns into NaN. I am not sure why this happens, since the printed values look fine, although they are in scientific notation like this:

1990    1.707235e+07
1991    2.357879e+07
1992    3.098244e+07

The first column is the year and the second is the average salary for that year.

Why is this happening? I want my new column to show the correct average for each row's year.

Thanks

Solution:

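After groupby, the result has one row per year and is indexed by year, not by the original row labels. Assigning it to a column aligns on index labels, so rows of df whose label is not one of the years get NaN.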
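
To see why the assignment produces NaN, here is a minimal sketch on toy data (the salary values are invented for illustration, not the asker's data). The grouped result is labelled by year, while df's rows are labelled 0, 1, 2, and pandas matches the two by label when assigning:

```python
import pandas as pd

# Toy data; the salary values are invented for illustration.
df = pd.DataFrame({
    "name": ["a", "b", "c"],
    "year": [1990, 1992, 1990],
    "salary": [3.0, 90.0, 234.0],
})

per_year = df.groupby("year")["salary"].mean()
print(per_year.index.tolist())  # group labels: the years
print(df.index.tolist())        # row labels: 0, 1, 2

# Assignment aligns on index labels; none of 0, 1, 2 is a year,
# so no value is matched and every row ends up NaN.
df["averageSalaryPerYear"] = per_year
print(df["averageSalaryPerYear"].isna().all())
```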

Try transform, which returns one value per original row, so the result has the same index as df:

df['averageSalaryPerYear'] = df.groupby('year')['salary'].transform('mean')
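
A quick check of the transform approach on toy data (salary values invented for illustration). As an alternative, mapping the year column through the per-year means should give the same column; the name viaMap is just a placeholder for this sketch:

```python
import pandas as pd

# Toy data; the salary values are invented for illustration.
df = pd.DataFrame({
    "name": ["a", "b", "c"],
    "year": [1990, 1992, 1990],
    "salary": [3.0, 90.0, 234.0],
})

# transform broadcasts each group's mean back onto the rows of that group.
df["averageSalaryPerYear"] = df.groupby("year")["salary"].transform("mean")

# Alternative: look up each row's year in the per-year means.
per_year = df.groupby("year")["salary"].mean()
df["viaMap"] = df["year"].map(per_year)

print(df)
```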