Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas mean across highest x columns

I want to be able to calculate the mean of 4 week columns, but if the number in the Top x column is less than 4, I only want to use the maximum x values to calculate the mean (i.e. if Top x = 3, discard the lowest week value when calculating the mean.

Example dataframe:

df = pd.DataFrame({'week 1' : [1.0, 5.0, 7.0, 6.0, np.nan],
               'week 2' : [3.0, np.nan, 9.0, 8.0, np.nan],
               'week 3' : [1.0, 2.0, 2.0, 1.0, 6.0],
               'week 4' : [np.nan, 4.0, 2.0, 7.0, 6.0],
               'Top x' : [3.0, 2.0, 4.0, 3.0, 3.0]})

     week 1  week 2  week 3  week 4  Top x
0     1.0     3.0     1.0     NaN    3.0
1     5.0     NaN     2.0     4.0    2.0
2     7.0     9.0     2.0     2.0    4.0
3     6.0     8.0     1.0     7.0    3.0
4     NaN     NaN     6.0     6.0    3.0

Expected output:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

     week 1  week 2  week 3  week 4  Top x   Mean
0     1.0     3.0     1.0     NaN    3.0  1.666667
1     5.0     NaN     2.0     4.0    2.0  4.500000
2     7.0     9.0     2.0     2.0    4.0  5.000000
3     6.0     8.0     1.0     7.0    3.0  7.000000
4     NaN     NaN     6.0     6.0    3.0  6.000000

I don’t know if there’s a way to add a function to pandas.mean() function, or if it’s simpler to sum the top x of the week columns (maybe turning each row into a list?) and divide by the Top x column.

>Solution :

Use:

df1 = df.melt('Top x', ignore_index=False).sort_values('value', ascending=False)

df['Mean'] = (df1[df1.groupby(level=0).cumcount().lt(df1['Top x'])]
                     .groupby(level=0)['value'].mean())
print (df)
   week 1  week 2  week 3  week 4  Top x      Mean
0     1.0     3.0     1.0     NaN    3.0  1.666667
1     5.0     NaN     2.0     4.0    2.0  4.500000
2     7.0     9.0     2.0     2.0    4.0  5.000000
3     6.0     8.0     1.0     7.0    3.0  7.000000
4     NaN     NaN     6.0     6.0    3.0  6.000000
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading