Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

pandas average across dynamic number of columns

I have a dataframe like as shown below

customer_id   revenue_m7   revenue_m8   revenue_m9  revenue_m10  
   1             1234         1231        1256         1239      
   2             5678         3425        3255         2345      

I would like to do the below

a) get average of revenue for each customer based on latest two columns (revenue_m9 and revenue_m10)

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

b) get average of revenue for each customer based on latest four columns (revenue_m7, revenue_m8, revenue_m9 and revenue_m10)

So, I tried the below

df['revenue_mean_2m'] = (df['revenue_m10']+df['revenue_m9'])/2
df['revenue_mean_4m'] = (df['revenue_m10']+df['revenue_m9']+df['revenue_m8']+df['revenue_m7'])/4
df['revenue_mean_4m'] = df.mean(axis=1) # i also tried this but how to do for only two columns (and not all columns)

But if I wish to compute average for past 12 months, then it may not be elegant to write this way. Is there any other better or efficient way to write this? I can just key in number of columns to look back and it can compute the average based on keyed in input

I expect my output to be like as below

customer_id   revenue_m7   revenue_m8   revenue_m9  revenue_m10   revenue_mean_2m   revenue_mean_4m 
           1             1234         1231        1256         1239    1867              1240  
           2             5678         3425        3255         2345    2800           3675.75       

>Solution :

Use filter and slicing:

# keep only the "revenue_" columns
df2 = df.filter(like='revenue_')
# or
# df2 = df.filter(regex=r'revenue_m\d+')

# get last 2/4 columns and aggregate as mean
df['revenue_mean_2m'] = df2.iloc[:, -2:].mean(axis=1)
df['revenue_mean_4m'] = df2.iloc[:, -4:].mean(axis=1)

Output:

   customer_id  revenue_m7  revenue_m8  revenue_m9  revenue_m10  \
0            1        1234        1231        1256         1239   
1            2        5678        3425        3255         2345   

   revenue_mean_2m  revenue_mean_4m  
0           1247.5          1240.00  
1           2800.0          3675.75  
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading