Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Conditional mean while using iloc pandas

Assume I have a dataframe with columns stated below (consist more column in actual data).

Customer   Group1     jan_revenue feb_revenue mar_revenue
  Sam      Bank A          40           50          0
Wilson     Bank A          60           70         30
  Jay      Bank B          10           40         40
  Jim      Bank A           0           40         70
  Yan      Bank C           0           40         90
  Tim      Bank C          10            0         50

I want to calculate the mean for each customer but only those are non-zero.

For example, customer Sam has mean (40+50)/2 = 45 and Wilson (60+70+30)/3 = 53.3333

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Since I have a large number of columns, so i choose to use iloc but my approach included all the 0.

df['avg_revenue21'] = df.iloc[:,27:39].mean(axis=1)

May I know is there a way for conditional mean while using iloc?

Thank you

>Solution :

You can use select_dtypes to get numeric columns, replace the zeros with NA, then get the mean as usual:

df.select_dtypes('number').replace(0, pd.NA).mean(axis=1)

output:

Sam       45.000000
Wilson    53.333333
Jay       30.000000
Jim       55.000000
Yan       65.000000
Tim       30.000000
dtype: float64

As new column:

df['avg_revenue21'] = df.select_dtypes('number').replace(0, pd.NA).mean(axis=1)

       Customer Group1  jan_revenue  feb_revenue  mar_revenue  avg_revenue21
Sam        Bank      A           40           50            0      45.000000
Wilson     Bank      A           60           70           30      53.333333
Jay        Bank      B           10           40           40      30.000000
Jim        Bank      A            0           40           70      55.000000
Yan        Bank      C            0           40           90      65.000000
Tim        Bank      C           10            0           50      30.000000

variants:

If the input are strings:

df['avg_revenue21'] = df.apply(pd.to_numeric, errors='coerce').replace(0, pd.NA).mean(axis=1)

If you only want to consider a subset:

df['avg_revenue21'] = df.filter(regex='(feb|mar)_').replace(0, pd.NA).mean(axis=1)

or:

df['avg_revenue21'] = df[['feb_revenue', 'mar_revenue']].replace(0, pd.NA).mean(axis=1)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading