Conditional mean while using iloc pandas

Assume I have a dataframe with columns stated below (consist more column in actual data).

Customer   Group1     jan_revenue feb_revenue mar_revenue
  Sam      Bank A          40           50          0
Wilson     Bank A          60           70         30
  Jay      Bank B          10           40         40
  Jim      Bank A           0           40         70
  Yan      Bank C           0           40         90
  Tim      Bank C          10            0         50

I want to calculate the mean for each customer but only those are non-zero.

For example, customer Sam has mean (40+50)/2 = 45 and Wilson (60+70+30)/3 = 53.3333

Since I have a large number of columns, so i choose to use iloc but my approach included all the 0.

df['avg_revenue21'] = df.iloc[:,27:39].mean(axis=1)

May I know is there a way for conditional mean while using iloc?

Thank you

>Solution :

You can use select_dtypes to get numeric columns, replace the zeros with NA, then get the mean as usual:

df.select_dtypes('number').replace(0, pd.NA).mean(axis=1)

output:

Sam       45.000000
Wilson    53.333333
Jay       30.000000
Jim       55.000000
Yan       65.000000
Tim       30.000000
dtype: float64

As new column:

df['avg_revenue21'] = df.select_dtypes('number').replace(0, pd.NA).mean(axis=1)

       Customer Group1  jan_revenue  feb_revenue  mar_revenue  avg_revenue21
Sam        Bank      A           40           50            0      45.000000
Wilson     Bank      A           60           70           30      53.333333
Jay        Bank      B           10           40           40      30.000000
Jim        Bank      A            0           40           70      55.000000
Yan        Bank      C            0           40           90      65.000000
Tim        Bank      C           10            0           50      30.000000

variants:

If the input are strings:

df['avg_revenue21'] = df.apply(pd.to_numeric, errors='coerce').replace(0, pd.NA).mean(axis=1)

If you only want to consider a subset:

df['avg_revenue21'] = df.filter(regex='(feb|mar)_').replace(0, pd.NA).mean(axis=1)

or:

df['avg_revenue21'] = df[['feb_revenue', 'mar_revenue']].replace(0, pd.NA).mean(axis=1)

Leave a Reply