Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Replace values in multiple columns using 'numpy.where' function

I want to change the NA values in multiple columns when certain condition is met. Example data set given below.

    Pool Area   Pool Quality    Pool Type   Pool Condition  Pool Finish
0   800             Good           A              Good        Gunite
1   400             Good           C              Good        Vinyl
2   485             Good           B              Good        Fibreglass
3   360             Poor           C              Poor        Vinyl
4   0               NaN            NaN            NaN         NaN
5   600             Best           A              Best        Gunite
6   500             Best           B              Best        Fibreglass
7   0               NaN            NaN            NaN         NaN
8   750             Best           A              Best        Gunite
9   900             Best           A              Best        Gunite
10  0               NaN            NaN            NaN         NaN
11  900             Best           A              Best        Gunite
12  400             Poor           C              Poor        Fibreglass
13  0               NaN            NaN            NaN         NaN

In the above data, I want to replace the NaN values with ‘No Pool’ where the ‘Pool Area’ column has value ‘0’.

I know I can do it with np.where function and I tried the below code.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df[['Pool Quality', 'Pool Type', 'Pool Condition', 'Pool Finish']] = np.where(df['Pool Area']==0, 'No Pool', df[['Pool Quality', 'Pool Type', 'Pool Condition', 'Pool Finish']])

It isn’t working.

I tried it separately, it works (Ref to code below).

df['Pool Quality'] = np.where(df['Pool Area']==0, 'No Pool', df['Pool Quality'])

But when I tried to do for multiple columns in one go, it is not working.

Below is the error I got.

ValueError: operands could not be broadcast together with shapes (2919,) () (2919,5)

NOTE: The above error message is taken from my actual data set where the dimension is 2919 rows and 81 columns.

I don’t know what is wrong in my code. Please help me.

>Solution :

Use boolean indexing:

m = df['Pool Area'].eq(0)

df.loc[m] = df.loc[m].fillna('No Pool')
# or
# df[m] = df[m].fillna('No Pool')

# or to limit to given columns
# cols = ['Pool Quality', 'Pool Type', 'Pool Condition', 'Pool Finish']
# df.loc[m, cols] = df.loc[m, cols].fillna('No Pool')

updated df:

    Pool Area Pool Quality Pool Type Pool Condition Pool Finish
0         800         Good         A           Good      Gunite
1         400         Good         C           Good       Vinyl
2         485         Good         B           Good  Fibreglass
3         360         Poor         C           Poor       Vinyl
4           0      No Pool   No Pool        No Pool     No Pool
5         600         Best         A           Best      Gunite
6         500         Best         B           Best  Fibreglass
7           0      No Pool   No Pool        No Pool     No Pool
8         750         Best         A           Best      Gunite
9         900         Best         A           Best      Gunite
10          0      No Pool   No Pool        No Pool     No Pool
11        900         Best         A           Best      Gunite
12        400         Poor         C           Poor  Fibreglass
13          0      No Pool   No Pool        No Pool     No Pool
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading