Specify NaN encoding in np.where() logic

March 21, 2023

I have data that looks like this:

    id    case2_q6
0   300   3.0
1   304   4.0
2   306   3.0
3   309   1.0
4   311   3.0
5   312   4.0
6   314   NaN
7   315   2.0
8   316   3.0
9   317   3.0

I am using this np.where() call to generate a new variable:

df['fluid_2'] = np.where((df['case2_q6'] == 1) | (df['case2_q6'] == 2), 1, 0)

Now df has the column fluid_2 as below:

    id    case2_q6  fluid_2
0   300   3.0       0
1   304   4.0       0
2   306   3.0       0
3   309   1.0       1
4   311   3.0       0
5   312   4.0       0
6   314   NaN       0
7   315   2.0       1
8   316   3.0       0
9   317   3.0       0

As you can see, the NaN value at index 6 was converted to a 0. Is there a way to set up the np.where() so as to leave those as NaN values in fluid_2?

The desired output would be:

    id    case2_q6  fluid_2
0   300   3.0       0
1   304   4.0       0
2   306   3.0       0
3   309   1.0       1
4   311   3.0       0
5   312   4.0       0
6   314   NaN       NaN
7   315   2.0       1
8   316   3.0       0
9   317   3.0       0

Here the NaN at index 6 is preserved.

>Solution :

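Any comparison with NaN evaluates to False, so the NaN row fails both conditions and falls through to 0. A possible solution is to test for NaN first: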

df['fluid_2'] = np.where(
    df['case2_q6'].isna(), np.nan, 
    np.where((df['case2_q6'] == 1) | (df['case2_q6'] == 2), 1, 0))
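
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the sample in the question; the rest is the same nested np.where() call.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": [300, 304, 306, 309, 311, 312, 314, 315, 316, 317],
    "case2_q6": [3.0, 4.0, 3.0, 1.0, 3.0, 4.0, np.nan, 2.0, 3.0, 3.0],
})

# Outer np.where: keep NaN where the source is missing.
# Inner np.where: the original 1/0 encoding for everything else.
df["fluid_2"] = np.where(
    df["case2_q6"].isna(), np.nan,
    np.where((df["case2_q6"] == 1) | (df["case2_q6"] == 2), 1, 0))

print(df)
```

Because np.nan is a float, the resulting column has dtype float64.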

Another possible solution:

df['fluid_2'] = df['case2_q6'].clip(upper=1).mul(df['case2_q6'].isin([1,2]))
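
The one-liner above is compact but not obvious, so here is a rough step-by-step sketch of what it appears to do. The short Series `s` and the intermediate names `clipped`, `in_set` and `result` are made up for illustration and assume finite values, as in the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical cut-down version of the case2_q6 column
s = pd.Series([3.0, 1.0, np.nan, 2.0], name="case2_q6")

# Step 1: cap every value at 1, so 1 and 2 both become 1.0; NaN stays NaN
clipped = s.clip(upper=1)

# Step 2: boolean mask of the values that should be encoded as 1
# (isin() returns False for NaN)
in_set = s.isin([1, 2])

# Step 3: multiply; True acts as 1 and False as 0, and NaN * 0 is still NaN
result = clipped.mul(in_set)

print(result)
```

The NaN survives only because multiplication propagates it, which is why this variant needs no explicit isna() check.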

Output (fluid_2 is float, because NaN cannot be stored in a plain integer column):

    id  case2_q6  fluid_2
0  300       3.0      0.0
1  304       4.0      0.0
2  306       3.0      0.0
3  309       1.0      1.0
4  311       3.0      0.0
5  312       4.0      0.0
6  314       NaN      NaN
7  315       2.0      1.0
8  316       3.0      0.0
9  317       3.0      0.0
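
If the integer-looking 0 / 1 from the desired output matters, one option is to convert the float result to pandas' nullable integer dtype. This is a sketch that goes beyond the answer above; it assumes a pandas version that supports the "Int64" extension dtype, and the missing value will print as <NA> rather than NaN.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": [300, 304, 306, 309, 311, 312, 314, 315, 316, 317],
    "case2_q6": [3.0, 4.0, 3.0, 1.0, 3.0, 4.0, np.nan, 2.0, 3.0, 3.0],
})

# Same nested np.where() as in the first solution (float64 result)
fluid = np.where(
    df["case2_q6"].isna(), np.nan,
    np.where((df["case2_q6"] == 1) | (df["case2_q6"] == 2), 1, 0))

# Convert to the nullable integer dtype: NaN becomes <NA>, 0.0/1.0 become 0/1
df["fluid_2"] = pd.Series(fluid, index=df.index).astype("Int64")

print(df)
```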