I have data that looks like this:
    id  case2_q6
0  300       3.0
1  304       4.0
2  306       3.0
3  309       1.0
4  311       3.0
5  312       4.0
6  314       NaN
7  315       2.0
8  316       3.0
9  317       3.0
I am using this np.where() call to generate a new variable:
df['fluid_2'] = np.where((df['case2_q6'] == 1) | (df['case2_q6'] == 2), 1, 0)
Now df has the column fluid_2 as below:
    id  case2_q6  fluid_2
0  300       3.0        0
1  304       4.0        0
2  306       3.0        0
3  309       1.0        1
4  311       3.0        0
5  312       4.0        0
6  314       NaN        0
7  315       2.0        1
8  316       3.0        0
9  317       3.0        0
As you can see, the NaN value at index 6 was converted to a 0. Is there a way to set up the np.where() so as to leave those as NaN values in fluid_2?
The desired output would be:
    id  case2_q6  fluid_2
0  300       3.0        0
1  304       4.0        0
2  306       3.0        0
3  309       1.0        1
4  311       3.0        0
5  312       4.0        0
6  314       NaN      NaN
7  315       2.0        1
8  316       3.0        0
9  317       3.0        0
Here the NaN at index 6 is preserved.
>Solution :
A possible solution:
df['fluid_2'] = np.where(
    df['case2_q6'].isna(), np.nan,
    np.where((df['case2_q6'] == 1) | (df['case2_q6'] == 2), 1, 0))
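The original call turns NaN into 0 because any equality comparison with NaN is False, so missing rows fall into the "else" branch. Below is a minimal, self-contained sketch of the nested np.where approach. The DataFrame is rebuilt from the sample in the question, and the intermediate name flag is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "id": [300, 304, 306, 309, 311, 312, 314, 315, 316, 317],
    "case2_q6": [3.0, 4.0, 3.0, 1.0, 3.0, 4.0, np.nan, 2.0, 3.0, 3.0],
})

# NaN == 1 and NaN == 2 are both False, so on its own this flag
# cannot tell "missing" apart from "not 1 or 2".
flag = (df["case2_q6"] == 1) | (df["case2_q6"] == 2)

# Test for missing values first, then apply the flag to the remaining rows.
df["fluid_2"] = np.where(df["case2_q6"].isna(), np.nan, np.where(flag, 1, 0))

print(df)
```

Mixing np.nan with the integers 1 and 0 makes the resulting column float.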
Another possible solution:
df['fluid_2'] = df['case2_q6'].clip(upper=1).mul(df['case2_q6'].isin([1,2]))
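This one-liner works because clip(upper=1) maps both 1 and 2 to 1, multiplying by the boolean isin() result zeroes every other value, and NaN propagates through both steps. If that reads as too terse, a more explicit variant of the same idea can be sketched with isin() and Series.where. The names s, flag and fluid_2 are illustrative and not from the original post.

```python
import numpy as np
import pandas as pd

# The case2_q6 column from the question, as a standalone Series.
s = pd.Series([3.0, 4.0, 3.0, 1.0, 3.0, 4.0, np.nan, 2.0, 3.0, 3.0],
              name="case2_q6")

# isin() returns False for NaN; cast to float so the result can hold NaN.
flag = s.isin([1, 2]).astype(float)

# Series.where keeps values where the condition is True and
# replaces the rest (here: the originally missing rows) with NaN.
fluid_2 = flag.where(s.notna())

print(fluid_2)
```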
Output (both approaches give the same result):
    id  case2_q6  fluid_2
0  300       3.0      0.0
1  304       4.0      0.0
2  306       3.0      0.0
3  309       1.0      1.0
4  311       3.0      0.0
5  312       4.0      0.0
6  314       NaN      NaN
7  315       2.0      1.0
8  316       3.0      0.0
9  317       3.0      0.0
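Note that fluid_2 comes out as 0.0/1.0 rather than 0/1, because a NumPy integer column cannot store NaN. If whole-number display as in the desired output matters, one option (an addition here, not part of the solutions above) is pandas' nullable integer dtype "Int64", which shows the missing entry as <NA>. A minimal sketch:

```python
import numpy as np
import pandas as pd

# The float result produced by either solution above.
fluid_2 = pd.Series([0.0, 0.0, 0.0, 1.0, 0.0, 0.0, np.nan, 1.0, 0.0, 0.0],
                    name="fluid_2")

# "Int64" (capital I) is the nullable integer extension dtype;
# NaN is converted to pd.NA and the other values become integers.
fluid_2_int = fluid_2.astype("Int64")

print(fluid_2_int)
```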