I want to use **Pandas** to create a new column in a dataframe in which it compares 2 columns in a row in the df, if the values are different to choose an specific column, if the values are equal to chose any of them, if one of them in NaN, to choose the other one (if both are NaN it will already be cover). I was using **Numpy** *select* and **Pandas** *where*, but can’t make it work, and I don’t want to use (yet) a **for** cycle.

If i got this dataframe:

```
df1 = pd.DataFrame([["A", "B"], ["C", "D"], [np.nan, "A"], ["B", np.nan], [np.nan, np.nan], ["E", "E"]], columns=["col1", "col2"])
df1
col1 col2
0 A B
1 C D
2 NaN A
3 B NaN
4 NaN NaN
5 E E
```

I want that the behavior is like this:

- In row 0 because (A != B) & (A != np.nan) & (B != np.nan) I want to choose the value from col1.
- In row 1 because (C != D) & (C != np.nan) & (D != np.nan) I want to choose the value from col1 (same as before).
- In row 2 because (NaN != A) & (A != np.nan) & (NaN == np.nan) I want to choose the value from col 2.
- In row 3 because (B != NaN) & (NaN == np.nan) & (B != np.nan) I want to choose the value from col 1.
- In row 4 because (NaN == NaN) & (NaN == np.nan) & (NaN == np.nan) I want to choose the value from col1 (either would work actually).
- In row 5 because (E == E) & (E != np.nan) & (E != np.nan) I want to choose the value form col 1 (same as before).

In this case I should obtain something like:

```
col1 col2 col3
0 A B A
1 C D C
2 NaN A A
3 B NaN B
4 NaN NaN NaN
5 E E E
```

To achieve this I was doing this with `np.select`

:

```
df1 = pd.DataFrame([["A", "B"], ["C", "D"], [np.nan, "A"], ["B", np.nan], [np.nan, np.nan], ["E", "E"]], columns=["col1", "col2"])
conditions = [
((df['col1'] == df['col2']) & (df['col1'] != np.nan)),
((df['col1'] == df['col2']) & (df['col1'] == np.nan)),
((df['col1'] != df['col2']) & (df['col1'] != np.nan) & (df['col2'] == np.nan)),
((df['col1'] != df['col2']) & (df['col1'] == np.nan) & (df['col2'] != np.nan)),
]
choices = [df['col1'], np.nan, df['col1'], df['col2']]
df['condition'] = np.select(conditions, choices, default="WHAT?")
print(df)
```

But the result is:

```
col1 col2 condition
0 A B WHAT?
1 C D WHAT?
2 NaN A WHAT?
3 B NaN WHAT?
4 NaN NaN WHAT?
5 E E E
```

So, I don’t understand what am I doing wrong.

**SEMI-EDIT**: While writing this I noticed that I was missing a case in choices, so I updated the code to this:

Hope someone can help me, thanks

```
df1 = pd.DataFrame([["A", "B"], ["C", "D"], [np.nan, "A"], ["B", np.nan], [np.nan, np.nan], ["E", "E"]], columns=["col1", "col2"])
conditions = [
((df['col1'] == df['col2']) & (df['col1'] != np.nan)),
((df['col1'] == df['col2']) & (df['col1'] == np.nan)),
((df['col1'] != df['col2']) & (df['col1'] != np.nan) & (df['col2'] == np.nan)),
((df['col1'] != df['col2']) & (df['col1'] == np.nan) & (df['col2'] != np.nan)),
((df['col1'] != df['col2']) & (df['col1'] != np.nan) & (df['col2'] != np.nan)),
]
choices = [df['col1'], np.nan, df['col1'], df['col2'], df["col1"]]
df['condition'] = np.select(conditions, choices, default="WHAT?")
print(df)
```

The result was this one:

```
col1 col2 condition
0 A B A
1 C D C
2 NaN A NaN
3 B NaN B
4 NaN NaN NaN
5 E E E
```

To ilustrate a little bit more, I change choices to: [1, 2, 3, 4, 5]. This is the result:

```
col1 col2 condition
0 A B 5
1 C D 5
2 NaN A 5
3 B NaN 5
4 NaN NaN 5
5 E E 1
```

It always enters in the fifth case, which is ok for row 0 and 1, but row 2 should enter case 4, row 3 should enter case 3, and row 4 should enter case 2. Row 5 is correct with case 1.

Hope I made my self clear, thanks.

**EDIT**: someone commented and then delete something that worked for me. This person pointed out that if you do this: `print(np.nan == np.nan)`

it will return `False`

, so my logic will always fail. Therefore, I must user `notnull()`

and `isna()`

in the conditionals.

### >Solution :

`np.nan != np.nan`

NaNs are never equal to each other so your logic is not doing what you think it is: `df['col1'] != np.nan`

. Trying doing `print(np.nan == np.nan)`

and it will always return False. You want to use `df['col1'].isna()`

to find null values and `df['col1'].notnull()`

to find non-null values.

```
conditions = [
((df['col1'] == df['col2']) & (df['col1'].notnull())),
((df['col1'] == df['col2']) & (df['col1'].isna())),
((df['col1'] != df['col2']) & (df['col1'].notnull()) & (df['col2'].isna())),
((df['col1'] != df['col2']) & (df['col1'].isna()) & (df['col2'].notnull())),
((df['col1'] != df['col2']) & (df.notnull().all(1))),# not sure if this is the logic you actually want
]
choices = [df['col1'], np.nan, df['col1'], df['col2'], df["col1"]]
df['condition'] = np.select(conditions, choices, default="WHAT?")
col1 col2 condition
0 A B A
1 C D C
2 NaN A A
3 B NaN B
4 NaN NaN WHAT?
5 E E E
```