Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

compare columns with NaN or <NA> values pandas

I have the dataframe with NaN and values, now I want to compare two columns in the same dataframe whether each row values in null or not null. For examples,

  1. if the column a_1 have null values, column a_2 have not null values, then for that particular
    row, the result should be 1 in the new column a_12.
  2. If the values in both a_1(value is 123) & a_2(value is 345) is not null, and the values are
    not equal, then the result should be 3 in column a_12.

below is the code snippet I have used for comparison, for the scenario 1, I am getting the result as 3 instead of 1. Please guide me to get the correct output.

    try:
        if (x[cols[0]]==x[cols[1]]) & (~np.isnan(x[cols[0]])) & (~np.isnan(x[cols[1]])):   
            return 0

        elif (np.isnan(x[cols[0]])) & (np.isnan(x[cols[1]])):
            return 0

        elif (~np.isnan(x[cols[0]])) & (np.isnan(x[cols[1]])):
            return 1

        elif (np.isnan(x[cols[0]])) & (~np.isnan(x[cols[1]])):
            return 2

        elif (x[cols[0]]!=x[cols[1]]) & (~np.isnan(x[cols[0]])) & (~np.isnan(x[cols[1]])):
            return 3
        else:
            pass

    except Exception as exc:
        if (x[cols[0]]==x[cols[1]]) & (pd.notna(x[cols[0]])) & (pd.notna(x[cols[1]])):   
            return 0

        elif (pd.isna(x[cols[0]])) & (pd.isna(x[cols[1]])):
            return 0

        elif (pd.notna(x[cols[0]])) & (pd.isna(x[cols[1]])):
            return 1

        elif (pd.isna(x[cols[0]])) & (pd.notna(x[cols[1]])):
            return 2

        elif (x[cols[0]]!=x[cols[1]]) & (pd.notna(x[cols[0]])) & (pd.notna(x[cols[1]])):
            return 3
        else:
            pass

I have used pd.isna() and pd.notna(), also np.isnan() and ~np.isnan(), because for some columns the second method (np.isnan()) is working, for some columns its just throwing an error.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Please guide me to achieve the result as excepted.

Expected Output:

| a_1       | a_2     | result |
|-----------|---------|--------|
| gssfwe    | gssfwe  |   0    |
| <NA>      | <NA>    |   0    |
| fsfsfw    | <NA>    |   1    |
| <NA>      | qweweqw |   2    |
| adsadgsgd | wwuwquq |   3    |

Output Got with the above code:

| a_1       | a_2     | result |
|-----------|---------|--------|
| gssfwe    | gssfwe  |   0    |
| <NA>      | <NA>    |   0    |
| fsfsfw    | <NA>    |   3    |
| <NA>      | qweweqw |   3    |
| adsadgsgd | wwuwquq |   3    |

>Solution :

Going by the logic in your code, you’d want to define a function and apply it across your DataFrame.

df = pd.DataFrame({'a_1': [1, 2, np.nan, np.nan, 1], 'a_2': [2, np.nan, 1, np.nan, 1]})

The categories you want map neatly to binary numbers, which you can use to write a short function like –

def nan_check(row):
    x, y = row
    if x != y:
        return int(f'{int(pd.notna(y))}{int(pd.notna(x))}', base=2)
    return 0

df['flag'] = df.apply(nan_check, axis=1)

Output

   a_1  a_2  flag
0  1.0  2.0     3
1  2.0  NaN     1
2  NaN  1.0     2
3  NaN  NaN     0
4  1.0  1.0     0
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading