Merge on NaN – The bug is the behavior I want. Should I worry about a future correction?

In pandas, np.nan != np.nan, yet for now, when merging two DataFrames, rows whose key is NaN are matched to each other.
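
To make the first half of that statement concrete, here is a minimal check of how NaN compares with itself, both as a plain float and element-wise in a Series. The variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# NaN never compares equal to itself (IEEE 754 floating-point rule).
nan_equals_itself = (np.nan == np.nan)
print(nan_equals_itself)  # False

# Element-wise comparison of a Series follows the same rule.
s = pd.Series([np.nan, 1.0])
elementwise = (s == s).tolist()
print(elementwise)  # [False, True]
```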

As reported in the question "Why does pandas merge on NaN?", the expected behavior would be not to match NaN keys with each other. That question is also discussed on the pandas issue tracker.

From It_is_chris:

# merge example
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [np.nan, 'match'], 'col2': [1, 2]})
df2 = pd.DataFrame({'col1': [np.nan, 'no match'], 'col3': [3, 4]})
pd.merge(df, df2, on='col1')

    col1    col2    col3
0   NaN      1       3

Given that, my code actually needs to merge on NaN as well. I could rely on this quirk in pandas, but could the behavior change in the future and break my code?

What is the best option to prevent that?

Thanks

>Solution :

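As you correctly pointed out, it is possible that a future version will no longer join on NaN. How missing keys are treated in a join also differs between languages and tools (in SQL, for example, NULL keys do not match each other), so it is safer not to depend on it.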
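
If you want to notice such a change early, one option is a small check that records how the installed pandas version behaves. This is only a sketch: the function name `nan_keys_match` and the tiny test frames are made up for illustration.

```python
import numpy as np
import pandas as pd

def nan_keys_match() -> bool:
    """Return True if the installed pandas matches NaN keys in an inner merge."""
    left = pd.DataFrame({'key': [np.nan], 'a': [1]})
    right = pd.DataFrame({'key': [np.nan], 'b': [2]})
    # One result row means the two NaN keys were treated as equal.
    return len(pd.merge(left, right, on='key')) == 1

print(pd.__version__, nan_keys_match())
```

Running this in a test suite would flag a behavior change when pandas is upgraded, though it does not make the merge itself independent of that behavior.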

The easiest future-proof solution would be to replace NaN with "NA" or a similar string that does not already occur in the key column. You can replace it back with NaN after merging if required.

# Replace NaN with a placeholder string so the key match is explicit
df = pd.DataFrame({'col1': [np.nan, 'match'], 'col2': [1, 2]}).fillna("NA")
df2 = pd.DataFrame({'col1': [np.nan, 'no match'], 'col3': [3, 4]}).fillna("NA")
pd.merge(df, df2, on='col1')
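
The full round trip (fill, merge, restore NaN) could be wrapped in a helper along these lines. This is a sketch under a few assumptions: the helper name `merge_matching_nan` and the `SENTINEL` value are hypothetical, the merge uses a single string-valued key column, and the sentinel never appears among the real keys. Unlike the snippet above, it fills only the key column, so NaN values in other columns are left untouched.

```python
import numpy as np
import pandas as pd

# Hypothetical placeholder; it must not occur among the real key values.
SENTINEL = "__MISSING__"

def merge_matching_nan(left, right, on, **kwargs):
    """Merge on a single key column, matching NaN keys explicitly."""
    # Fill only the key column so NaN in other columns is preserved.
    left_filled = left.assign(**{on: left[on].fillna(SENTINEL)})
    right_filled = right.assign(**{on: right[on].fillna(SENTINEL)})
    merged = pd.merge(left_filled, right_filled, on=on, **kwargs)
    # Put NaN back into the key column after the merge.
    merged.loc[merged[on] == SENTINEL, on] = np.nan
    return merged

df = pd.DataFrame({'col1': [np.nan, 'match'], 'col2': [1, 2]})
df2 = pd.DataFrame({'col1': [np.nan, 'no match'], 'col3': [3, 4]})
result = merge_matching_nan(df, df2, on='col1')
print(result)
```

Because the match now happens on an ordinary string, the result should not depend on how pandas treats NaN keys in a merge.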