Why does pandas fail to join on two columns of object dtype (one of them is converted from int to object)?

The following merge fails to find any matches:

import pandas as pd
data1 = {'c1': ['J', 'A', 'B'],
         'key': [25, 30, 35]}
df1 = pd.DataFrame(data1)

data2 = {'c2': ['A', 'B', 'C'],
         'key': ["25","30","36"]}
df2 = pd.DataFrame(data2, dtype="O")

df1.key = df1.key.astype("O")

print(df1.merge(df2, on = "key"))

output:
Empty DataFrame
Columns: [c1, key, c2]
Index: []

Why is pandas failing in this merge? I can convert the column to string dtype as follows and then back to object and it works:

df1.key = df1.key.astype(str).astype("O")

Now the merge works and finds the matches.

How should I understand this behavior?

Solution:

Converting the Series to object dtype does not turn its items into strings; it only changes the dtype of the Series to object. An object Series can contain anything (integers, floats, strings, lists, class instances…), so the items are still Python integers:

df1['key'] = df1['key'].astype('O')

print(df1['key'].tolist())
# [25, 30, 35]

print(type(df1['key'].iloc[0]))
# <class 'int'>
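The mismatch can be checked directly. The following is a minimal sketch, reusing the frames from the question; the variable names `same_dtype`, `left_types`, `right_types`, and `int_equals_str` are only illustrative. It shows that both key columns report the same dtype while their elements have different Python types, so no pair of keys compares equal.

```python
import pandas as pd

# Frames from the question
df1 = pd.DataFrame({'c1': ['J', 'A', 'B'], 'key': [25, 30, 35]})
df2 = pd.DataFrame({'c2': ['A', 'B', 'C'], 'key': ['25', '30', '36']}, dtype='O')
df1['key'] = df1['key'].astype('O')

# Both key columns report the same (object) dtype...
same_dtype = df1['key'].dtype == df2['key'].dtype

# ...but the Python types of the stored elements differ.
left_types = set(df1['key'].map(type))
right_types = set(df2['key'].map(type))

# An int never compares equal to its string form, so no keys match.
int_equals_str = (25 == '25')

print(same_dtype, left_types, right_types, int_equals_str)
```

Inspecting the element types with `.map(type)` is a quick way to diagnose an unexpectedly empty merge on object columns.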

What matters is that the items in both key columns have the same type, for example strings:

df1['key'] = df1['key'].astype(str)

print(df1['key'].tolist())
# ['25', '30', '35']
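As a sketch of the fix applied to the merge from the question, the keys can be normalized to a common type on both sides before merging. Converting to strings follows the answer above; converting to integers is an alternative that assumes every key in `df2` is a valid integer literal. The names `as_str` and `as_int` are only illustrative.

```python
import pandas as pd

# Frames from the question
df1 = pd.DataFrame({'c1': ['J', 'A', 'B'], 'key': [25, 30, 35]})
df2 = pd.DataFrame({'c2': ['A', 'B', 'C'], 'key': ['25', '30', '36']}, dtype='O')

# Option 1: make the keys strings on both sides, then merge.
as_str = (
    df1.assign(key=df1['key'].astype(str))
       .merge(df2.assign(key=df2['key'].astype(str)), on='key')
)

# Option 2: make the keys integers on both sides, then merge.
# Assumes every value in df2['key'] parses as an integer.
as_int = df1.merge(df2.assign(key=df2['key'].astype(int)), on='key')

print(as_str)
print(as_int)
```

Both variants match the keys 25 and 30. Using `assign` leaves the original frames untouched, which is a design choice rather than a requirement.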