The following merge strategy fails
import pandas as pd
data1 = {'c1': ['J', 'A', 'B'],
'key': [25, 30, 35]}
df1 = pd.DataFrame(data1)
data2 = {'c2': ['A', 'B', 'C'],
'key': ["25","30","36"]}
df2 = pd.DataFrame(data2, dtype="O")
df1.key = df1.key.astype("O")
print(df1.merge(df2, on = "key"))
output:
Empty DataFrame
Columns: [c1, key, c2]
Index: []
Why is pandas failing in this merge? I can convert the column to string dtype as follows and then back to object and it works:
df1.key = df1.key.astype(str).astype("O")
Now the merge is okay and finds the matches
How to understand this behavior?
>Solution :
Converting the Series to object doesn’t make the items strings, it just makes the Series dtype object (An object Series can contain anything: integers, floats, strings, lists, classes…):
df1['key'] = df1['key'].astype('O')
print(df1['key'].tolist())
# [25, 30, 35]
print(type(df1['key'].iloc[0]))
# <class 'int'>
What is important is to convert the items to an identical type, for example strings:
df1['key'] = df1['key'].astype(str)
print(df1['key'].tolist())
# ['25', '30', '35']