I have two DataFrames, A and B. A has a column which is "B_id", B has a "ID" column. However, many rows on A use the same "B_id" to determine data about that row which repeats. For example, A looks like this:
A:
ID Name (...) B_id
1 John ... 123
2 Emily ... 123
3 Erick ... 321
Where B could look like this:
B:
ID Model (...)
123 M1 ...
321 M2 ...
What I want is to create a new Dataframe combine and cross referencing the information from both DataFrames, where all columns for the corresponding B_id would be added to the row in A. The output would look like this:
Result:
ID Name (...) B_id Model (...)
1 John ... 123 M1 ...
2 Emily ... 123 M1 ...
3 Erick ... 321 M2 ...
I’ve tried with pd.merge with ‘left’ as how, but it returns a empty DataFrame.
>Solution :
You were almost there. What you need to do is specify the column on which to merge using the on parameter in pandas’ merge method (on=B_id). In order for this to work, you must first ensure that the column to join on shares the same name in both dataframes. You may either:
- rename column
IDtoB_idin the B dataframe:dfB.rename(columns={"ID": "B_id"} - or simply create a new column with that name:
dfB['B_id'] = dfB['ID'].
Here is a code example:
import pandas as pd
dfA = pd.DataFrame()
dfA['ID'] = [1, 2, 3]
dfA['Name'] = ['John', 'Emily', 'Erick']
dfA['B_id'] = [123, 123, 321]
dfB = pd.DataFrame()
dfB['ID'] = [123, 321]
dfB['Model'] = ['M1', 'M2']
dfB['B_id'] = dfB['ID']
pd.merge(df1,df2, on='B_id')
This returns:
ID_x Name B_id ID_y Model
0 1 John 123 123 M1
1 2 Emily 123 123 M1
2 3 Erick 321 321 M2