Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Cross referencing Pandas Dataframes with multiple occurrences of right key on left Dataframe

I have two DataFrames, A and B. A has a column which is "B_id", B has a "ID" column. However, many rows on A use the same "B_id" to determine data about that row which repeats. For example, A looks like this:

A:
ID    Name    (...)    B_id
1     John     ...     123
2     Emily    ...     123
3     Erick    ...     321

Where B could look like this:

B:
ID     Model  (...)
123    M1      ...
321    M2      ...

What I want is to create a new Dataframe combine and cross referencing the information from both DataFrames, where all columns for the corresponding B_id would be added to the row in A. The output would look like this:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Result:
ID    Name    (...)    B_id    Model    (...)
1     John     ...     123     M1        ...
2     Emily    ...     123     M1        ...
3     Erick    ...     321     M2        ...

I’ve tried with pd.merge with ‘left’ as how, but it returns a empty DataFrame.

>Solution :

You were almost there. What you need to do is specify the column on which to merge using the on parameter in pandas’ merge method (on=B_id). In order for this to work, you must first ensure that the column to join on shares the same name in both dataframes. You may either:

  • rename column ID to B_id in the B dataframe: dfB.rename(columns={"ID": "B_id"}
  • or simply create a new column with that name: dfB['B_id'] = dfB['ID'].

Here is a code example:

import pandas as pd

dfA = pd.DataFrame()
dfA['ID'] = [1, 2, 3]
dfA['Name'] = ['John', 'Emily', 'Erick']
dfA['B_id'] = [123, 123, 321]

dfB = pd.DataFrame()
dfB['ID'] = [123, 321]
dfB['Model'] = ['M1', 'M2']

dfB['B_id'] = dfB['ID']

pd.merge(df1,df2, on='B_id')

This returns:

    ID_x    Name    B_id    ID_y    Model
0   1   John    123     123     M1
1   2   Emily   123     123     M1
2   3   Erick   321     321     M2
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading