Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

VLOOKUP using pandas without repetition

I have two dataframes df1 and df2. How do I obtain df3 using pd.merge or any other function?

What I tried?

df3=df1.merge(df2, on='A', how='left')

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

This gives me df3 with number of rows same as df2. But, what I want is number of rows in df1 and df3 are same. df3 should look exactly as in the example provided.

df1

A   B
1   1
2   1
3   2
4   2
5   3
6   7
7   1
8   1
9   9
10  3

df2

A   C
1   XY
1   XY
2   XY
2   XY
2   XY
3   XY
3   XY
4   XY
4   XY
5   XY
5   XY
6   AB
6   AB
7   AB
8   AB
9   AB
9   AB
9   AB
10  MN
10  MN

df3

A   B   C
1   1   XY
2   1   XY
3   2   XY
4   2   XY
5   3   XY
6   7   AB
7   1   AB
8   1   AB
9   9   AB
10  3   MN

>Solution :

Use drop_duplicates to keep one instance of (A, B):

>>> df1.merge(df2, on='A', how='left').drop_duplicates(['A', 'B'], ignore_index=True)
     A  B   C
0    1  1  XY
2    2  1  XY
5    3  2  XY
7    4  2  XY
9    5  3  XY
11   6  7  AB
13   7  1  AB
14   8  1  AB
15   9  9  AB
18  10  3  MN
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading