Data
df1
ID stat
AA1 exzone
BB2 exzone5
CC4 limit5
df2
name state
AA1 NY
AA1 NY
AA1 NY
AA1 NY
BB2 GA
BB2 GA
BB2 GA
CC4 CA
CC4 CA
Desired
name stat state
AA1 exzone NY
BB2 exzone5 GA
CC4 limit5 CA
Doing
out = pd.merge(df1,df2, left_on=['ID'], right_on= ['name'], how="left")
however, the above script is giving an exploded output and does not retain the original Left dataframe row count. Any suggestion is appreciated.
>Solution :
A left merge doesn’t mean that the structure will be identical to that of the original left DataFrame. It means that all the left keys will be preserved, even if absent from the right DataFrame. In your case the duplicated keys on the right force the merge to compute all combinations of the rows.
You need to first remove the duplicates:
out = pd.merge(df1, df2.drop_duplicates(), left_on=['ID'], right_on= ['name'], how="left")
If for some reason you have several different states per name, you should find another way to aggregate (pick the first, last, combine the unique states as a single string, etc.), or accept to have row duplications.