I have 2 CSV’s with some names in them. There are few duplicates in each CSV, I need to compare them and if the same names exist in both CSV, update in new CSV.
CSV 1
ID, Names
1, Bill
2, Smith
CSV 2
ID, Names
1, James
2, Gates
3, Bill
As first row in CSV 1 matches with 3rd row in CSV 2
New CSV [Desired Output]
ID, Names
1, Bill
I’ve tried this:
import pandas as pd
df1 = pd.read_csv('csv1.csv')
df2 = pd.read_csv('csv2.csv')
df1['flag'] = 'old'
df2['flag'] = 'new'
df = pd.concat([df1, df2])
dups = df.drop_duplicates(df.columns.difference(['flag']), keep = False)
dups.to_csv('new_CSV.csv', index = False)
>Solution :
Try with merge
out = df1.merge(df2[['Names']],how='inner',on='Names')
Out[44]:
ID Names
0 1 Bill