Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How do I read two CSV files and compare 1 column from both and then write to a new file where columns match

I have 2 CSV’s with some names in them. There are few duplicates in each CSV, I need to compare them and if the same names exist in both CSV, update in new CSV.

CSV 1

ID, Names
    1, Bill
    2, Smith

CSV 2

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

ID, Names
    1, James
    2, Gates
    3, Bill

As first row in CSV 1 matches with 3rd row in CSV 2

New CSV [Desired Output]

ID, Names
    1, Bill

I’ve tried this:

import pandas as pd

df1 = pd.read_csv('csv1.csv')
df2 = pd.read_csv('csv2.csv')

df1['flag'] = 'old'
df2['flag'] = 'new'

df = pd.concat([df1, df2])

dups = df.drop_duplicates(df.columns.difference(['flag']), keep = False)
dups.to_csv('new_CSV.csv', index = False)

>Solution :

Try with merge

out = df1.merge(df2[['Names']],how='inner',on='Names')
Out[44]: 
   ID  Names
0   1   Bill
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading