Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

A More Efficient Way to Compare CSV Columns and Adding to a certain row on Second CSV (Python)

So i’m new to python, I want to discover it’s potential and get more knowledge about what I can do with it.
I did this code to compare CSVs, basically what it does you provide it 2 CSVs, CSV1 that has some id column and a column with values that you want to add to other CSV (CSV2)

Note: This script does exactly what I want, and it seems to work fine, hope it can also be useful to someone, my question is really what I can do to improve it’s performance or even make the code cleaner

# Made by Varqas
# CSV1 = CSV containing values that can be matched in CSV2 and a column that will be added
# CSV2 = CSV containing values that can be matched and column that will be concatenated at the end of the CSV (The last column values should be empty)

with open('csv1.csv', encoding="utf8") as check_file:
    # Get Column that will be used to Compare values and add it to a list
    columnToCompare = list([row.split(',')[0].strip() for row in check_file])

with open('csv1.csv', encoding="utf8") as check_file:
    # Get Column that will be used to add to a row values and add it to a list
    columnToAdd = list([row.split(',')[2].strip() for row in check_file])

with open('csv2.csv', 'r', encoding="utf8") as in_file, open('out.csv', 'w', encoding="utf8") as out_file:
    i = 0
    # For each Row in CSV2
    for line in in_file:
        # Write Headers
        if i == 0:
            out_file.write(line)
        else:
            # GET Column on CSV2 containing value that will be compared on CVS1
            value = line.split(',')[1].strip()
            # Check if first Column value on CSV2  either variable is in 
            if value in columnToCompare:
                # Check for duplicates in the list 
                numberOfOccurences = list(columnToCompare).count(value)
                concatRow = ""
                if numberOfOccurences > 1:
                    # Concatenate all values of occurences
                    for x in range(numberOfOccurences):
                        index = list(columnToCompare).index(value)
                        concatRow = concatRow + columnToAdd[index]
                        if x != numberOfOccurences - 1:
                            concatRow = concatRow + " + "
                        # Remove value so list.index doesn't found same row
                        columnToCompare[index] = ""
                else:
                    # Add other row that doesn't match
                    index = list(columnToCompare).index(value)
                    concatRow = columnToAdd[index]

                # Concat to last column of CSV2
                out_file.write(line.strip() + concatRow + "\n")
            else:
                # Still concat value in CSV2 to last column if not found in csv1 
                out_file.write(line.strip() + "not found" + "\n")
        i = i + 1

I know it can be improved, and minified perhaps using some libs… Let me know what you think!

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I tried using pd merge, but I didn’t understand quite well how I could add concatenation and values inside it.

>Solution :

You can use the Pandas library to read the two CSV files into dataframes and merge the two columns into the second CSV and output a new CSV with the merged columns.

import pandas as pd

# read first CSV
df1 = pd.read_csv('first.csv')

# read second CSV
df2 = pd.read_csv('second.csv')

# merge the id column and a "column with values 
# that you want to add to other CSV (CSV2)"
# for the example the second column is named 'data'.
merged_df = pd.merge(df2, df1[['id', 'data']], on='id', how='left')

# save new dataframe to csv.
merged_df.to_csv('merged.csv', index=False)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading