Avoiding a nested for loop when reading files in pandas for comparison

I have a dictionary called "file_dic" with the {key: file_path} structure. I want to read each file path into a pandas dataframe, grab its columns, and check whether they exist in the files at the other paths in the dictionary. My solution works, but I want to avoid a nested for loop. What would be the best way to do this? I'm trying to learn to write better code lol

file_diff = {}
for i in file_dic.keys():
    temp_col1 = pd.read_csv(file_dic[i], nrows=1).columns.tolist()
    for j in file_dic.keys():
        if (j != i):
            temp_col2 = pd.read_csv(file_dic[j], nrows=1).columns.tolist()
            diff_cols = sorted(list(set(temp_col1).difference(set(temp_col2))))
            file_diff[str(i)+' columns not in '+str(j)] = diff_cols
df = pd.DataFrame.from_dict(file_diff, orient='index').T

Solution:

As per the comments, your second loop isn’t necessary if you only need to compare each file with the one read just before it. You can use a count variable to check whether you are on the first key (the first file), and a previous variable to keep track of the columns read on the previous iteration:

file_diff = {}
count = 0
for i in file_dic.keys():
    if count == 0:  # first file: nothing to compare against yet
        previous = pd.read_csv(file_dic[i], nrows=1).columns.tolist()
        previous_key = i
    else:
        # read the current file (key i) and compare it with the previous one
        temp_col2 = pd.read_csv(file_dic[i], nrows=1).columns.tolist()
        diff_cols = sorted(list(set(previous).difference(set(temp_col2))))
        file_diff[str(previous_key)+' columns not in '+str(i)] = diff_cols
        previous = temp_col2
        previous_key = i
    count += 1
df = pd.DataFrame.from_dict(file_diff, orient='index').T

This way, previous stores the columns of the previously read file, and they are compared with the columns of the newly read file (temp_col2). Each file is read only once.
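If every file still has to be compared with every other file, as in the question, one possible alternative is to read each header once into a dictionary and then compare the cached column sets pair by pair. The sketch below is only an illustration under that assumption: read_headers, column_differences and the example_headers data are hypothetical names that do not come from the question or the answer. The number of comparisons stays the same as with the nested loop, but each file is read only once.

```python
from itertools import permutations


def read_headers(file_dic):
    """Read the column names of every CSV once, keyed like file_dic."""
    import pandas as pd  # local import: only this helper needs pandas

    return {
        key: pd.read_csv(path, nrows=1).columns.tolist()
        for key, path in file_dic.items()
    }


def column_differences(headers):
    """For every ordered pair (i, j), list the columns of i missing from j."""
    column_sets = {key: set(cols) for key, cols in headers.items()}
    return {
        f"{i} columns not in {j}": sorted(column_sets[i] - column_sets[j])
        for i, j in permutations(column_sets, 2)
    }


# Hypothetical headers standing in for read_headers(file_dic)
example_headers = {
    "a": ["id", "name", "price"],
    "b": ["id", "name"],
    "c": ["id", "qty"],
}
example_diff = column_differences(example_headers)
```

With real files, column_differences(read_headers(file_dic)) should produce a dictionary that can be passed to pd.DataFrame.from_dict(..., orient='index').T in the same way as file_diff in the question.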
