Comparing the closeness of coordinates from two files

October 3, 2024

I need to compare two CSV files containing coordinates, line by line: if any point from the first file is close (less than 10 meters) to a point in the second file, the function should return True. Unfortunately, in the code below, the inner loop is executed only once, and I don't know why. Also, is there a faster way?

Below is my code:

def comparison_closeness_two_files(filename1, filename2):

    threshold_meters = 10/1000 # 10 meters

    file1 = open(filename1, 'r')
    file1 = csv.reader(file1, delimiter=',')
    file2 = open(filename2, 'r')
    file2 = csv.reader(file2, delimiter=',')

    for index, fc1 in enumerate(file1):
        if (len(fc1) != 4):
            continue
        lat1 = float(fc1[2])
        lon1 = float(fc1[3])

        for index, fc2 in enumerate(file2):
            if (len(fc2) != 4):
                continue
            lat2 = float(fc2[2])
            lon2 = float(fc2[3])
            distance = get_distance_between(lat1, lon1, lat2, lon2)
            if (distance <= threshold_meters):
                return True

    return False

>Solution :

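Your problem comes from the fact that you are working with file streams, not lists. A csv.reader behaves like a generator: once you have iterated over it, the stream has been consumed and nothing is left. That is why the inner loop only does any work for the first row of the first file; for every later row, the second reader is already exhausted.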
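The exhaustion is easy to reproduce in isolation. The following is a minimal sketch that uses an in-memory io.StringIO in place of a real file; the sample rows are made up for illustration:

```python
import csv
import io

# In-memory stand-in for a CSV file; the rows are made-up sample data.
stream = io.StringIO("a,1,50.0,19.0\nb,2,50.1,19.1\n")
reader = csv.reader(stream, delimiter=',')

first_pass = list(reader)   # consumes every row of the stream
second_pass = list(reader)  # the reader is exhausted, so nothing is left

print(len(first_pass))   # 2
print(len(second_pass))  # 0
```

The second pass is empty for the same reason the inner loop in the question stops producing rows after the first outer iteration.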

To get around this, preload your data into lists (at least for the second file, which you iterate over many times).

Here is a solution that preloads both files into lists:

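import csv

def comparison_closeness_two_files(filename1, filename2):

    # 10 meters expressed in kilometers; this assumes that
    # get_distance_between() (defined elsewhere) returns kilometers.
    threshold_meters = 10/1000

    # Read both files fully into lists so they can be iterated more than once.
    with open(filename1, 'r') as f1, open(filename2, 'r') as f2:
        file1 = list(csv.reader(f1, delimiter=','))
        file2 = list(csv.reader(f2, delimiter=','))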

    for fc1 in file1:
        if len(fc1) != 4:
            continue
        lat1 = float(fc1[2])
        lon1 = float(fc1[3])

        for fc2 in file2:
            if len(fc2) != 4:
                continue
            lat2 = float(fc2[2])
            lon2 = float(fc2[3])
            distance = get_distance_between(lat1, lon1, lat2, lon2)
            if distance <= threshold_meters:
                return True

    return False

As mentioned above, if memory usage is a concern, you can preload only the second file into a list and keep streaming the first one, since the first file is only read once.
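A sketch of that memory-friendlier variant might look like the following. Note the assumptions: get_distance_between is not shown in the question, so a hypothetical haversine implementation returning kilometers stands in for it here, and read_points is a helper introduced only for this illustration.

```python
import csv
import math


def get_distance_between(lat1, lon1, lat2, lon2):
    """Hypothetical stand-in: haversine great-circle distance in kilometers."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def read_points(rows):
    """Illustrative helper: yield (lat, lon) from rows with exactly 4 columns."""
    for row in rows:
        if len(row) == 4:
            yield float(row[2]), float(row[3])


def comparison_closeness_two_files(filename1, filename2):
    threshold_km = 10 / 1000  # 10 meters, expressed in kilometers

    # Only the second file is kept in memory, and it is parsed just once.
    with open(filename2, 'r', newline='') as f2:
        points2 = list(read_points(csv.reader(f2, delimiter=',')))

    # The first file is streamed row by row, since it is read only once.
    with open(filename1, 'r', newline='') as f1:
        for lat1, lon1 in read_points(csv.reader(f1, delimiter=',')):
            for lat2, lon2 in points2:
                if get_distance_between(lat1, lon1, lat2, lon2) <= threshold_km:
                    return True
    return False
```

Converting the second file to floats up front also avoids re-parsing the same strings on every pass of the inner loop, which should help a little with speed, although the comparison is still quadratic in the number of points.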