
Python: Detect most similar list from list of lists

I want to find, as fast as possible, the list in a list of lists that is most similar to a given list.

My search list:

[1,2,3,4]

The list of lists:


[[1],[2],[1,2],[1,2,3,4,5,6],[1,2,3],[1,2,3,4,5]]

Most similar result:

[1,2,3]

I tried to do this with some common Python operations, but it is too slow on my data: I have about 2 million lists to search through.

Solution:

The following function returns the lists whose length is within a given range of the length of the search list:

def most_similar_acc_length(my_list, range_of_lists, length_range):
    """Return the lists whose length is close to that of my_list.

    Parameters
    ----------
    my_list :        The list of interest
    range_of_lists : List of lists in which we search for lists similar to 'my_list'
    length_range :   Maximum allowed difference between len(x) and len(my_list)

    Returns
    -------
    List of the lists whose length is within length_range of len(my_list)
    """
    # Compute the length bounds once instead of once per candidate
    min_len = len(my_list) - length_range
    max_len = len(my_list) + length_range
    sim_lists = [x for x in range_of_lists if min_len <= len(x) <= max_len]
    return sim_lists

If we try it on the lists you shared with length_range=1, we get:

range_of_lists=[[1],[2],[1,2],[1,2,3,4,5,6],[1,2,3],[1,2,3,4,5]]
my_list=[1,2,3,4]

sim_list=most_similar_acc_length(my_list, range_of_lists, 1)

Output

[[1, 2, 3], [1, 2, 3, 4, 5]]
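Because this filter scans every list on each call, it is the expensive part when there are about 2 million lists. A minimal sketch, assuming the same collection is queried many times: group the lists by length once in a dictionary, then read only the buckets for lengths len(my_list) - length_range through len(my_list) + length_range. build_length_index and similar_by_length are hypothetical helper names, not part of the answer above.

```python
from collections import defaultdict


def build_length_index(range_of_lists):
    """Group the lists by length once: {length: [lists of that length]}."""
    index = defaultdict(list)
    for x in range_of_lists:
        index[len(x)].append(x)
    return index


def similar_by_length(my_list, index, length_range):
    """Same selection as most_similar_acc_length, but only reads the relevant buckets."""
    n = len(my_list)
    result = []
    # Lengths cannot be negative, so start the range at 0 at the lowest
    for length in range(max(0, n - length_range), n + length_range + 1):
        # .get() does not create empty buckets in the defaultdict
        result.extend(index.get(length, []))
    return result


range_of_lists = [[1], [2], [1, 2], [1, 2, 3, 4, 5, 6], [1, 2, 3], [1, 2, 3, 4, 5]]
index = build_length_index(range_of_lists)
print(similar_by_length([1, 2, 3, 4], index, 1))  # [[1, 2, 3], [1, 2, 3, 4, 5]]
```

The selection is the same as most_similar_acc_length, but the result is ordered by length rather than by original position, which can change which list wins a tie in the second step.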

Second step
Once we have the lists of similar length, a second function picks, among them, the one that matches my_list at the most positions:

def most_similar_list(my_list, range_of_lists, length_range):
    # We start with a first selection of lists that are similar in terms of length
    sim_list = most_similar_acc_length(my_list, range_of_lists, length_range)
    if not sim_list:
        return None  # no list has a length within length_range of len(my_list)

    # Score each candidate by the number of positions where it has the same value
    # as my_list; zip() stops at the shorter list, i.e. after min(len(x), len(my_list))
    # positions, so an empty candidate simply scores 0
    scores = [sum(a == b for a, b in zip(x, my_list)) for x in sim_list]

    # index() returns the first candidate with the highest score
    ind_max = scores.index(max(scores))
    return sim_list[ind_max]
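The score that most_similar_list maximizes is the number of positions where a candidate and my_list hold the same value, compared up to the shorter of the two lengths. A small check of that arithmetic for the two candidates kept by the length filter (position_matches is a hypothetical helper name, not part of the answer):

```python
def position_matches(a, b):
    # Number of positions i where a[i] == b[i]; zip() stops at the shorter list
    return sum(x == y for x, y in zip(a, b))


my_list = [1, 2, 3, 4]
for candidate in [[1, 2, 3], [1, 2, 3, 4, 5]]:
    print(candidate, position_matches(candidate, my_list))
# [1, 2, 3] 3
# [1, 2, 3, 4, 5] 4
```

So [1, 2, 3, 4, 5] scores 4 and [1, 2, 3] scores 3, which is why this function returns the longer list rather than the [1, 2, 3] the question expected.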

Let’s try this function:

range_of_lists=[[1],[2],[1,2],[1,2,3,4,5,6],[1,2,3],[1,2,3,4,5]]
my_list=[1,2,3,4]

similar_list=most_similar_list(my_list, range_of_lists, 1)

similar_list

Output

[1, 2, 3, 4, 5]
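The question does not define "most similar", and the position-matching score above picks [1, 2, 3, 4, 5] rather than the expected [1, 2, 3]. If the intended meaning is instead "the longest list that is a prefix of my_list" (an assumption; it is one reading that gives [1, 2, 3] here), the lookup can avoid scanning all 2 million lists per query: store every list once as a tuple in a set, then test the prefixes of my_list from longest to shortest. build_prefix_set and longest_prefix_match are hypothetical names, and the list elements must be hashable.

```python
def build_prefix_set(range_of_lists):
    # Store every list as a tuple so membership tests are hash lookups
    return {tuple(x) for x in range_of_lists}


def longest_prefix_match(my_list, prefix_set):
    # Try the longest prefix of my_list first, then progressively shorter ones
    for k in range(len(my_list), 0, -1):
        candidate = tuple(my_list[:k])
        if candidate in prefix_set:
            return list(candidate)
    return None  # no non-empty prefix of my_list is in the collection


range_of_lists = [[1], [2], [1, 2], [1, 2, 3, 4, 5, 6], [1, 2, 3], [1, 2, 3, 4, 5]]
prefix_set = build_prefix_set(range_of_lists)
print(longest_prefix_match([1, 2, 3, 4], prefix_set))  # [1, 2, 3]
```

Building the set is a single pass over the data; each query then needs at most len(my_list) set lookups, however many lists are stored.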