Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

performance problem, code works but consider takes long time in long list

Why the following code is consider inefficient and how can I improve it? while the code works, in huge n or big a/b it fails to deliver instant results.

what have I tired? I sorted initially n,a and b but no change in performance.

Objective, find the sum of h-m
Note: len(a) always equal to len(b)

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

n=[1, 5, 3] #//can be with 100K+ items
a=set([3,1]) #//can be with 50K+ items
b=set([5,7])

h=0
m=0
for ia, ib in zip(a,b):
    if ia in n:
        h+=1
    if ib in n:
        m+=1

print (h-m)

>Solution :

Since n is a list, and it’s huge (100K+ items), each if WHATEVER in n: is doing O(n) work, involving 100K+ equality checks.

You basically have your types backwards here; you’re using sets for things you iterate (where being a set is saving you little aside from perhaps removing duplicates from your inputs) and using lists for things you membership test (where O(n) containment checks are much more expensive on large lists than O(1) containment checks are for sets of any size).

Assuming the elements of n are hashable, convert them to a set before the loop and use containment tests against the set:

n=[1, 5, 3] #can be with 100K+ items
nset = set(n)   # Cache set view of n
a=set([3,1]) #can be with 50K+ items
b=set([5,7])

h=0
m=0
for ia, ib in zip(a,b):
    if ia in nset:  # Check against set in O(1)
        h+=1
    if ib in nset:  # Check against set in O(1)
        m+=1

print (h-m)

Note that ziping is doing nothing except possibly excluding some elements from being iterated at all; if len(a) != len(b), you’ll fail to check the elements that would be iterated beyond the length of the shortest set. If you want to count them all, the simplest solution is to split the loops replacing the single loop with just:

h = sum(1 for ia in a if ia in nset)  # sum(ia in nset for ia in a) also works, but it's somewhat slower/less intuitive
m = sum(1 for ib in b if ib in nset)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading