Why isn't multiprocessing working in parallel?

I’m trying to execute the same function with 4 different inputs in parallel.
It works when the total input size is less than 10^5, but when the input gets bigger the processes appear to run one after another instead of in parallel.

    import multiprocessing

    # List (a custom class) and lists (the four inputs) are defined
    # elsewhere in the asker's code and are not shown here.

    def subList_process(instance):
        print("start")
        l = List(dataSet=instance)
        l.shuffle_set()
        l.sort_set()
        print("done")

    if __name__ == '__main__':
        # One process per input list
        p = []
        p.append(multiprocessing.Process(target=subList_process, args=(lists[0],)))
        p.append(multiprocessing.Process(target=subList_process, args=(lists[1],)))
        p.append(multiprocessing.Process(target=subList_process, args=(lists[2],)))
        p.append(multiprocessing.Process(target=subList_process, args=(lists[3],)))

        # Start all processes, then wait for all of them to finish
        for process in p:
            process.start()
        for process in p:
            process.join()

Here is the output with two different data sizes:

  • 10^4

[Screenshot of the program output; image not preserved]

  • 10^6

[Screenshot of the program output; image not preserved]

Can I get an explanation of what’s happening when the data size is relatively big?
And how can I fix it?

Solution:

Multiprocessing needs to serialize (i.e. pickle) the arguments in the main process and send them to the subprocesses. Because your parallelized function is fast relative to that transfer, each process finishes before the arguments for the next one have been serialized, so the processes appear to run one after another.
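To check whether serialization really is the bottleneck for your data, you could time it against the work itself. This is only a minimal sketch: it assumes plain Python lists of integers as a stand-in for the `List` class from the question (which isn't shown), and `shuffle_and_sort` and `time_call` are illustrative names, not part of any library.

```python
import pickle
import random
import time


def shuffle_and_sort(data):
    """Stand-in (assumed) for the question's shuffle_set() and sort_set()."""
    data = list(data)
    random.shuffle(data)
    data.sort()
    return data


def time_call(func, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    for size in (10**4, 10**6):
        data = list(range(size))
        # Roughly what the parent does to ship an argument to a child.
        payload, t_dump = time_call(pickle.dumps, data)
        # Roughly what the child does before it can start working.
        _, t_load = time_call(pickle.loads, payload)
        # The actual work the child is supposed to do.
        _, t_work = time_call(shuffle_and_sort, data)
        print(f"n={size}: pickle {t_dump:.4f}s, "
              f"unpickle {t_load:.4f}s, work {t_work:.4f}s")
```

If the pickle and unpickle times are comparable to or larger than the work time for your real objects, the transfer dominates and the processes will look sequential. Note that arguments are pickled this way under the spawn start method (the default on Windows and macOS); under fork the child inherits the parent's memory instead, so the numbers for your own class and platform may differ a lot from this stand-in.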

Edit, replying to your comment:
There is no good way to get around this, but here are some alternatives:

  • Send instructions for creating the arguments instead of the arguments themselves. If your inputs are generated by a creator function, for instance, call that function inside the subprocess.
  • Store the arguments on disk if you create them once and read them relatively often. Particularly if you have an SSD, each subprocess could read its arguments (e.g. as pickle files) from disk, which could be faster.
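Both alternatives can be sketched roughly as follows. The names here are hypothetical: `make_data` stands in for whatever creator function produces your inputs, `process` stands in for the shuffle-and-sort work from the question, and `run_all` is just a small helper. Treat it as an illustration under those assumptions, not a drop-in fix.

```python
import multiprocessing
import pickle
import random
import tempfile
from pathlib import Path


def make_data(seed, size):
    """Hypothetical creator function: rebuilds the input from two small values."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(size)]


def process(data):
    """Stand-in (assumed) for the question's shuffle-and-sort work."""
    random.shuffle(data)
    data.sort()
    return data


def worker_from_recipe(seed, size):
    # Alternative 1: only (seed, size) crosses the process boundary;
    # the child builds the data itself.
    process(make_data(seed, size))


def worker_from_file(path):
    # Alternative 2: only the path crosses the process boundary;
    # the child reads the pickled data from disk itself.
    with open(path, "rb") as f:
        process(pickle.load(f))


def run_all(target, arg_tuples):
    """Start one process per argument tuple, wait for all, return exit codes."""
    procs = [multiprocessing.Process(target=target, args=args)
             for args in arg_tuples]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return [p.exitcode for p in procs]


if __name__ == "__main__":
    size = 10**5

    # Alternative 1: pass the recipe, not the data.
    print(run_all(worker_from_recipe, [(seed, size) for seed in range(4)]))

    # Alternative 2: write the inputs once, pass only file paths.
    with tempfile.TemporaryDirectory() as tmp:
        arg_tuples = []
        for seed in range(4):
            path = Path(tmp) / f"input_{seed}.pkl"
            with open(path, "wb") as f:
                pickle.dump(make_data(seed, size), f)
            arg_tuples.append((str(path),))
        print(run_all(worker_from_file, arg_tuples))
```

In both cases the arguments handed to `multiprocessing.Process` are tiny, so starting each process no longer waits on a large transfer. The file variant still pays for unpickling, but that cost is paid inside each child, where it can overlap with the other children.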