
How to make a for loop in python, which is called multiple times consecutively, execute faster?

I’d like to state right off the bat that I don’t have a lot of experience with NumPy, so a deeper explanation would be appreciated (even of obvious things).
Here’s my issue:

converted_X = X

for col in X:
    curr_data = X[col]
    i = 0
    for pix in curr_data:
        inv_pix = 255.0 - pix
        curr_data[i] = inv_pix
        i+=1
    converted_X[col] = curr_data.values

Context: X is a DataFrame with images of handwritten digits (70k images, 784 pixels/image).
The entire point of doing this is to change the black background to white and white numbers to black.
The only problem I’m facing with this is that it’s taking a ridiculously long time. I tried using rich.Progress() to track its execution, and it’s an astonishing 4 hour ETA.
Also, I’m executing this code block in the Jupyter notebook extension of VSCode (in case that’s relevant).

I know it probably comes down to a ton of inefficiencies and under-use of NumPy functionality, but I need guidance.


Thanks in advance.

>Solution:

Never write a Python for loop over NumPy data; avoiding that is how you make it fast.
Most of the time, there is a way to have NumPy do the for loop for you (that is, process the data in batch). Obviously, there is still a loop somewhere, just not one you wrote in Python.

Here, it seems you are trying to compute an inverted image, whose pixels are 255 minus the original pixel value.

Just write inverted_image = 255 - image

Addition: note that, used as plain Python containers, NumPy arrays are quite inefficient. If you use them just as 2D arrays that you read and write with low-level instructions (setting values individually), then, most of the time, even good ol’ Python lists are faster. For example, in your case (I’ve just tried), on my machine, your code is 9 times slower with ndarrays than the exact same code using a plain Python list of lists of values.
The whole point of ndarrays is that they are fast when you use them with NumPy functions that process the whole data in batch for you, which would not be as easy with Python lists.
