Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Python – If statement nested in For Loop is very slow on large dataframes

I have a dataframe which looks like:

     A      B
0    Plus   632
1    Min    24
2    Min    99
3    Plus   0.15
...  ...    ...

Now for each row where in Column A the value = ‘Min’, the value in Column B on the same row should be multiplied by -1.
For that I am using:

for i in range(1, len(df)):
    if df.loc[i, 'A'].startswith('Min'):
        df.loc[i, 'B'] = df.loc[i, 'B'] * -1

In essence this works, however with large dataframes it is extremely slow. With 50.000 rows, the processing time is 15 seconds, with 100.000 rows 30 seconds, 200.000 rows 160 seconds, and 300.000 500 seconds (exponential increase).

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

However, I am working with a dataframe of more than 700.000 rows and it just never finishes. Any suggestions for an alternative to speed this up?

>Solution :

Try to use vectorized operations and avoid using loops for row-by-row processing in pandas:

>>> import pandas as pd
>>> df = pd.DataFrame({'A': ['Plus', 'Min', 'Min', 'Plus'], 'B': [632, 24, 99, 0.15]})
>>> df
      A       B
0  Plus  632.00
1   Min   24.00
2   Min   99.00
3  Plus    0.15
>>> df.loc[df['A'].str.startswith('Min'), 'B'] *= -1
>>> df
      A       B
0  Plus  632.00
1   Min  -24.00
2   Min  -99.00
3  Plus    0.15

This should be significantly faster.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading