I have a dataframe which looks like:
A B
0 Plus 632
1 Min 24
2 Min 99
3 Plus 0.15
... ... ...
Now for each row where in Column A the value = ‘Min’, the value in Column B on the same row should be multiplied by -1.
For that I am using:
for i in range(1, len(df)):
if df.loc[i, 'A'].startswith('Min'):
df.loc[i, 'B'] = df.loc[i, 'B'] * -1
In essence this works, however with large dataframes it is extremely slow. With 50.000 rows, the processing time is 15 seconds, with 100.000 rows 30 seconds, 200.000 rows 160 seconds, and 300.000 500 seconds (exponential increase).
However, I am working with a dataframe of more than 700.000 rows and it just never finishes. Any suggestions for an alternative to speed this up?
>Solution :
Try to use vectorized operations and avoid using loops for row-by-row processing in pandas:
>>> import pandas as pd
>>> df = pd.DataFrame({'A': ['Plus', 'Min', 'Min', 'Plus'], 'B': [632, 24, 99, 0.15]})
>>> df
A B
0 Plus 632.00
1 Min 24.00
2 Min 99.00
3 Plus 0.15
>>> df.loc[df['A'].str.startswith('Min'), 'B'] *= -1
>>> df
A B
0 Plus 632.00
1 Min -24.00
2 Min -99.00
3 Plus 0.15
This should be significantly faster.