I’m running into an issue when iterating over the rows of a pandas DataFrame.
This is the code I am trying to run:
import pandas as pd

data = {'test': [1, 1, 0, 0, 3, 1, 0, 3, 0],
        'test2': [0, 2, 0, 1, 1, 2, 7, 3, 2],
        }
df = pd.DataFrame(data)
df['combined'] = df['test'] + df['test2']
df['combined'].astype('float64')
df
for index, row in df.iterrows():
    if row['test']>=1 & row['test2']>=1:
        row['combined']/=2
    else:
        pass
So it should divide combined by 2 if both test and test2 have a value of 1 or more. However, it doesn’t divide all the rows that should be divided.
Am I making a mistake somewhere?
This is the outcome when I run the code:
   test  test2  combined
0     1      0         1
1     1      2         3
2     0      0         0
3     0      1         1
4     3      1         2
5     1      2         3
6     0      7         7
7     3      3         3
8     0      2         2
>Solution :
Iterating over rows is generally bad practice and should be avoided for performance reasons unless it is strictly necessary. Instead, define a boolean mask from your conditions, wrapping each comparison in parentheses, and operate on the masked rows with .loc:
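import pandas as pd

data = {'test': [1, 1, 0, 0, 3, 1, 0, 3, 0],
        'test2': [0, 2, 0, 1, 1, 2, 7, 3, 2],
        }
df = pd.DataFrame(data)
df['combined'] = df['test'] + df['test2']
# astype returns a new Series, so assign it back to keep the float dtype
df['combined'] = df['combined'].astype('float64')
# Parentheses are required because & binds more tightly than >=
mask = (df['test'] >= 1) & (df['test2'] >= 1)
# Halve combined only in the rows where both conditions hold
df.loc[mask, 'combined'] /= 2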
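
As for why the loop in the question skipped some rows, the output is consistent with an operator precedence problem. In Python, & binds more tightly than >=, so the condition without parentheses is read as the chained comparison row['test'] >= (1 & row['test2']) >= 1. That is true for rows 4 and 7 (where 1 & test2 equals 1) but false for rows 1 and 5 (where 1 & 2 equals 0), which matches the rows that were and were not halved. Separately, writing to row inside iterrows is not guaranteed to update the DataFrame, since the row may be a copy. The sketch below illustrates both points on the question's data; the names unparenthesized, parenthesized and halved_rows are illustrative only.

```python
import pandas as pd

# Python gives & higher precedence than >=, so the condition without
# parentheses is read as the chained comparison a >= (1 & b) >= 1.
a, b = 1, 2  # values of test and test2 in row 1 of the question
unparenthesized = a >= 1 & b >= 1     # 1 >= (1 & 2) >= 1, i.e. 1 >= 0 >= 1
parenthesized = (a >= 1) & (b >= 1)   # True & True

df = pd.DataFrame({'test': [1, 1, 0, 0, 3, 1, 0, 3, 0],
                   'test2': [0, 2, 0, 1, 1, 2, 7, 3, 2]})
# Store the sum as float so that halving odd totals needs no dtype change.
df['combined'] = (df['test'] + df['test2']).astype('float64')

# Each comparison is parenthesized before combining with &.
mask = (df['test'] >= 1) & (df['test2'] >= 1)
df.loc[mask, 'combined'] /= 2

# Index labels of the rows that were halved.
halved_rows = mask[mask].index.tolist()

print(unparenthesized, parenthesized)
print(halved_rows)
print(df)
```

With the parenthesized mask, rows 1, 4, 5 and 7 are halved, which are exactly the rows where both test and test2 are at least 1.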