I want to use every 5th row as a reference row (ref_row): divide the reference row by itself, then divide each of the next 4 rows by that same ref_row.
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
len = df.shape[0]
for idx in range(0,len,5):
    ref_row = df.iloc[idx:idx+1,:]
    for idx_next in range(idx,idx+5):
        df.iloc[idx_next:idx_next+1,:] = df.iloc[idx_next:idx_next+1,:].div(ref_row)
However, I got all NaN except the ref_row.
      A    B    C    D
0   1.0  1.0  1.0  1.0
1   NaN  NaN  NaN  NaN
2   NaN  NaN  NaN  NaN
3   NaN  NaN  NaN  NaN
4   NaN  NaN  NaN  NaN
..  ...  ...  ...  ...
95  1.0  1.0  1.0  1.0
96  NaN  NaN  NaN  NaN
97  NaN  NaN  NaN  NaN
98  NaN  NaN  NaN  NaN
99  NaN  NaN  NaN  NaN
Any idea what’s wrong?
>Solution :
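The problem is that df.iloc[idx_next:idx_next+1,:] and df.iloc[idx:idx+1,:] are slices, so each returns a one-row DataFrame that keeps its original row label. DataFrame.div aligns on row labels as well as column labels, so dividing, say, row 1 by row 0 finds no matching labels and produces NaN. Only the reference row divided by itself lines up, which is why it alone shows 1.0. Replace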
df.iloc[idx_next:idx_next+1,:]
with
df.iloc[idx_next]
and
df.iloc[idx:idx+1,:]
with
df.iloc[idx]
everywhere, and it will work as expected: each row is now a Series indexed by the column names, so the labels line up.
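Here is a minimal sketch of the corrected loop, with a one-line check of the misalignment described above. A few things in it are my own additions rather than part of the original code: the seeded generator, drawing values from 1 upward so no reference row contains a zero, casting to float before the loop, taking a .copy() of the reference row so it cannot change once row idx is overwritten, and the n_rows name (used instead of shadowing the built-in len).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
# Assumption: values start at 1 so that no reference row contains a zero.
df = pd.DataFrame(rng.integers(1, 100, size=(100, 4)), columns=list("ABCD"))
original = df.copy()

# Why the slices fail: both operands are one-row DataFrames whose row labels
# are 1 and 0, so label alignment finds nothing in common.
misaligned = df.iloc[1:2, :].div(df.iloc[0:1, :])
print(misaligned.isna().all().all())

# Cast to float up front so the ratios can be stored without a dtype change mid-loop.
df = df.astype(float)

n_rows = df.shape[0]
for idx in range(0, n_rows, 5):
    # Copy the reference row so it stays fixed after row idx is overwritten.
    ref_row = df.iloc[idx].copy()
    for idx_next in range(idx, min(idx + 5, n_rows)):
        # Both sides are Series indexed by column name, so the labels match.
        df.iloc[idx_next] = df.iloc[idx_next].div(ref_row)

print(df.head(6))
```

The copy and the float cast are defensive choices; whether they are strictly needed may depend on the pandas version in use.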
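You can also avoid the loop: take every fifth row with df[::5], repeat each of those rows five times with np.repeat on axis=0, and divide the DataFrame element-wise by the resulting array. Converting to a NumPy array first sidesteps index alignment, so the division is purely positional: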
out = df.div(np.repeat(df[::5].to_numpy(), 5, axis=0))
Output:
           A         B         C         D
0   1.000000  1.000000  1.000000  1.000000
1   0.726190  0.359375  0.967742  1.644068
2   0.130952  0.046875  0.161290  0.406780
3   0.488095  0.312500  0.919355  0.305085
4   0.857143  0.203125  0.967742  0.525424
..       ...       ...       ...       ...
95  1.000000  1.000000  1.000000  1.000000
96  0.061224  1.400000  0.518519  0.882353
97  1.510204  1.300000  1.740741  5.588235
98  0.224490  2.100000  1.407407  0.294118
99  1.061224  1.400000  1.388889  3.411765
[100 rows x 4 columns]
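Note that the np.repeat one-liner relies on the number of rows being a multiple of 5; otherwise the repeated array has more rows than the DataFrame and the shapes no longer match. Below is a rough sketch of two ways around that. The 13-row frame, the seed, the values starting at 1, and the names ref, block_id, out_repeat and out_groupby are all illustrative assumptions, not part of the original code.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
# Assumptions: 13 rows (not a multiple of 5) and values >= 1 to avoid zero divisors.
df = pd.DataFrame(rng.integers(1, 100, size=(13, 4)), columns=list("ABCD"))

# Option 1: repeat every fifth row, then trim to the DataFrame's length.
ref = np.repeat(df.to_numpy()[::5], 5, axis=0)[: len(df)]
out_repeat = df.div(ref)

# Option 2: label each block of 5 rows by position and broadcast its first row.
block_id = np.arange(len(df)) // 5
out_groupby = df.div(df.groupby(block_id).transform("first"))

print(out_repeat.tail(3))
```

Both options should give the same result here. The groupby version keeps the row index throughout, which may be convenient if the frame does not use the default integer index.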