Pandas divide every nth row

January 29, 2022

I want to use every 5th row as a reference row (ref_row) and divide that row and the 4 rows after it by ref_row, then repeat this for each following block of 5 rows.

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
    
n_rows = df.shape[0]

for idx in range(0,n_rows,5):
    ref_row = df.iloc[idx:idx+1,:]
    for idx_next in range(idx,idx+5):
        df.iloc[idx_next:idx_next+1,:] = df.iloc[idx_next:idx_next+1,:].div(ref_row)

However, every row except the reference rows comes out as NaN.

    A   B   C   D
0   1.0 1.0 1.0 1.0
1   NaN NaN NaN NaN
2   NaN NaN NaN NaN
3   NaN NaN NaN NaN
4   NaN NaN NaN NaN
... ... ... ... ...
95  1.0 1.0 1.0 1.0
96  NaN NaN NaN NaN
97  NaN NaN NaN NaN
98  NaN NaN NaN NaN
99  NaN NaN NaN NaN

Any idea what’s wrong?

Solution:

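The problem with your code is that df.iloc[idx_next:idx_next+1,:] and df.iloc[idx:idx+1,:] each return a one-row DataFrame, not a Series. Dividing one DataFrame by another aligns on both row and column labels, and the two frames have different row labels (idx_next versus idx), so the result is NaN. Only the reference row itself, where the labels coincide, divides cleanly. Replace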

df.iloc[idx_next:idx_next+1,:]

with

df.iloc[idx_next]

and

df.iloc[idx:idx+1,:]

with

df.iloc[idx]

everywhere and it will work as expected: both are now Series objects indexed by the column names, so the indices match.
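
To make the difference concrete, here is a minimal sketch on a small fixed frame. The 10-row data and the names small, bad and fixed are illustrative assumptions, not part of the question. It first reproduces the label mismatch, then runs the loop with Series indexing. Casting to float up front and copying the reference row are precautions added here, not something the original code does.

```python
import numpy as np
import pandas as pd

# Illustrative data (assumption): 10 rows, values 1..40, so no zero divisors.
small = pd.DataFrame(np.arange(1, 41).reshape(10, 4), columns=list("ABCD"))

# One-row DataFrames keep their row labels (1 and 0 here), so the division
# aligns on labels that do not overlap and every cell is NaN.
bad = small.iloc[1:2, :].div(small.iloc[0:1, :])

# Cast to float first so the divided rows can be written back without
# changing the frame's dtype.
fixed = small.astype(float)

for idx in range(0, len(fixed), 5):
    # A single integer position gives a Series indexed by column name.
    # Copy it so that overwriting row idx below cannot alter the reference.
    ref_row = fixed.iloc[idx].copy()
    for idx_next in range(idx, min(idx + 5, len(fixed))):
        fixed.iloc[idx_next] = fixed.iloc[idx_next].div(ref_row)
```

After the loop, rows 0 and 5 of fixed are all 1.0 and the other rows hold the ratios to their block's reference row.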

You can also avoid the loop: take every fifth row of the DataFrame as a NumPy array, repeat each of those rows 5 times with np.repeat along axis=0, and divide the DataFrame element-wise by the resulting array:

out = df.div(np.repeat(df[::5].to_numpy(), 5, axis=0))

Output:

           A         B         C         D
0   1.000000  1.000000  1.000000  1.000000
1   0.726190  0.359375  0.967742  1.644068
2   0.130952  0.046875  0.161290  0.406780
3   0.488095  0.312500  0.919355  0.305085
4   0.857143  0.203125  0.967742  0.525424
..       ...       ...       ...       ...
95  1.000000  1.000000  1.000000  1.000000
96  0.061224  1.400000  0.518519  0.882353
97  1.510204  1.300000  1.740741  5.588235
98  0.224490  2.100000  1.407407  0.294118
99  1.061224  1.400000  1.388889  3.411765

[100 rows x 4 columns]
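
The one-liner above relies on the row count being a multiple of 5, as it is for the 100-row frame here. Otherwise the repeated array has more rows than the DataFrame. A possible variant, sketched on an illustrative 7-row frame (the data and the names small, refs and divisors are assumptions), trims the repeated array to the frame's length:

```python
import numpy as np
import pandas as pd

# Illustrative data (assumption): 7 rows, so the last block has only 2 rows.
small = pd.DataFrame(np.arange(1, 29).reshape(7, 4), columns=list("ABCD"))

# Rows 0 and 5 are the reference rows.
refs = small[::5].to_numpy()

# Repeat each reference row 5 times, then trim to the frame's length so the
# shapes match even when the row count is not a multiple of 5.
divisors = np.repeat(refs, 5, axis=0)[: len(small)]

# Dividing by a plain array is positional, so no label alignment is involved.
out = small.div(divisors)
```

Because the divisor is a NumPy array rather than a DataFrame, the row labels play no role, which is why this version does not run into the NaN problem.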