I want to process every N rows of a DataFrame separately.
If my data has 15 rows indexed from 0 to 14, I want to process rows 0 to 3, 4 to 7, 8 to 11, and 12 to 14.
For example, let's say that for each block of 4 rows I want sum(A) and mean(B):
| Index | A | B |
|---|---|---|
| 0 | 4 | 4 |
| 1 | 7 | 9 |
| 2 | 9 | 3 |
| 3 | 0 | 4 |
| 4 | 7 | 9 |
| 5 | 9 | 2 |
| 6 | 3 | 0 |
| 7 | 7 | 4 |
| 8 | 7 | 2 |
| 9 | 1 | 6 |
The resulting DataFrame should be:
| Index | A | B |
|---|---|---|
| 0 | 20 | 5 |
| 1 | 26 | 3.75 |
| 2 | 8 | 4 |
TL;DR: how can I make DataFrame.apply take multiple rows at a time instead of a single row?
>Solution :
Use GroupBy.agg and group by the index integer-divided by 4, so that every 4 consecutive rows share one group label:
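import numpy as np
# Option 1, default RangeIndex (0, 1, 2, ...): labels 0-3 -> 0, 4-7 -> 1, ...
df = df.groupby(df.index // 4).agg({'A': 'sum', 'B': 'mean'})
# Option 2, any index: use row positions instead of index labels (use this instead of option 1, not after it)
df = df.groupby(np.arange(len(df.index)) // 4).agg({'A': 'sum', 'B': 'mean'})
print(df)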
    A     B
0  20  5.00
1  26  3.75
2   8  4.00
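For reference, here is a minimal self-contained sketch that rebuilds the sample data from the question and applies the positional variant. The names `n`, `block` and `result` are chosen here for illustration and are not part of the original answer; the last block simply contains fewer rows when the length is not a multiple of `n`.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (10 rows, default RangeIndex)
df = pd.DataFrame({
    "A": [4, 7, 9, 0, 7, 9, 3, 7, 7, 1],
    "B": [4, 9, 3, 4, 9, 2, 0, 4, 2, 6],
})

n = 4  # block size (illustrative name)

# Row positions 0..len-1 integer-divided by n give one label per block:
# rows 0-3 -> 0, rows 4-7 -> 1, rows 8-9 -> 2
block = np.arange(len(df)) // n

# Aggregate each block: sum of A, mean of B
result = df.groupby(block).agg({"A": "sum", "B": "mean"})
print(result)
```

Because the labels come from row positions rather than index values, this should behave the same way if the DataFrame has a non-default index (for example after filtering or sorting).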