How to split a dataframe into a list of dataframes with a staggered delay?


Suppose I have a dataframe like the following:

import numpy as np
import pandas as pd 
    
df = pd.DataFrame(np.random.randint(0,100,size=(15, 4)), columns=list('ABCD'))

     A   B   C   D
0   91  96  36  89
1   17  18  40  97
2   38  12  22  63
3   38  13  17  96
4   48  68  65  59
5   45  28  65  79
6   49  73  36  20
7    6  19  11  87
8   90  19  49  74
9   93  35  97  55
10  28  80  27  40
11  74  42  14  26
12  81  12  28  53
13  63  63  60  61
14  10  54  39  23

I want to split it into a list of equal-sized dataframes (4 rows each here), separated by a delay that grows by one row each time: row 4 is skipped after the first chunk, and rows 9 and 10 after the second, as in:

    A   B   C   D
0  91  96  36  89
1  17  18  40  97
2  38  12  22  63
3  38  13  17  96
     
    A   B   C   D
5  45  28  65  79
6  49  73  36  20
7   6  19  11  87
8  90  19  49  74 
    
     A   B   C   D
11  74  42  14  26
12  81  12  28  53
13  63  63  60  61
14  10  54  39  23
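The pattern in the expected output can be written as a formula. With m rows per chunk and a gap of k rows before chunk k (counting chunks from 0), chunk k starts at position

$$
s_k = k\,m + \sum_{j=1}^{k} j = k\,m + \frac{k(k+1)}{2},
\qquad s_0 = 0,\quad s_1 = 5,\quad s_2 = 11 \quad (m = 4),
$$

and covers positions s_k through s_k + m - 1. With 15 rows the last valid start is 15 - 4 = 11, and s_3 = 18 already exceeds it, so exactly three chunks fit.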

What would be an elegant way to do this? I have thought about adding an extra column that marks the rows where the splits should happen, but that seems clunky and a bit of a hack. Any ideas?

Thank you.
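For comparison, the marker idea from the question can be done without adding a column to df: build the labels as a separate NumPy array and pass that array straight to groupby. This is a minimal sketch, assuming -1 is a free label for the rows that fall in a gap; the names size and labels are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 100, size=(15, 4)), columns=list('ABCD'))

size = 4
# Assumption: -1 labels the rows that fall in a gap between chunks
labels = np.full(len(df), -1)
start, k = 0, 0
while start + size <= len(df):
    labels[start:start + size] = k   # the rows of chunk k get label k
    k += 1
    start += size + k                # skip k rows before the next chunk: 1, 2, 3, ...

# Group by the label array (df itself gets no new column) and drop the gap rows
dfs = [group for key, group in df.groupby(labels) if key != -1]
```

Each group keeps the original row labels, so dfs[1] has index 5 to 8, matching the expected output.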

Solution:

In this example the so-called "delay" comes from counter: it accumulates 0, 1, 3, ..., so the gap between consecutive chunks grows by one row each time.

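num_rows = 4            # rows in each chunk
dfs = []
counter = 0             # cumulative delay: 0, 1, 3, 6, ...
i = 0
while True:
    counter += i
    start = num_rows * i + counter
    if start + num_rows > len(df):   # stop before a partial or empty chunk
        break
    dfs.append(df.iloc[start:start + num_rows])  # iloc slices by position, end excluded
    i += 1
dfs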
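If you need this more than once, the loop can be wrapped in a function. This is a sketch, not an existing pandas API: staggered_split, first_gap and gap_step are hypothetical names, and the gap after chunk k is assumed to be first_gap + k * gap_step rows, so the defaults reproduce the 1, 2, 3, ... gaps from the question. Slicing with iloc makes it work for any index, not just the default 0..n-1.

```python
import numpy as np
import pandas as pd

def staggered_split(df, size, first_gap=1, gap_step=1):
    """Hypothetical helper: split df into chunks of `size` rows with growing gaps.

    The gap after chunk k is first_gap + k * gap_step rows. Slices by position
    (iloc), so any index works. A trailing chunk shorter than `size` is dropped.
    """
    chunks = []
    start, k = 0, 0
    while start + size <= len(df):
        chunks.append(df.iloc[start:start + size])
        start += size + first_gap + k * gap_step
        k += 1
    return chunks

df = pd.DataFrame(np.random.randint(0, 100, size=(15, 4)), columns=list('ABCD'))
dfs = staggered_split(df, 4)
for chunk in dfs:
    print(chunk.index.tolist())
# [0, 1, 2, 3]
# [5, 6, 7, 8]
# [11, 12, 13, 14]
```

Dropping the short trailing chunk keeps every element of the list the same size, which is what "equal size dataframes" asks for; remove the length check in the while condition if you would rather keep the remainder.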
