Build an array from a dataframe based on a sliding window

March 9, 2022

I have a dataframe with three columns. I want to build an array from this dataframe. I want to use a window to move across the rows. I want to start from row 0 until end, and put the value of three columns in one row of array based on the column b in data frame.

For example, with window size of 3, I put the 1,2,3 which are the three value in the first window and in column a put with the three value in column c, -1,1,5 and at the last column the value of column b is 1. This process will contintue until the column b changed. Then, for the next value of column b we have the same process. Thank you in advance
Here is the dataframe which I have:

import pandas as pd
df = pd.DataFrame()
df ['a'] = [1,2,3,4,5,6,7,8,9,10, 1,2,3]
df ['b'] = [1,1,1,1,2,2,2,2,7,7,7,7,7]
df ['c'] = [-1,1,5,6,7,3,1,6,8,3,9,10,1]
df

And here is the image of dataframe:

The array which I want is:

I tried, but I think that I am faar away from a solution.

w = 3

A = np.zeros((w,len(df)-w+1))

for i in range(len(df)-w+1):
    A[:,i] = df.iloc[i:w+i,0]

A = A.T

>Solution :

IIUC, you could use a GroupBy.apply and a bit of numpy:

# group by consecutive b
group = df['b'].diff().abs().gt(0).cumsum()
# or if simply groupby by 'b' overall (not consecutive)
# group = 'b'

def flatten_group(g):
    # pandas' rolling only aggregates, doing a custom rolling
    # that flattens the columns a & c and add the 1st value of b
    return [np.r_[g.iloc[i:i+3][['a', 'c']].to_numpy().ravel('F'),
                  [g['b'].iloc[0]]]
            for i in range(len(g)-2)]

# apply per group and stack as numpy array
a = np.vstack(df.groupby(group).apply(flatten_group).explode())

output:

array([[ 1,  2,  3, -1,  1,  5,  1],
       [ 2,  3,  4,  1,  5,  6,  1],
       [ 5,  6,  7,  7,  3,  1,  2],
       [ 6,  7,  8,  3,  1,  6,  2],
       [ 9, 10,  1,  8,  3,  9,  7],
       [10,  1,  2,  3,  9, 10,  7],
       [ 1,  2,  3,  9, 10,  1,  7]])