Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Build an array from a dataframe based on a sliding window

I have a dataframe with three columns. I want to build an array from this dataframe. I want to use a window to move across the rows. I want to start from row 0 until end, and put the value of three columns in one row of array based on the column b in data frame.

For example, with window size of 3, I put the 1,2,3 which are the three value in the first window and in column a put with the three value in column c, -1,1,5 and at the last column the value of column b is 1. This process will contintue until the column b changed. Then, for the next value of column b we have the same process. Thank you in advance
Here is the dataframe which I have:

import pandas as pd
df = pd.DataFrame()
df ['a'] = [1,2,3,4,5,6,7,8,9,10, 1,2,3]
df ['b'] = [1,1,1,1,2,2,2,2,7,7,7,7,7]
df ['c'] = [-1,1,5,6,7,3,1,6,8,3,9,10,1]
df

And here is the image of dataframe:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

enter image description here

The array which I want is:

enter image description here

I tried, but I think that I am faar away from a solution.

w = 3

A = np.zeros((w,len(df)-w+1))

for i in range(len(df)-w+1):
    A[:,i] = df.iloc[i:w+i,0]

A = A.T

>Solution :

IIUC, you could use a GroupBy.apply and a bit of numpy:

# group by consecutive b
group = df['b'].diff().abs().gt(0).cumsum()
# or if simply groupby by 'b' overall (not consecutive)
# group = 'b'

def flatten_group(g):
    # pandas' rolling only aggregates, doing a custom rolling
    # that flattens the columns a & c and add the 1st value of b
    return [np.r_[g.iloc[i:i+3][['a', 'c']].to_numpy().ravel('F'),
                  [g['b'].iloc[0]]]
            for i in range(len(g)-2)]

# apply per group and stack as numpy array
a = np.vstack(df.groupby(group).apply(flatten_group).explode())

output:

array([[ 1,  2,  3, -1,  1,  5,  1],
       [ 2,  3,  4,  1,  5,  6,  1],
       [ 5,  6,  7,  7,  3,  1,  2],
       [ 6,  7,  8,  3,  1,  6,  2],
       [ 9, 10,  1,  8,  3,  9,  7],
       [10,  1,  2,  3,  9, 10,  7],
       [ 1,  2,  3,  9, 10,  1,  7]])
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading