I have a dataframe with three columns. I want to build an array from this dataframe. I want to use a window to move across the rows. I want to start from row 0 until end, and put the value of three columns in one row of array based on the column b in data frame.
For example, with window size of 3, I put the 1,2,3 which are the three value in the first window and in column a put with the three value in column c, -1,1,5 and at the last column the value of column b is 1. This process will contintue until the column b changed. Then, for the next value of column b we have the same process. Thank you in advance
Here is the dataframe which I have:
import pandas as pd
df = pd.DataFrame()
df ['a'] = [1,2,3,4,5,6,7,8,9,10, 1,2,3]
df ['b'] = [1,1,1,1,2,2,2,2,7,7,7,7,7]
df ['c'] = [-1,1,5,6,7,3,1,6,8,3,9,10,1]
df
And here is the image of dataframe:
The array which I want is:
I tried, but I think that I am faar away from a solution.
w = 3
A = np.zeros((w,len(df)-w+1))
for i in range(len(df)-w+1):
A[:,i] = df.iloc[i:w+i,0]
A = A.T
>Solution :
IIUC, you could use a GroupBy.apply and a bit of numpy:
# group by consecutive b
group = df['b'].diff().abs().gt(0).cumsum()
# or if simply groupby by 'b' overall (not consecutive)
# group = 'b'
def flatten_group(g):
# pandas' rolling only aggregates, doing a custom rolling
# that flattens the columns a & c and add the 1st value of b
return [np.r_[g.iloc[i:i+3][['a', 'c']].to_numpy().ravel('F'),
[g['b'].iloc[0]]]
for i in range(len(g)-2)]
# apply per group and stack as numpy array
a = np.vstack(df.groupby(group).apply(flatten_group).explode())
output:
array([[ 1, 2, 3, -1, 1, 5, 1],
[ 2, 3, 4, 1, 5, 6, 1],
[ 5, 6, 7, 7, 3, 1, 2],
[ 6, 7, 8, 3, 1, 6, 2],
[ 9, 10, 1, 8, 3, 9, 7],
[10, 1, 2, 3, 9, 10, 7],
[ 1, 2, 3, 9, 10, 1, 7]])

