filter dataframe by rule from rows and columns

April 14, 2023

I got a xlsx file, data distributed with some rule. I need collect data base on the rule. e.g. valid data begin row is "y3", data row is the cell below that row.

In below sample,

import pandas as pd

data1 = [
    ["A","y3","y2","y3","y4"],
    ["B",0,2,3,3],
    ["C","y3","y4","y5","y6"],
    ["D",2,4,5,0],
    ["E","y1","y2","y3","y4"],
    ["F",0,2,4,3],
    ]

df1 = pd.DataFrame(data1,columns=['C1','C2','C3','C4','C5'])
print(df1)

expected output:

:   C1  C2  C3  C4  C5
: 0  A  y3  y2  y3  y4
: 1  B   0   2   3   3
: 2  C  y3  y4  y5  y6
: 3  D   2   4   5   0
: 4  E  y1  y2  y3  y4
: 5  F   0   2   4   3

:    v1  y3  
: 0  B  0 
: 0  B  3 
: 1  D  2
: 1  F  4

Since 3 and 4 followed by y3 in column C4 and 0,2 followed by y3 in column C2 as well.

>Solution :

You can use:

tmp = df1.set_index('C1', append=True)

out = (tmp[tmp.shift().eq('y3')]
       .stack().droplevel(-1)
       .rename_axis([None, 'v1'])
       .reset_index(level=1, name='v3')
      )

Output: