I got a xlsx file, data distributed with some rule. I need collect data base on the rule. e.g. valid data begin row is "y3", data row is the cell below that row.
In below sample,
import pandas as pd
data1 = [
["A","y3","y2","y3","y4"],
["B",0,2,3,3],
["C","y3","y4","y5","y6"],
["D",2,4,5,0],
["E","y1","y2","y3","y4"],
["F",0,2,4,3],
]
df1 = pd.DataFrame(data1,columns=['C1','C2','C3','C4','C5'])
print(df1)
expected output:
: C1 C2 C3 C4 C5
: 0 A y3 y2 y3 y4
: 1 B 0 2 3 3
: 2 C y3 y4 y5 y6
: 3 D 2 4 5 0
: 4 E y1 y2 y3 y4
: 5 F 0 2 4 3
: v1 y3
: 0 B 0
: 0 B 3
: 1 D 2
: 1 F 4
Since 3 and 4 followed by y3 in column C4 and 0,2 followed by y3 in column C2 as well.
>Solution :
You can use:
tmp = df1.set_index('C1', append=True)
out = (tmp[tmp.shift().eq('y3')]
.stack().droplevel(-1)
.rename_axis([None, 'v1'])
.reset_index(level=1, name='v3')
)
Output:
v1 v3
1 B 0
1 B 3
3 D 2
5 F 4