I have the following dataframe:
import pandas as pd

df = pd.DataFrame({
    'n': [0, 1, 2, 3, 0, 1, 2, 0, 1, 2],
    'col1': ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C'],
    'col2': [0, 0, 0, 0, 3.3, 0, 4, 1.94, 0, 6.17]
})
It has the form:
   n col1  col2
0  0    A  0.00
1  1    A  0.00
2  2    A  0.00
3  3    B  0.00
4  0    B  3.30
5  1    B  0.00
6  2    B  4.00
7  0    C  1.94
8  1    C  0.00
9  2    C  6.17
I want a function that takes this dataframe as an argument and returns a new dataframe without the leading rows where the value in column 'col2' is 0.
My code
def remove_lead_zeros(df):
    new_df = df[df['col2'] != 0]
    return new_df
My function removes every row with a 0.0 value, while I want to remove only the leading ones.
Goal
The goal is to get the following dataframe as the result:
   n col1  col2
0  0    B  3.30
1  1    B  0.00
2  2    B  4.00
3  0    C  1.94
4  1    C  0.00
5  2    C  6.17
Any help will be highly appreciated (I will upvote all answers). Thank you!
>Solution :
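Use groupby.cummax on the boolean Series of non-zero col2 values, then filter with boolean indexing. Within each col1 group, the cumulative maximum turns True at the first non-zero value and stays True afterwards, so only the leading zeros are dropped: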
out = df[df['col2'].ne(0).groupby(df['col1']).cummax()]
Output:
   n col1  col2
4  0    B  3.30
5  1    B  0.00
6  2    B  4.00
7  0    C  1.94
8  1    C  0.00
9  2    C  6.17
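Note that the output above keeps the original index labels (4 to 9), whereas the desired result in the question is numbered from 0. A minimal sketch of the one-liner wrapped in the function the question asks for, with the index reset at the end; the parameter names value_col and group_col are assumptions added for illustration, not part of the original code:

```python
import pandas as pd


def remove_lead_zeros(df, value_col="col2", group_col="col1"):
    # True from the first non-zero value onward within each group
    keep = df[value_col].ne(0).groupby(df[group_col]).cummax()
    # Boolean indexing keeps the original labels, so renumber from 0
    return df[keep].reset_index(drop=True)


df = pd.DataFrame({
    "n": [0, 1, 2, 3, 0, 1, 2, 0, 1, 2],
    "col1": ["A", "A", "A", "B", "B", "B", "B", "C", "C", "C"],
    "col2": [0, 0, 0, 0, 3.3, 0, 4, 1.94, 0, 6.17],
})

result = remove_lead_zeros(df)
print(result)
```

Dropping reset_index(drop=True) would keep the original labels instead, which may be preferable if the rows need to be matched back to the source dataframe.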
Intermediates to understand the logic:
   n col1  col2  ne(0)  groupby.cummax
0  0    A  0.00  False           False
1  1    A  0.00  False           False
2  2    A  0.00  False           False
3  3    B  0.00  False           False
4  0    B  3.30   True            True
5  1    B  0.00  False            True
6  2    B  4.00   True            True
7  0    C  1.94   True            True
8  1    C  0.00  False            True
9  2    C  6.17   True            True
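The intermediate columns in the table above can be reproduced step by step. This is only a sketch for inspection; the column names nonzero and seen_nonzero are hypothetical labels standing in for the ne(0) and groupby.cummax headers:

```python
import pandas as pd

df = pd.DataFrame({
    "n": [0, 1, 2, 3, 0, 1, 2, 0, 1, 2],
    "col1": ["A", "A", "A", "B", "B", "B", "B", "C", "C", "C"],
    "col2": [0, 0, 0, 0, 3.3, 0, 4, 1.94, 0, 6.17],
})

# Step 1: flag the rows whose col2 value is not zero
nonzero = df["col2"].ne(0)

# Step 2: per col1 group, stay True once a non-zero value has been seen
seen_nonzero = nonzero.groupby(df["col1"]).cummax()

# Attach both steps as columns to view them next to the data
steps = df.assign(nonzero=nonzero, seen_nonzero=seen_nonzero)
print(steps)
```

Group A never contains a non-zero value, so its mask stays False throughout and the whole group is dropped.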