Function that returns a dataframe without the leading 0s in a specific column

March 1, 2023

I have the following dataframe:

import pandas as pd

df = pd.DataFrame({
    'n': [0, 1, 2, 3, 0, 1, 2, 0, 1, 2],
    'col1': ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C'],
    'col2': [0, 0, 0, 0, 3.3, 0, 4, 1.94, 0, 6.17]
})

It has the form:

    n   col1    col2
0   0   A   0.00
1   1   A   0.00
2   2   A   0.00
3   3   B   0.00
4   0   B   3.30
5   1   B   0.00
6   2   B   4.00
7   0   C   1.94
8   1   C   0.00
9   2   C   6.17

I want a function that takes this dataframe as an argument and returns a new dataframe without the leading rows where the value in column 'col2' is 0.

My code

def remove_lead_zeros(df):
   new_df = df[df['col2'] != 0]
   return new_df

My function removes all rows with 0.0 values, whereas I want to remove only the leading ones.

Goal

I want to get the following dataframe as the result:

    n   col1    col2
0   0   B     3.30
1   1   B     0.00
2   2   B     4.00
3   0   C     1.94
4   1   C     0.00
5   2   C     6.17

Any help will be highly appreciated (upvoting all answers), thank you!

Solution:

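Use groupby.cummax on the boolean Series that marks non-zero col2 values, grouped by col1. Within each group the result becomes True at the first non-zero value and stays True afterwards, so it can be used directly for boolean indexing: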

out = df[df['col2'].ne(0).groupby(df['col1']).cummax()]

Output:

   n col1  col2
4  0    B  3.30
5  1    B  0.00
6  2    B  4.00
7  0    C  1.94
8  1    C  0.00
9  2    C  6.17
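
Since the question asks for a function, and the desired result shows an index renumbered from 0, the one-liner can be wrapped roughly as follows. This is a minimal sketch: the function name comes from the question, while the group_col and value_col parameters and the reset_index(drop=True) call are assumptions added here to match the expected output.

```python
import pandas as pd


def remove_lead_zeros(df, group_col="col1", value_col="col2"):
    # group_col / value_col are hypothetical parameters, not part of the question
    # True where the value is non-zero
    nonzero = df[value_col].ne(0)
    # Per group: turns True at the first non-zero value and stays True afterwards
    keep = nonzero.groupby(df[group_col]).cummax()
    # Keep only those rows and renumber the index from 0 (assumed from the goal table)
    return df[keep].reset_index(drop=True)


df = pd.DataFrame({
    "n": [0, 1, 2, 3, 0, 1, 2, 0, 1, 2],
    "col1": ["A", "A", "A", "B", "B", "B", "B", "C", "C", "C"],
    "col2": [0, 0, 0, 0, 3.3, 0, 4, 1.94, 0, 6.17],
})

result = remove_lead_zeros(df)
print(result)
```

Dropping the reset_index call keeps the original row labels (4 to 9), as in the output above.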

Intermediates to understand the logic:

   n col1  col2  ne(0)  groupby.cummax
0  0    A  0.00  False           False
1  1    A  0.00  False           False
2  2    A  0.00  False           False
3  3    B  0.00  False           False
4  0    B  3.30   True            True
5  1    B  0.00  False            True
6  2    B  4.00   True            True
7  0    C  1.94   True            True
8  1    C  0.00  False            True
9  2    C  6.17   True            True
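
The intermediate columns in the table above can be reproduced with a short sketch like the one below. The variable names nonzero, keep and steps are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "n": [0, 1, 2, 3, 0, 1, 2, 0, 1, 2],
    "col1": ["A", "A", "A", "B", "B", "B", "B", "C", "C", "C"],
    "col2": [0, 0, 0, 0, 3.3, 0, 4, 1.94, 0, 6.17],
})

# Step 1: flag the non-zero values in col2
nonzero = df["col2"].ne(0)
# Step 2: cumulative maximum of the flags within each col1 group
keep = nonzero.groupby(df["col1"]).cummax()

# Attach both steps as extra columns for inspection
steps = df.assign(**{"ne(0)": nonzero, "groupby.cummax": keep})
print(steps)
```

On this particular data a plain cummax without the groupby would give the same mask, because all zeros before row 4 are contiguous. The grouped version matters when a later group also starts with zeros.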