What is the best way to slice a dataframe including the first instance of a mask?

April 12, 2024

This is my DataFrame:

import pandas as pd
import numpy as np
df = pd.DataFrame(
    {
        'a': [np.nan, np.nan, np.nan, 20, 12, 42, 33, 32, 31],
        'b': [np.nan, np.nan, np.nan, np.nan, 2333, np.nan, np.nan, 12323, np.nan]
    }
)

Mask is:

mask = (
    (df.a.notna()) &
    (df.b.notna())
)

Expected output: df sliced up to the first row where mask is True. Note that this row is INCLUDED:

      a        b
0   NaN      NaN
1   NaN      NaN
2   NaN      NaN
3  20.0      NaN
4  12.0   2333.0

The first instance of the mask is row 4, so the goal is to slice up to and including this index.

These are my attempts. The first one works, but I am not sure if the approach is correct:

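# attempt 1: find the label of the first True row, then slice up to it (.loc is end-inclusive)
idx = df.loc[mask.cumsum().eq(1) & mask].index[0]
out = df.loc[:idx]
print(out)

# attempt 2: also drops the first True row, so it stops one row short
out = df[~mask.cummax()]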

Solution:

Add Series.shift to your second attempt (mask is a Series), so the mask moves down one row and the first matching row is kept:

out = df[~mask.shift(fill_value=False).cummax()]
print(out)
      a       b
0   NaN     NaN
1   NaN     NaN
2   NaN     NaN
3  20.0     NaN
4  12.0  2333.0
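
To see why the shift helps, it may be useful to look at the intermediate Series. The sketch below rebuilds the frame from the question and prints each step side by side; the names shifted, after_first and steps are only illustrative and are not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'a': [np.nan, np.nan, np.nan, 20, 12, 42, 33, 32, 31],
        'b': [np.nan, np.nan, np.nan, np.nan, 2333, np.nan, np.nan, 12323, np.nan],
    }
)
mask = df.a.notna() & df.b.notna()

# Move every value down one row; the new first row is filled with False.
shifted = mask.shift(fill_value=False)

# cummax turns every row from the first True onwards into True.
after_first = shifted.cummax()

# Keep the rows that come before that point, which now includes
# the first row where the original mask was True.
out = df[~after_first]

# Illustrative helper frame to compare the steps row by row.
steps = pd.DataFrame({'mask': mask, 'shifted': shifted, 'after_first': after_first})
print(steps)
print(out)
```

One difference from attempt 1 is worth noting: if mask is never True, this expression returns the whole frame, whereas .index[0] in attempt 1 raises an IndexError because the filtered index is empty.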