This is my DataFrame:
import pandas as pd
import numpy as np
df = pd.DataFrame(
{
'a': [np.nan, np.nan, np.nan, 20, 12, 42, 33, 32, 31],
'b': [np.nan, np.nan, np.nan, np.nan, 2333, np.nan, np.nan, 12323, np.nan]
}
)
The mask is:
mask = (
(df.a.notna()) &
(df.b.notna())
)
Expected output: slice df up to the first row where mask is True. Note that this first matching row is INCLUDED:
a b
0 NaN NaN
1 NaN NaN
2 NaN NaN
3 20.0 NaN
4 12.0 2333.0
The first True value in the mask is at row 4, so the goal is to slice df up to and including that index.
These are my attempts. The first one works, but I am not sure whether the approach is correct:
# attempt 1
idx = df.loc[mask.cumsum().eq(1) & mask].index[0]
df = df.loc[:idx]
print(df)
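A possible simplification of attempt 1 (a sketch, not the only way): `Series.idxmax` on a boolean Series returns the label of the first True, so the cumsum step is not needed. One caveat is that `idxmax` returns the first label when the mask contains no True at all, so the sketch guards with `mask.any()`. Keeping every row in that case is an assumption, since the question does not say what should happen; attempt 1 would raise an IndexError there instead.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'a': [np.nan, np.nan, np.nan, 20, 12, 42, 33, 32, 31],
        'b': [np.nan, np.nan, np.nan, np.nan, 2333, np.nan, np.nan, 12323, np.nan],
    }
)
mask = df['a'].notna() & df['b'].notna()

if mask.any():
    # Label of the first True value in the mask (row 4 here)
    idx = mask.idxmax()
    # .loc slicing is label-based and includes the end label
    out = df.loc[:idx]
else:
    # Assumption: with no match, keep the whole frame
    out = df
print(out)
```

Because `.loc[:idx]` slices by label, it may fail on an index with duplicated labels; a positional slice such as `df.iloc[:mask.to_numpy().argmax() + 1]` (again assuming at least one True) sidesteps that.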
# attempt 2 (also drops row 4, the first matching row)
out = df[~mask.cummax()]
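To see why attempt 2 stops one row short, it helps to print the intermediate Series (the block just re-creates the question's df and mask). `cummax` is already True on row 4 itself, so its negation drops that row along with everything after it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'a': [np.nan, np.nan, np.nan, 20, 12, 42, 33, 32, 31],
        'b': [np.nan, np.nan, np.nan, np.nan, 2333, np.nan, np.nan, 12323, np.nan],
    }
)
mask = df['a'].notna() & df['b'].notna()

# cummax flips to True at the first match and stays True afterwards
seen = mask.cummax()
kept = ~seen

print(pd.DataFrame({'mask': mask, 'cummax': seen, 'kept': kept}))
# Only rows 0-3 survive; row 4 is already excluded
print(df[kept].index.tolist())
```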
Solution:
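Add Series.shift to your second attempt. Shifting the mask down by one row (filling the new first row with False) makes cummax switch on one row later, so the first matching row is still kept: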
out = df[~mask.shift(fill_value=False).cummax()]
print(out)
a b
0 NaN NaN
1 NaN NaN
2 NaN NaN
3 20.0 NaN
4 12.0 2333.0
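The same idea can be wrapped in a small helper; the name `slice_through_first` is hypothetical, not a pandas function. Unlike attempt 1, which raises an IndexError when the mask has no True, this version returns all rows in that case, and when the very first row matches it returns just that row.

```python
import numpy as np
import pandas as pd


def slice_through_first(df, mask):
    """Hypothetical helper: rows of df up to and including the first True in mask."""
    # Shift down one row so cummax turns True only after the first match
    started_after = mask.shift(fill_value=False).cummax()
    return df[~started_after]


df = pd.DataFrame(
    {
        'a': [np.nan, np.nan, np.nan, 20, 12, 42, 33, 32, 31],
        'b': [np.nan, np.nan, np.nan, np.nan, 2333, np.nan, np.nan, 12323, np.nan],
    }
)
mask = df['a'].notna() & df['b'].notna()

out = slice_through_first(df, mask)
print(out.index.tolist())  # [0, 1, 2, 3, 4]

# No match anywhere: nothing is ever switched off, so all 9 rows are kept
no_match = pd.Series(False, index=df.index)
print(len(slice_through_first(df, no_match)))  # 9

# Match on the first row: only that row is kept
first_row = pd.Series([True] + [False] * 8, index=df.index)
print(slice_through_first(df, first_row).index.tolist())  # [0]
```

Since the selection is a boolean mask rather than a label slice, it does not depend on the index being unique or sorted.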