What is the best way to slice a DataFrame up to and including the first instance of a mask?

This is my DataFrame:

import pandas as pd
import numpy as np
df = pd.DataFrame(
    {
        'a': [np.nan, np.nan, np.nan, 20, 12, 42, 33, 32, 31],
        'b': [np.nan, np.nan, np.nan, np.nan, 2333, np.nan, np.nan, 12323, np.nan]
    }
)

This is the mask:

mask = (
    (df.a.notna()) &
    (df.b.notna())
)

Expected output: df sliced up to the first row where the mask is True. Note that this row is INCLUDED:

      a        b
0   NaN      NaN
1   NaN      NaN
2   NaN      NaN
3  20.0      NaN
4  12.0   2333.0

The first instance of the mask is row 4, so the goal is to slice up to and including this index.

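These are my attempts. The first one works, but I am not sure if the approach is correct. The second one stops one row short:

# attempt 1: find the label of the first True in the mask, then slice up to it
# (.loc slicing is label-based, so the end label is included)
idx = df.loc[mask.cumsum().eq(1) & mask].index[0]
out = df.loc[:idx]
print(out)
# attempt 2: keeps only the rows before the first True, so row 4 is missing
out = df[~mask.cummax()]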

Solution:

Add Series.shift to your second attempt, so the first True in the mask is pushed down one row before cummax is applied:

out = df[~mask.shift(fill_value=False).cummax()]
print(out)
      a       b
0   NaN     NaN
1   NaN     NaN
2   NaN     NaN
3  20.0     NaN
4  12.0  2333.0
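
To see why the shift matters, it can help to print the intermediate boolean Series side by side. The snippet below is only a sketch built on the df and mask from the question. The names seen, seen_before, steps, pos and out_positional are made up for illustration, and the positional variant at the end is an alternative, not part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'a': [np.nan, np.nan, np.nan, 20, 12, 42, 33, 32, 31],
        'b': [np.nan, np.nan, np.nan, np.nan, 2333, np.nan, np.nan, 12323, np.nan],
    }
)
mask = df.a.notna() & df.b.notna()

# cummax turns True at the first match and stays True afterwards,
# so negating it drops the matching row itself.
seen = mask.cummax()

# Shifting the mask down one row first delays that switch by one row,
# so the first matching row is still kept after negation.
seen_before = mask.shift(fill_value=False).cummax()

steps = pd.DataFrame(
    {'mask': mask, 'cummax': seen, 'shifted_cummax': seen_before}
)
print(steps)

out = df[~seen_before]
print(out)

# Alternative sketch: locate the first True by position and slice with iloc.
# The mask.any() guard is needed because argmax returns 0 when nothing matches.
if mask.any():
    pos = int(mask.to_numpy().argmax())
    out_positional = df.iloc[:pos + 1]
else:
    out_positional = df
```

If the mask contains no True at all, the shift-and-cummax version returns the whole DataFrame, whereas attempt 1 raises an IndexError on .index[0], so the two are not interchangeable in that case.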