Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to only remove rows with NaN that are not at the beginning or end of the pandas Dataframe column?

I have a pandas dataframe. It has a particular column which may or may not contain a continuous set of values as NaN’s in its starting and ending. Also, it may or may not contain NaN’s intermittently in between as well.

My objective is to eliminate only all those rows where NaN’s may intermittently be present in between.

If for example if this is my df:-

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df = pd.DataFrame({'A': [np.nan, np.nan, np.nan, np.nan, 45, 1, np.nan, 2, np.nan, 3, np.nan, 6, np.nan, np.nan, 8, 9, 15, np.nan, 18, np.nan, np.nan, np.nan, np.nan],
'B': [22,33,44,55,66,22,11,34,55,67,55,66,22,11,34,33,44,55,6,96,64,93,81]})

Then I need the output as:-

df_new = pd.DataFrame({'A': [np.nan, np.nan, np.nan, np.nan, 45, 1, 2, 3, 6, 8, 9, 15, 18, np.nan, np.nan, np.nan, np.nan],
'B': [22,33,44,55,66,22,34,67,66,34,33,44,6,96,64,93,81]})

Can you please help?

>Solution :

You can use multiple conditions for boolean indexing, with ffill/bfill to propagate the non-NA, and isna/notna:

# is the A not NA?
m1 = df['A'].notna()
# is the A an external NA?
m2 = df['A'].ffill().isna()
m3 = df['A'].bfill().isna()

out = df.loc[m1|m2|m3]

Alternative with propagation of the booleans using cummin:

m1 = df['A'].isna()
m2 = m1.cummin()
m3 = m1[::-1].cummin()

out = df.loc[(~m1)|m2|m3]

A third approach using interpolate to only fill the inner values, then comparing the NA status, if it didn’t change with interpolation we either have a non-NA or an outer NA:

out = df.loc[df['A'].isna().eq(df['A'].interpolate(limit_area='inside').isna())]

Output:

       A   B
0    NaN  22
1    NaN  33
2    NaN  44
3    NaN  55
4   45.0  66
5    1.0  22
7    2.0  34
9    3.0  67
11   6.0  66
14   8.0  34
15   9.0  33
16  15.0  44
18  18.0   6
19   NaN  96
20   NaN  64
21   NaN  93
22   NaN  81

Intermediates (first approach):

       A   B     m1     m2     m3  m1|m2|m3
0    NaN  22  False   True  False      True
1    NaN  33  False   True  False      True
2    NaN  44  False   True  False      True
3    NaN  55  False   True  False      True
4   45.0  66   True  False  False      True
5    1.0  22   True  False  False      True
6    NaN  11  False  False  False     False
7    2.0  34   True  False  False      True
8    NaN  55  False  False  False     False
9    3.0  67   True  False  False      True
10   NaN  55  False  False  False     False
11   6.0  66   True  False  False      True
12   NaN  22  False  False  False     False
13   NaN  11  False  False  False     False
14   8.0  34   True  False  False      True
15   9.0  33   True  False  False      True
16  15.0  44   True  False  False      True
17   NaN  55  False  False  False     False
18  18.0   6   True  False  False      True
19   NaN  96  False  False   True      True
20   NaN  64  False  False   True      True
21   NaN  93  False  False   True      True
22   NaN  81  False  False   True      True
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading