Fast way of calculating the number of consecutive NaN values in a column

I want to transform my DataFrame into a new DataFrame of the same shape, where each entry is the number of consecutive NaNs starting at that position (0 if the entry itself is not NaN), as follows:

IN:

    A       B      
0   0.1880  0.345 
1   0.2510  0.585  
2   NaN     NaN  
3   NaN     NaN 
4   NaN     1.150  
5   0.2300  1.210  
6   0.1670  1.290  
7   0.0835  1.400  
8   0.0418  NaN    
9   0.0209  NaN    
10  NaN     NaN    
11  NaN     NaN    
12  NaN     NaN     

OUT:

    A       B      
0   0       0    
1   0       0  
2   3       2  
3   2       1 
4   1       0  
5   0       0 
6   0       0 
7   0       0 
8   0       5    
9   0       4   
10  3       3   
11  2       2 
12  1       1     

A similar question that I was trying to adapt: Fast way to get the number of NaNs in a column counted from the last valid value in a DataFrame

Solution:

Inspired by this answer: https://stackoverflow.com/a/52718619/3275464

from io import StringIO
import pandas as pd

s = """    A       B      
0   0.1880  0.345 
1   0.2510  0.585  
2   NaN     NaN  
3   NaN     NaN 
4   NaN     1.150  
5   0.2300  1.210  
6   0.1670  1.290  
7   0.0835  1.400  
8   0.0418  NaN    
9   0.0209  NaN    
10  NaN     NaN    
11  NaN     NaN    
12  NaN     NaN    """

# Raw string so that '\s' is not treated as an invalid escape sequence.
df = pd.read_csv(StringIO(s), engine='python', sep=r'\s+')

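# Reverse the NaN mask so that the running count moves from the bottom up.
_df = df.isna().iloc[::-1]
b = _df.cumsum()
# Subtract the running count at the last non-NaN row, then restore the row order.
c = b.sub(b.mask(_df).ffill().fillna(0)).astype(int).iloc[::-1]
c  # gives the output you seem to want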
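
For reuse, the same steps can be wrapped in a function and checked against the expected output from the question. This is only a sketch: the function name `consecutive_nans_ahead` and the intermediate names are made up for illustration, and the example frame is rebuilt by hand from the question's data rather than parsed from text.

```python
import numpy as np
import pandas as pd


def consecutive_nans_ahead(df: pd.DataFrame) -> pd.DataFrame:
    """For each cell, count the run of NaNs starting at that row (0 if not NaN)."""
    rev = df.isna().iloc[::-1]   # NaN flags, read from the bottom up
    running = rev.cumsum()       # running NaN count per column
    # Running count at the most recent non-NaN row (0 if there is none yet).
    baseline = running.mask(rev).ffill().fillna(0)
    # The difference is the position inside the current NaN run; restore row order.
    return running.sub(baseline).astype(int).iloc[::-1]


nan = np.nan
example = pd.DataFrame({
    "A": [0.1880, 0.2510, nan, nan, nan, 0.2300, 0.1670,
          0.0835, 0.0418, 0.0209, nan, nan, nan],
    "B": [0.345, 0.585, nan, nan, 1.150, 1.210, 1.290,
          1.400, nan, nan, nan, nan, nan],
})

result = consecutive_nans_ahead(example)
print(result)
```

Everything here is a column-wise pandas operation with no Python-level loop over rows, which is what keeps the approach fast on large frames.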
