Get first and nth non-blank value per row

I have the following input dataframe:

              0     1     2     3    4
date                                  
2007-02-15  NaN -0.88  0.80   NaN  0.5
2007-02-16  0.5 -0.84   NaN  0.29  NaN
2007-02-19  NaN -0.84  0.79  0.29  NaN
2007-02-20  0.5  0.50  0.67  0.20  0.5

I need to get an output dataframe with the first and the nth (for example, third) non-blank value for each row. This is the expected output:

             1st   3rd
date                  
2007-02-15 -0.88  0.50
2007-02-16  0.50  0.29
2007-02-19 -0.84  0.29
2007-02-20  0.50  0.67

For the first value, I know I can do the following:

df2['1st'] = df.fillna(method='bfill', axis=1).iloc[:, 0]

but what can I do to find the 3rd one? Thanks

>Solution :

1st and 3rd

g = df.stack().groupby(level=0)

out = pd.concat({'1st': g.first(), '3rd': g.nth(2)}, axis=1)

Output:

             1st   3rd
date                  
2007-02-15 -0.88  0.50
2007-02-16  0.50  0.29
2007-02-19 -0.84  0.29
2007-02-20  0.50  0.67
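
A note on versions: from pandas 2.0 on, GroupBy.nth acts as a filter and keeps the original (date, column) index, so its result may no longer line up with g.first() in the concat above. Depending on the pandas version, stack() may also keep NaN entries instead of dropping them. Below is a minimal sketch that avoids both issues by numbering the valid values within each row with cumcount. It assumes the row index is unique; nth_valid is a hypothetical helper name, not a pandas function.

```python
import numpy as np
import pandas as pd

# Rebuild the input frame from the question.
df = pd.DataFrame(
    [[np.nan, -0.88, 0.80, np.nan, 0.5],
     [0.5, -0.84, np.nan, 0.29, np.nan],
     [np.nan, -0.84, 0.79, 0.29, np.nan],
     [0.5, 0.50, 0.67, 0.20, 0.5]],
    index=pd.DatetimeIndex(
        ["2007-02-15", "2007-02-16", "2007-02-19", "2007-02-20"], name="date"),
)

def nth_valid(frame, n):
    """Return the n-th (0-based) non-NaN value of each row, NaN if the row has fewer."""
    # Long format; the explicit dropna() covers versions where stack() keeps NaN.
    s = frame.stack().dropna()
    # Position of each remaining value within its row: 0, 1, 2, ...
    pos = s.groupby(level=0).cumcount()
    # Keep the requested position, drop the column level, restore all rows.
    return s[pos == n].droplevel(-1).reindex(frame.index)

out = pd.DataFrame({"1st": nth_valid(df, 0), "3rd": nth_valid(df, 2)})
print(out)
```

Rows with fewer than n + 1 valid values come back as NaN because of the final reindex.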

first and last

You can follow your initial logic by adding a further ffill and slicing the first and last column:

df.bfill(axis=1).ffill(axis=1).iloc[:, [0,-1]].set_axis(['first', 'last'], axis=1)

Output:

            first  last
date                   
2007-02-15  -0.88  0.50
2007-02-16   0.50  0.29
2007-02-19  -0.84  0.29
2007-02-20   0.50  0.50
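
To see why the bfill then ffill combination works, here is a small sketch on the 2007-02-16 row alone (the variable names are only illustrative). The bfill pulls the first valid value into the first column, and the ffill then pushes the last valid value into the last column:

```python
import numpy as np
import pandas as pd

# The 2007-02-16 row from the question.
row = pd.DataFrame([[0.5, -0.84, np.nan, 0.29, np.nan]])

# Each gap takes the next valid value to its right; a trailing gap stays NaN.
step1 = row.bfill(axis=1)

# Remaining gaps take the previous valid value from the left.
step2 = step1.ffill(axis=1)

print(step1.iloc[0].tolist())
print(step2.iloc[0].tolist())
```

After both fills the first column holds the first valid value and the last column holds the last one. This trick only reaches the two ends of a row, so it cannot pick out the 3rd value.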

Another option with stack and groupby.agg:

df.stack().groupby(level=0).agg(['first', 'last'])

Output:

            first  last
date                   
2007-02-15  -0.88  0.50
2007-02-16   0.50  0.29
2007-02-19  -0.84  0.29
2007-02-20   0.50  0.50
