I have the following input dataframe:
0 1 2 3 4
date
2007-02-15 NaN -0.88 0.80 NaN 0.5
2007-02-16 0.5 -0.84 NaN 0.29 NaN
2007-02-19 NaN -0.84 0.79 0.29 NaN
2007-02-20 0.5 0.50 0.67 0.20 0.5
I need to get an output dataframe with the first and the nth (for example, third) non-blank value for each row. This is the expected output:
1st 3rd
date
2007-02-15 -0.88 0.50
2007-02-16 0.50 0.29
2007-02-19 -0.84 0.29
2007-02-20 0.50 0.67
For the first value, I know I can do the following:
df2['1st'] = df.bfill(axis=1).iloc[:, 0]
but what can I do to find the 3rd one? Thanks
>Solution :
1st and 3rd
g = df.stack().groupby(level=0)
out = pd.concat({'1st': g.first(), '3rd': g.nth(2)}, axis=1)
Output:
1st 3rd
date
2007-02-15 -0.88 0.50
2007-02-16 0.50 0.29
2007-02-19 -0.84 0.29
2007-02-20 0.50 0.67
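One caveat, as far as I know: from pandas 2.0 onward groupby.nth acts as a filter and keeps the original row labels instead of the group key, so the concat above may not align on date. Here is a minimal sketch that avoids nth altogether. The helper name nth_valid is made up for illustration, and the frame is rebuilt from the question's data:

```python
import numpy as np
import pandas as pd

# Rebuild the question's input frame.
df = pd.DataFrame(
    [[np.nan, -0.88, 0.80, np.nan, 0.5],
     [0.5, -0.84, np.nan, 0.29, np.nan],
     [np.nan, -0.84, 0.79, 0.29, np.nan],
     [0.5, 0.50, 0.67, 0.20, 0.5]],
    index=pd.DatetimeIndex(
        ["2007-02-15", "2007-02-16", "2007-02-19", "2007-02-20"], name="date"
    ),
)

def nth_valid(row, n):
    """Hypothetical helper: n-th (0-based) non-NaN value of a row, NaN if too few."""
    valid = row.dropna()
    return valid.iloc[n] if len(valid) > n else np.nan

# Apply the helper row by row; n is forwarded as a keyword argument.
out = pd.DataFrame({
    "1st": df.apply(nth_valid, axis=1, n=0),
    "3rd": df.apply(nth_valid, axis=1, n=2),
})
print(out)
```

This is slower than the stack/groupby version on large frames because it runs Python code per row, but it does not depend on how nth indexes its result.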
first and last
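You can follow your initial logic by adding a further ffill and slicing the first and last columns. Note that the second column then holds the last non-NaN value of each row, which equals the 3rd only when the row has exactly three valid values (compare 2007-02-20 below):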
df.bfill(axis=1).ffill(axis=1).iloc[:, [0,-1]].set_axis(['1st', '3rd'], axis=1)
Output:
1st 3rd
date
2007-02-15 -0.88 0.50
2007-02-16 0.50 0.29
2007-02-19 -0.84 0.29
2007-02-20 0.50 0.50
Another option with stack and groupby.agg:
df.stack().groupby(level=0).agg(['first', 'last'])
Output:
first last
date
2007-02-15 -0.88 0.50
2007-02-16 0.50 0.29
2007-02-19 -0.84 0.29
2007-02-20 0.50 0.50
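If you want to stay close to the bfill idea but get a true nth value rather than the last one, a running count of valid cells can mark the target. This is only a sketch under the assumption that the data looks like the question's frame (rebuilt here); the variable names are my own:

```python
import numpy as np
import pandas as pd

# Rebuild the question's input frame.
df = pd.DataFrame(
    [[np.nan, -0.88, 0.80, np.nan, 0.5],
     [0.5, -0.84, np.nan, 0.29, np.nan],
     [np.nan, -0.84, 0.79, 0.29, np.nan],
     [0.5, 0.50, 0.67, 0.20, 0.5]],
    index=pd.DatetimeIndex(
        ["2007-02-15", "2007-02-16", "2007-02-19", "2007-02-20"], name="date"
    ),
)

n = 3                          # 1-based position of the wanted valid value
mask = df.notna()              # True where a cell holds a value
rank = mask.cumsum(axis=1)     # running count of valid cells along each row
hit = mask & rank.eq(n)        # True only at the n-th valid cell of each row

# Blank out everything except the hit, then pull it to the first column.
nth = df.where(hit).bfill(axis=1).iloc[:, 0]

out = pd.DataFrame({"1st": df.bfill(axis=1).iloc[:, 0], "3rd": nth})
print(out)
```

Rows with fewer than n valid values come out as NaN, and everything stays vectorized.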