This is my DataFrame:
import pandas as pd

df = pd.DataFrame(
    {
        'a': [101, 90, 11, 120, 1]
    }
)
And this is the output that I want. I want to create column y:
     a      y
0  101  101.0
1   90  101.0
2   11   90.0
3  120  120.0
4    1  120.0
Basically, each value in a is compared with the previous value, and the greater of the two is selected. Row 0 has no previous value, so it keeps its own value.
For example, in row 1, 90 is compared with 101. 101 is greater, so it is selected.
I have done it in this way:
df['x'] = df.a.shift(1)
df['y'] = df[['a', 'x']].max(axis=1)
Is there a cleaner or some kind of built-in way to do it?
>Solution :
You can use np.fmax to take the element-wise maximum without creating an additional column:
import numpy as np
df["y"] = np.fmax(df["a"], df["a"].shift(1))
This outputs:
     a      y
0  101  101.0
1   90  101.0
2   11   90.0
3  120  120.0
4    1  120.0
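We use np.fmax() rather than np.maximum() because shifting df["a"] puts a NaN in the first row. np.maximum() propagates that NaN into the result, whereas np.fmax() returns the non-NaN operand when only one of the two values is NaN.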
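To make the NaN handling concrete, here is a small sketch that compares np.maximum with np.fmax on the question's data. It also shows a pandas-only variant using a rolling window of size 2, which is not part of the answer above but should give the same result. The variable names (prev, with_maximum, with_fmax, with_rolling) are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [101, 90, 11, 120, 1]})

# Shifting by one row leaves NaN in the first position.
prev = df["a"].shift(1)

# np.maximum propagates NaN, so the first row of the result is NaN.
with_maximum = np.maximum(df["a"], prev)

# np.fmax returns the non-NaN operand, so the first row keeps 101.
with_fmax = np.fmax(df["a"], prev)

# Assumed alternative: a rolling window over the current and previous row.
# min_periods=1 lets the first row be computed from a single value.
with_rolling = df["a"].rolling(2, min_periods=1).max()

print(pd.DataFrame({
    "a": df["a"],
    "maximum": with_maximum,
    "fmax": with_fmax,
    "rolling": with_rolling,
}))
```

The rolling version may read more naturally if you later want to compare against more than one previous row, since only the window size changes.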