I have a dataset look like this. (I’m looking for a dataframe solution)
df = ({'id':["a","a","a","a","b","b","b"],
'tail_num' :[2,2,2,2,1,1,1],
'value':[1,2,3,4,5,6,7]})
df = pd.DataFrame(df)
for subgroup with id ‘a’, i want to get latest 2 records which aligns with the tail_num column, for ‘b’ i want to get tail 1 value. What’s the best way to achieve this? Thanks!
Desired output looks like (basically get tail n values based on tail_num, and display all columns):
df = ({'id':["a","a","b"],
'tail_num' :[2,2,1],
'value':[3,4,7]})
df = pd.DataFrame(df)
>Solution :
You can GroupBy the id column and apply a custom/mapped tail :
dmap = dict(zip(df["id"], df["tail_num"]))
out = df.groupby("id", group_keys=False).apply(lambda g: g.tail(dmap[g.name]))
Or without a dictionnary :
out = df.groupby(["id", "tail_num"], group_keys=False).apply(lambda g: g.tail(x.name[1]))
Ouptut :
print(out)
id tail_num value
2 a 2 3
3 a 2 4
6 b 1 7