If I have a dataframe groupby object that's been grouped by ID:
sl.no ID hour
50834 1 6
50833 1 6
50832 1 6
19188 2 9
19187 2 9
19186 2 9
13380 3 294
13379 3 294
13378 3 294
which can be created by:
import pandas as pd
data = {
    'sl.no': [50834, 50833, 50832, 19188, 19187, 19186, 13380, 13379, 13378],
    'ID': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'hour': [6, 6, 6, 9, 9, 9, 294, 294, 294]
}
df = pd.DataFrame(data)
# sort by sl.no (descending), then keep the first 3 rows of each ID
dfnew = df.sort_values(['sl.no'], ascending=False).groupby('ID').head(3)
I want to extract the nth row of each group. For example, if I wanted the 2nd row, the output would be:
sl.no ID hour
50833 1 6
19187 2 9
13379 3 294
I tried using iloc(2), which I found out wasn’t possible. I found a convoluted way to do this by taking head(2) and finding the difference with head(1), but is there an easier, more direct way to do this?
>Solution :
Use boolean indexing with groupby + cumcount. cumcount numbers the rows within each group starting from 0, so the 2nd row has position 1:
out = df[df.groupby('ID').cumcount().eq(1)]
out:
sl.no ID hour
1 50833 1 6
4 19187 2 9
7 13379 3 294
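To see why eq(1) picks the 2nd row, it can help to look at the intermediate cumcount values. The following is a small sketch using the question's data; the names position and second_rows are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'sl.no': [50834, 50833, 50832, 19188, 19187, 19186, 13380, 13379, 13378],
    'ID': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'hour': [6, 6, 6, 9, 9, 9, 294, 294, 294],
})

# cumcount numbers the rows inside each group, starting at 0
position = df.groupby('ID').cumcount()
print(position.tolist())  # [0, 1, 2, 0, 1, 2, 0, 1, 2]

# keep only the rows whose within-group position is 1 (the 2nd row)
second_rows = df[position.eq(1)]
print(second_rows)
```

Because the mask is aligned to df's own index, the selected rows keep their original index labels (1, 4 and 7 here).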
Or you can use groupby + nth. nth is zero-based, so nth(1) selects the 2nd row of each group:
out = df.groupby('ID').nth(1)
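This gives the same result in pandas 2.0 and later, where nth acts as a filter and keeps the original index. In earlier versions it returns the same rows but with ID as the index.
Neither approach returns a row for groups that do not have a 2nd row.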
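As a rough sketch of how this generalizes, the cumcount approach can be wrapped in a small helper that takes a 1-based n. The function name nth_row_per_group and the shortened sample frame below are made up for illustration and are not part of pandas or the question.

```python
import pandas as pd

def nth_row_per_group(frame, key, n):
    """Return the n-th row (1-based) of each group; hypothetical helper."""
    if n < 1:
        raise ValueError("n must be >= 1")
    # cumcount is zero-based, so the n-th row has position n - 1
    return frame[frame.groupby(key).cumcount().eq(n - 1)]

# Illustrative data: ID 2 has only one row, so it has no 2nd row
short = pd.DataFrame({
    'sl.no': [50834, 50833, 19188, 13380, 13379],
    'ID': [1, 1, 2, 3, 3],
    'hour': [6, 6, 9, 294, 294],
})

second = nth_row_per_group(short, 'ID', 2)
print(second)  # rows for ID 1 and ID 3 only
```

Here ID 2 is simply absent from the result rather than raising an error or producing a row of NaN.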