How to get the first line of a group in Python

April 17, 2022

New to Python.

I have 2 dataframes

The first is called df_my
and it looks like this

ICD   Distance  ID   Grp
-------------------------
AB    0         1     1
JG    0         4     1
NH    1         5     2
UY    1         7     1
SS    2         9     1
ED    2         14    2
KK    2         15    3
QR    3         18    3
XZ    3         19    1
FD    4         22    1
LN    5         24    4
TT    5         31    2

the other dataframe called df_Map and looks like this

0    1
--------
JG   838
LN   228
SS   699
FD   100
ED   483
QR   594
AB   349
NH   777
KK   505
TT   866

What I am trying to do is this

First, to get the first record of each group from df_my ordered by id or records with distance bigger than 3

so it should become like this

ICD   Distance  ID   Grp
-------------------------
AB    0         1     1   <== First of its group
NH    1         5     2   <== First of its group
KK    2         15    3   <== First of its group
FD    4         22    1   <== Distance > 3
LN    5         24    4   <== Distance > 3
TT    5         31    2   <== Distance > 3

after that I want to match it with df_Map

so it will be

ICD   Distance  ID   Grp    Map
--------------------------------
AB    0         1     1     349
NH    1         5     2     777
KK    2         15    3     505
FD    4         22    1     100
LN    5         24    4     228
TT    5         31    2     866

I wonder how to do that

I started with this

df_Final = df_my(Distance>3)

but I did not know how to start with the first record in the group?

>Solution :

Assuming pandas, you can use masks for boolean indexing, then merge:

m1 = df_my.groupby('Grp').cumcount().eq(0)
m2 = df_my['Distance'].gt(3)

out = df_my[m1|m2].merge(df_Map.set_axis(['ICD', 'Map'], axis=1),
                         on='ICD')

Output:

  ICD  Distance  ID  Grp  Map
0  AB         0   1    1  349
1  NH         1   5    2  777
2  KK         2  15    3  505
3  FD         4  22    1  100
4  LN         5  24    4  228
5  TT         5  31    2  866