Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to get the first line of a group in Python

New to Python.

I have 2 dataframes

The first is called df_my
and it looks like this

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

ICD   Distance  ID   Grp
-------------------------
AB    0         1     1
JG    0         4     1
NH    1         5     2
UY    1         7     1
SS    2         9     1
ED    2         14    2
KK    2         15    3
QR    3         18    3
XZ    3         19    1
FD    4         22    1
LN    5         24    4
TT    5         31    2

the other dataframe called df_Map and looks like this

0    1
--------
JG   838
LN   228
SS   699
FD   100
ED   483
QR   594
AB   349
NH   777
KK   505
TT   866

What I am trying to do is this

First, to get the first record of each group from df_my ordered by id or records with distance bigger than 3

so it should become like this

ICD   Distance  ID   Grp
-------------------------
AB    0         1     1   <== First of its group
NH    1         5     2   <== First of its group
KK    2         15    3   <== First of its group
FD    4         22    1   <== Distance > 3
LN    5         24    4   <== Distance > 3
TT    5         31    2   <== Distance > 3

after that I want to match it with df_Map

so it will be

ICD   Distance  ID   Grp    Map
--------------------------------
AB    0         1     1     349
NH    1         5     2     777
KK    2         15    3     505
FD    4         22    1     100
LN    5         24    4     228
TT    5         31    2     866

I wonder how to do that

I started with this

df_Final = df_my(Distance>3)

but I did not know how to start with the first record in the group?

>Solution :

Assuming , you can use masks for boolean indexing, then merge:

m1 = df_my.groupby('Grp').cumcount().eq(0)
m2 = df_my['Distance'].gt(3)

out = df_my[m1|m2].merge(df_Map.set_axis(['ICD', 'Map'], axis=1),
                         on='ICD')

Output:

  ICD  Distance  ID  Grp  Map
0  AB         0   1    1  349
1  NH         1   5    2  777
2  KK         2  15    3  505
3  FD         4  22    1  100
4  LN         5  24    4  228
5  TT         5  31    2  866
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading