I have a DataFrame with two columns, ID and SNR. For each ID I need the five largest SNR values: for example, the five largest for J07451689+2804046, then the same for the next ID, J05062845+7149258, and so on. I'm trying to slice the DataFrame, but I'm stuck here:
import pandas as pd
df = pd.read_table("Ident_new_test.txt",sep=",",header=None)
print(df)
df.shape
for i in range(0, N):
    df1 = df.loc[:, "1":"1":2]
print(df1)
ID SNR
J07451689+2804046 200
J07451689+2804046 217
J07451689+2804046 257
J07451689+2804046 200
J07451689+2804046 181
J07451689+2804046 206
J07451689+2804046 198
J07451689+2804046 222
J05062845+7149258 281
J05062845+7149258 397
J15170588+7149258 431
J15170588+7149258 347
J15170588+7149258 411
J15170588+7149258 495
J18255915+6533486 257
J18255915+6533486 317
J18255915+6533486 349
J18255915+6533486 321
J18255915+6533486 403
J18255915+6533486 332
J19420540+5029382 328
J19420540+5029382 305
J19420540+5029382 721
J19420540+5029382 350
>Solution :
If I understand correctly, you can group by ID and take the five largest SNR values within each group:
out = df.groupby('ID')['SNR'].nlargest(5).reset_index('ID')
print(out)
# Output
                   ID  SNR
9   J05062845+7149258  397
8   J05062845+7149258  281
2   J07451689+2804046  257
7   J07451689+2804046  222
1   J07451689+2804046  217
5   J07451689+2804046  206
0   J07451689+2804046  200
13  J15170588+7149258  495
10  J15170588+7149258  431
12  J15170588+7149258  411
11  J15170588+7149258  347
18  J18255915+6533486  403
16  J18255915+6533486  349
19  J18255915+6533486  332
17  J18255915+6533486  321
15  J18255915+6533486  317
22  J19420540+5029382  721
23  J19420540+5029382  350
20  J19420540+5029382  328
21  J19420540+5029382  305
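Note: the result is ordered by ID and then by descending SNR, with the original row labels kept as the index. If you want the rows back in their original order, append .sort_index() after reset_index('ID'); use .sort_index(ignore_index=True) instead if you also want the index renumbered from 0.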
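To see the two sort_index variants side by side, here is a minimal, self-contained sketch. It assumes the file is whitespace-separated with an `ID SNR` header row, as in the listing in the question; the inline `sample` string and the variable names `top5`, `in_file_order`, and `renumbered` are only for illustration and stand in for your real file and names.

```python
import io
import pandas as pd

# Hypothetical sample in the same layout as the question's data
# (assumed: whitespace-separated, with a header row).
sample = """ID SNR
J07451689+2804046 200
J07451689+2804046 217
J07451689+2804046 257
J07451689+2804046 200
J07451689+2804046 181
J07451689+2804046 206
J07451689+2804046 198
J07451689+2804046 222
J05062845+7149258 281
J05062845+7149258 397
"""

# Reading with the header row gives the columns the names 'ID' and 'SNR',
# which the groupby below relies on.
df = pd.read_csv(io.StringIO(sample), sep=r"\s+")

# Five largest SNR values per ID; 'ID' is moved from the index back to a column.
top5 = df.groupby("ID")["SNR"].nlargest(5).reset_index("ID")

# Rows back in their original order, original row labels kept.
in_file_order = top5.sort_index()

# Same row order, but the index is relabelled 0..n-1.
renumbered = top5.sort_index(ignore_index=True)

print(in_file_order)
print(renumbered)
```

If your real file is comma-separated instead, keep `sep=","` and make sure the header row is read (or pass `names=["ID", "SNR"]`), otherwise `groupby('ID')` will not find the column.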