a) For each repeated ID, how can I keep only the rows with the five largest SNR values? I want all three columns in the output.
b) I also want the eliminated rows as a separate output.
```
FITS ID SNR
1004234.fits J16355032-2814188 714
1004235.fits J16355032-2814188 444
1004236.fits J16355032-2814188 331
1004237.fits J16355032-2814188 492
1004238.fits J16355032-2814188 690
1004239.fits J16355032-2814188 491
1004240.fits J16355032-2814188 489
1004241.fits J16355032-2814188 382
1004242.fits J16355032-2814188 635
1004243.fits J16355032-2814188 522
1004244.fits J16355032-2814188 385
1004245.fits J16355032-2814188 645
1004475.fits J22152631+4958343 62
1004476.fits J22152631+4958343 162
1004477.fits J22152631+4958343 76
1004478.fits J22152631+4958343 103
1005113.fits J22154748+4954052 212
1005114.fits J22154748+4954052 227
1005115.fits J22154748+4954052 148
1005116.fits J22154748+4954052 160
```
This is the expected output for a.
```
FITS ID SNR
1004234.fits J16355032-2814188 714
1004238.fits J16355032-2814188 690
1004245.fits J16355032-2814188 645
1004242.fits J16355032-2814188 635
1004243.fits J16355032-2814188 522
1004475.fits J22152631+4958343 62
1004476.fits J22152631+4958343 162
1004477.fits J22152631+4958343 76
1004478.fits J22152631+4958343 103
1005113.fits J22154748+4954052 212
1005114.fits J22154748+4954052 227
1005115.fits J22154748+4954052 148
1005116.fits J22154748+4954052 160
```
Expected output for b.
```
FITS ID SNR
1004235.fits J16355032-2814188 444
1004236.fits J16355032-2814188 331
1004237.fits J16355032-2814188 492
1004239.fits J16355032-2814188 491
1004240.fits J16355032-2814188 489
1004241.fits J16355032-2814188 382
1004244.fits J16355032-2814188 385
```
>Solution :
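Use `groupby` + `head` on the DataFrame sorted by SNR to get the index labels of the top five rows per ID, then select those rows with `loc`:
```
# index labels of the (up to) five highest-SNR rows for each ID
idx = df.sort_values(by='SNR', ascending=False).groupby('ID').head(5).index
df2 = df.loc[idx]
```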
Output:
```
            FITS                 ID  SNR
0   1004234.fits  J16355032-2814188  714
4   1004238.fits  J16355032-2814188  690
11  1004245.fits  J16355032-2814188  645
8   1004242.fits  J16355032-2814188  635
9   1004243.fits  J16355032-2814188  522
17  1005114.fits  J22154748+4954052  227
16  1005113.fits  J22154748+4954052  212
13  1004476.fits  J22152631+4958343  162
19  1005116.fits  J22154748+4954052  160
18  1005115.fits  J22154748+4954052  148
15  1004478.fits  J22152631+4958343  103
14  1004477.fits  J22152631+4958343   76
12  1004475.fits  J22152631+4958343   62
```
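Note that the selected rows come back ordered by SNR across all IDs, not in the original row order. If the original order is wanted, sorting the result by index should restore it. A minimal self-contained sketch, assuming the table is whitespace-delimited; loading it from an in-memory string via `io.StringIO` and the names `raw` and `top5` are only for illustration:

```python
import io
import pandas as pd

# The sample table from the question, loaded from a string for illustration
# (assumption: the real data would be read from a whitespace-delimited file).
raw = """FITS ID SNR
1004234.fits J16355032-2814188 714
1004235.fits J16355032-2814188 444
1004236.fits J16355032-2814188 331
1004237.fits J16355032-2814188 492
1004238.fits J16355032-2814188 690
1004239.fits J16355032-2814188 491
1004240.fits J16355032-2814188 489
1004241.fits J16355032-2814188 382
1004242.fits J16355032-2814188 635
1004243.fits J16355032-2814188 522
1004244.fits J16355032-2814188 385
1004245.fits J16355032-2814188 645
1004475.fits J22152631+4958343 62
1004476.fits J22152631+4958343 162
1004477.fits J22152631+4958343 76
1004478.fits J22152631+4958343 103
1005113.fits J22154748+4954052 212
1005114.fits J22154748+4954052 227
1005115.fits J22154748+4954052 148
1005116.fits J22154748+4954052 160
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Index labels of the (up to) five highest-SNR rows per ID
idx = df.sort_values(by="SNR", ascending=False).groupby("ID").head(5).index

# sort_index() puts the selected rows back in their original order
top5 = df.loc[idx].sort_index()
print(top5)
```

IDs with five or fewer rows are kept whole, since `head(5)` returns at most five rows per group.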
Other rows (the ones that were eliminated):
```
df3 = df.loc[df.index.difference(idx)]
```
Output:
```
            FITS                 ID  SNR
1   1004235.fits  J16355032-2814188  444
2   1004236.fits  J16355032-2814188  331
3   1004237.fits  J16355032-2814188  492
5   1004239.fits  J16355032-2814188  491
6   1004240.fits  J16355032-2814188  489
7   1004241.fits  J16355032-2814188  382
10  1004244.fits  J16355032-2814188  385
```
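As an alternative to collecting index labels, both outputs can be derived from a single boolean mask, which also keeps the original row order. The sketch below is a swapped-in technique, not the approach used above: it ranks SNR within each ID using `groupby(...).rank(method="first", ascending=False)` and keeps ranks up to 5. The names `raw`, `rank`, `keep`, `kept` and `dropped` are illustrative, and only a subset of the question's rows is used to keep it short:

```python
import io
import pandas as pd

# A subset of the question's rows (first two IDs), loaded from a string
raw = """FITS ID SNR
1004234.fits J16355032-2814188 714
1004235.fits J16355032-2814188 444
1004236.fits J16355032-2814188 331
1004237.fits J16355032-2814188 492
1004238.fits J16355032-2814188 690
1004239.fits J16355032-2814188 491
1004240.fits J16355032-2814188 489
1004241.fits J16355032-2814188 382
1004242.fits J16355032-2814188 635
1004243.fits J16355032-2814188 522
1004244.fits J16355032-2814188 385
1004245.fits J16355032-2814188 645
1004475.fits J22152631+4958343 62
1004476.fits J22152631+4958343 162
1004477.fits J22152631+4958343 76
1004478.fits J22152631+4958343 103
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Rank SNR within each ID, largest first; method="first" breaks ties by row order
rank = df.groupby("ID")["SNR"].rank(method="first", ascending=False)
keep = rank <= 5

kept = df[keep]       # part a: up to five highest-SNR rows per ID
dropped = df[~keep]   # part b: the eliminated rows
print(kept)
print(dropped)
```

Because `kept` and `dropped` come from complementary masks, every input row lands in exactly one of the two outputs.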