Dataframing when the numbers are repeated

March 4, 2022

a) How to find the largest five SNRs when the ID and SNRs are repeated? And also I want all these three columns as the output.
b) I also want the eliminated lines as the output.

         FIT             ID                   SNR
    1011563.fit,  J16142485-3141000 ,       36   
    1011729.fit,  J17210134-3757437 ,       18   
    1011730.fit,  J17210134-3757437 ,       20   
    1011731.fit,  J17210134-3757437 ,       20   
    1011732.fit,  J17210134-3757437 ,       13   
    1011914.fit,  J17210134-3757437 ,       38   
    1011915.fit,  J17210134-3757437 ,       26   
    1011916.fit,  J17210134-3757437 ,       19   
    1011917.fit,  J17210134-3757437 ,       47   
    1011918.fit,  J17210134-3757437 ,       25 ´´´   
The result should look somewhat like this.

  Expected output for a.

          FITS                    ID  SNR
```8  1011917.fit    J17210134-3757437    47
   5  1011914.fit    J17210134-3757437    38
   0  1011563.fit    J16142485-3141000    36
   6  1011915.fit    J17210134-3757437    26
   9  1011918.fit    J17210134-3757437    25
   3  1011731.fit    J17210134-3757437   20
   2  1011730.fit    J17210134-3757437    20 ´´´

Expected output for b) 
 
```          FITS                    ID  SNR
     1  1011729.fit    J17210134-3757437    18
     6  1011915.fit    J17210134-3757437    26
     7  1011916.fit    J17210134-3757437    19
     4  1011732.fit    J17210134-3757437   13´´´

>Solution :

You can get the min of the largest values per group, then slice:

thresh = df.groupby('ID')['SNR'].nlargest(5).groupby(level=0).min()

m = df['ID'].map(thresh).le(df['SNR'])

a = df[m]

b = df[~m]

output:

# tresh
ID
J16142485-3141000    36
J17210134-3757437    20
Name: SNR, dtype: int64

# a
           FIT                 ID  SNR
0  1011563.fit  J16142485-3141000   36
2  1011730.fit  J17210134-3757437   20
3  1011731.fit  J17210134-3757437   20
5  1011914.fit  J17210134-3757437   38
6  1011915.fit  J17210134-3757437   26
8  1011917.fit  J17210134-3757437   47
9  1011918.fit  J17210134-3757437   25

# b
           FIT                 ID  SNR
1  1011729.fit  J17210134-3757437   18
4  1011732.fit  J17210134-3757437   13
7  1011916.fit  J17210134-3757437   19