Dataframing when the numbers are repeated

a) How can I find the five largest SNR values for each ID when both the IDs and the SNR values are repeated? I want all three columns in the output.
b) I also want the eliminated rows as a separate output.

         FIT             ID                   SNR
    1011563.fit,  J16142485-3141000 ,       36   
    1011729.fit,  J17210134-3757437 ,       18   
    1011730.fit,  J17210134-3757437 ,       20   
    1011731.fit,  J17210134-3757437 ,       20   
    1011732.fit,  J17210134-3757437 ,       13   
    1011914.fit,  J17210134-3757437 ,       38   
    1011915.fit,  J17210134-3757437 ,       26   
    1011916.fit,  J17210134-3757437 ,       19   
    1011917.fit,  J17210134-3757437 ,       47   
    1011918.fit,  J17210134-3757437 ,       25
The result should look somewhat like this.

Expected output for a):

```
           FIT                 ID  SNR
8  1011917.fit  J17210134-3757437   47
5  1011914.fit  J17210134-3757437   38
0  1011563.fit  J16142485-3141000   36
6  1011915.fit  J17210134-3757437   26
9  1011918.fit  J17210134-3757437   25
3  1011731.fit  J17210134-3757437   20
2  1011730.fit  J17210134-3757437   20
```

Expected output for b):

```
           FIT                 ID  SNR
1  1011729.fit  J17210134-3757437   18
7  1011916.fit  J17210134-3757437   19
4  1011732.fit  J17210134-3757437   13
```

Solution:

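For each ID, take the smallest of its five largest SNR values and use it as a threshold. Rows at or above the threshold for their ID are kept (so ties at the threshold are included); the rest are the eliminated rows: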

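    # Smallest of the five largest SNR values for each ID (the per-ID cutoff)
    thresh = df.groupby('ID')['SNR'].nlargest(5).groupby(level=0).min()

    # True where a row's SNR reaches the cutoff of its ID
    m = df['ID'].map(thresh).le(df['SNR'])

    a = df[m]   # rows kept
    b = df[~m]  # rows eliminated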

Output:

    # thresh
    ID
    J16142485-3141000    36
    J17210134-3757437    20
    Name: SNR, dtype: int64

    # a
               FIT                 ID  SNR
    0  1011563.fit  J16142485-3141000   36
    2  1011730.fit  J17210134-3757437   20
    3  1011731.fit  J17210134-3757437   20
    5  1011914.fit  J17210134-3757437   38
    6  1011915.fit  J17210134-3757437   26
    8  1011917.fit  J17210134-3757437   47
    9  1011918.fit  J17210134-3757437   25

    # b
               FIT                 ID  SNR
    1  1011729.fit  J17210134-3757437   18
    4  1011732.fit  J17210134-3757437   13
    7  1011916.fit  J17210134-3757437   19
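
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the sample table from the question by hand (the `DataFrame` construction is an assumption about how the data is loaded), applies the threshold approach, and then sorts the kept rows by descending SNR as in the expected output. The `rank`-based mask at the end is an alternative that is not part of the answer above; the names `rank_mask` and `a_sorted` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "FIT": ["1011563.fit", "1011729.fit", "1011730.fit", "1011731.fit",
            "1011732.fit", "1011914.fit", "1011915.fit", "1011916.fit",
            "1011917.fit", "1011918.fit"],
    "ID": ["J16142485-3141000"] + ["J17210134-3757437"] * 9,
    "SNR": [36, 18, 20, 20, 13, 38, 26, 19, 47, 25],
})

# Threshold approach: per-ID minimum of the five largest SNR values
thresh = df.groupby("ID")["SNR"].nlargest(5).groupby(level=0).min()
mask = df["ID"].map(thresh).le(df["SNR"])

a = df[mask]    # rows kept (ties at the threshold included)
b = df[~mask]   # rows eliminated

# Order the kept rows from highest to lowest SNR
a_sorted = a.sort_values("SNR", ascending=False)

# Alternative (assumption, not from the answer): rank within each ID.
# method="min" gives tied values the same rank, so ties at rank 5 are kept.
rank_mask = df.groupby("ID")["SNR"].rank(method="min", ascending=False).le(5)

print(a_sorted)
print(b)
```

On this sample both masks select the same rows. They should also agree in general for data without missing SNR values, since both keep every row tied with the fifth-largest value of its ID; if you want exactly five rows per ID regardless of ties, `rank(method="first", ...)` is one option to consider.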