Sorting a table in R using only a part of a string in a column

Data:

chr10:10003219_A_G  LoF  
chr14:983281_T_C    Missense
chr1:1283721_A_G    Splice
chr21:198727614_T_C Missense
chrX:123212_T_CA    LOF
chr1:12309123_GG_C  Missense

Desired output:

chr1:1283721_A_G    Splice 
chr1:12309123_GG_C  Missense   
chr10:10003219_A_G  LoF  
chr14:983281_T_C    Missense
chr21:198727614_T_C Missense
chrX:123212_T_CA    LOF

I have tried: df[order(df$V1),]

But this puts chr10 before any of the others.
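This happens because order() compares the strings character by character rather than by the numbers they contain. The sketch below uses a subset of the data above and calls order() with method = "radix", which always sorts character vectors in the C locale (byte order). The default method follows the session's locale collation, so the exact order can differ there, but the underlying problem is the same.

```r
# A few identifiers taken from the data above
ids <- c("chr1:1283721_A_G", "chr10:10003219_A_G", "chr21:198727614_T_C")

# method = "radix" sorts character vectors in the C locale, byte by byte.
# ":" (0x3A) comes after "0" (0x30), so "chr10:..." sorts before "chr1:..."
sorted_ids <- ids[order(ids, method = "radix")]
print(sorted_ids)
```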

I would like the rows ordered by chromosome (chr1, chr2, …, chrX) and then, within each chromosome, by position (the number after the colon) in ascending order.
I have to keep the table in the same format for downstream analysis, so I can't split the string into separate columns.

Any help would be appreciated.

Solution:

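Use str_order() from the stringr package with numeric = TRUE. This performs a natural sort: runs of digits are compared by their numeric value, so chr2 sorts before chr10 and positions after the colon are compared as numbers. The table itself is only reordered, not split: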

library(stringr)
df[str_order(df$V1, numeric = TRUE), ]
                   V1       V2
3    chr1:1283721_A_G   Splice
6  chr1:12309123_GG_C Missense
1  chr10:10003219_A_G      LoF
2    chr14:983281_T_C Missense
4 chr21:198727614_T_C Missense
5    chrX:123212_T_CA      LOF
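If you would rather avoid the stringr dependency, or want explicit control over where chrX, chrY and chrM end up, one base-R alternative is to build temporary sort keys with sub() and pass them to order(). This is a sketch, not part of the answer above: the column names V1/V2 follow the question, and mapping X, Y and M/MT to 23, 24 and 25 is an assumption for illustration. Only the row order of df changes; V1 and V2 keep their original format.

```r
# Data from the question (column names V1/V2 as used in df$V1)
df <- data.frame(
  V1 = c("chr10:10003219_A_G", "chr14:983281_T_C", "chr1:1283721_A_G",
         "chr21:198727614_T_C", "chrX:123212_T_CA", "chr1:12309123_GG_C"),
  V2 = c("LoF", "Missense", "Splice", "Missense", "LOF", "Missense")
)

# Chromosome label between "chr" and ":", e.g. "10" or "X"
chrom <- sub("^chr([^:]+):.*$", "\\1", df$V1)

# Position: the run of digits right after the colon
pos <- as.numeric(sub("^[^:]+:([0-9]+).*$", "\\1", df$V1))

# Numeric chromosome key; non-numeric labels are mapped explicitly
# (assumed order: autosomes, then X, Y, M/MT). Any other label stays NA,
# and order() places NA keys last by default.
chrom_key <- suppressWarnings(as.numeric(chrom))
chrom_key[chrom == "X"] <- 23
chrom_key[chrom == "Y"] <- 24
chrom_key[chrom %in% c("M", "MT")] <- 25

# Sort rows by chromosome, then by position; columns are left unchanged
sorted <- df[order(chrom_key, pos), ]
print(sorted)
```

For this data the row order is the same as with str_order(): rows 3, 6, 1, 2, 4, 5.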