| Chromosome_name | Start Position |
|---|---|
| CHR_HSCHR7_2_CTG6 | 142857940 |
| CHR_HSCHR19LRC_PGF2_CTG3_1 | 54316049 |
I have just started to use R.
I have a data frame of chromosome names but I just want to replace the long names with the number of the chromosome.
For example, CHR_HSCHR19LRC_PGF2_CTG3_1 would become "19".
I need to replace the long name with the number that comes directly after the characters "HSCHR".
How would I do this?
I tried manually entering the replacement value for each chromosome:
gsub(".*HSCHR19", "19", dataframe)
But this takes far too long for a list of >100 values. I would like to find a way to do this automatically.
>Solution :
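You can use `sub` with a capture group. The greedy `.*` skips ahead to the last "CHR" that is followed by digits, `(\\d+)` captures those digits, and `\\1` replaces the whole string with them: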
sub('^.*CHR(\\d+).*$', '\\1', Chromosome_name)
#> [1] "7" "19"
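To apply this to the data frame itself rather than to a standalone vector, here is a minimal sketch. The data frame name `df` and the new column name `Chromosome` are assumptions made for illustration, and the pattern is anchored on "HSCHR" as described in the question.

```r
# Hypothetical data frame mirroring the table in the question
df <- data.frame(
  Chromosome_name = c("CHR_HSCHR7_2_CTG6", "CHR_HSCHR19LRC_PGF2_CTG3_1"),
  Start_Position = c(142857940, 54316049),
  stringsAsFactors = FALSE
)

# Keep only the digits that directly follow "HSCHR";
# sub() is vectorised, so every row is handled in one call
df$Chromosome <- sub("^.*HSCHR(\\d+).*$", "\\1", df$Chromosome_name)

df$Chromosome
#> [1] "7"  "19"
```

Note that `sub` returns a string unchanged when the pattern does not match, so any name without digits after "HSCHR" would keep its long form and may need separate handling.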