I have the following string: "BM1C-18 ORF2 (ORF2) gene; and nonfunctional ORF1 (ORF1) gene". I want to remove everything except BM1C-18. I tried standard gsub, but it does not work because of the spaces and symbols within the string. Any ideas how to solve this?
Solution:
A possible solution, using stringr::str_extract, which returns the first match of the pattern and discards the rest of the string:
library(stringr)
string <- "BM1C-18 ORF2 (ORF2) gene; and nonfunctional ORF1 (ORF1) gene"
str_extract(string, "BM1C-18")
#> [1] "BM1C-18"
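Since the question mentions gsub, here is a minimal base R sketch of the same result, assuming the part to keep is always the text before the first space. The variable name id is only illustrative.

```r
string <- "BM1C-18 ORF2 (ORF2) gene; and nonfunctional ORF1 (ORF1) gene"

# Delete the first space and everything after it.
# "." matches any character, so the parentheses and semicolon in the
# string need no escaping here.
id <- sub(" .*$", "", string)
id
#> [1] "BM1C-18"
```

sub is enough here because only one replacement is needed; gsub with the same pattern would give the same result.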
To deal with a list of strings:
library(stringr)
string <- "BM1C-18 ORF2 (ORF2) gene; and nonfunctional ORF1 (ORF1) gene"
lStrings <- list(string, string, string)
str_extract(lStrings, "BM1C-18")
#> [1] "BM1C-18" "BM1C-18" "BM1C-18"
With a column of strings in a dataframe:
library(tidyverse)
string <- "BM1C-18 ORF2 (ORF2) gene; and nonfunctional ORF1 (ORF1) gene"
df <- data.frame(col1 = rep(string, 3))
df %>%
  mutate(col1 = str_extract(col1, "BM1C-18"))
#>      col1
#> 1 BM1C-18
#> 2 BM1C-18
#> 3 BM1C-18
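The examples above hard-code "BM1C-18", which only helps if the identifier is known in advance. If the identifier differs from string to string, one option is to extract the leading run of non-whitespace characters instead. This is a sketch under the assumption that the identifier is always the first token; the second input string below is made up for illustration.

```r
library(stringr)

strings <- c(
  "BM1C-18 ORF2 (ORF2) gene; and nonfunctional ORF1 (ORF1) gene",
  "XY2-7 hypothetical protein (HP1) gene"  # hypothetical example input
)

# "^\\S+" matches one or more non-whitespace characters at the start
# of each string, i.e. everything up to the first space.
ids <- str_extract(strings, "^\\S+")
ids
#> [1] "BM1C-18" "XY2-7"
```

The same pattern can be used inside mutate() on a data frame column in place of the literal "BM1C-18".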