How can I split words before and after parentheses in R?

I’m trying to split a text variable that goes like this:

text = "name name name (1235-23-532)"

to something like this:

name = "name name name"
num = "1235-23-532"

I’m trying this code:

df_split <- df %>%
  separate(owners, 
       into = c("name", "num"), 
       sep = "(?<=[A-Za-z])(?=\\()"
  )

However, the num column comes out as NA. I'm confused about why it doesn't detect the parenthesis (I tried both ( and \( and neither works). Is there a good solution for this?

Also, some rows have two pairs of parentheses, like "name name name (name) (number)". Is there a good way to extract just the numbers?

Thank you very much.

Solution:

Here is one way to get your desired output:

library(tidyverse)

as_tibble(text) %>% 
  mutate(name = str_trim(gsub("[^a-zA-Z]", " ", value)),
         num = str_extract(value, '\\d+\\-\\d+\\-\\d+'), .keep="unused")
# A tibble: 1 x 2
  name           num        
  <chr>          <chr>      
1 name name name 1235-23-532
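
For rows with two pairs of parentheses, note that the gsub() step above keeps every letter, so the word inside the first pair would end up in name. The str_extract() step is unaffected, because it only looks for the digit pattern. A minimal base R sketch of that number extraction follows, assuming the numbers always look like digit groups joined by hyphens. The sample strings are made up for illustration, and rows without a match get NA.

```r
# Hypothetical sample rows; the third has no number at all.
text <- c("name name name (1235-23-532)",
          "name name name (name) (1235-23-532)",
          "name name name")

# Locate the first run of digit groups joined by hyphens in each string.
# regexpr() returns -1 for strings with no match.
m <- regexpr("[0-9]+(-[0-9]+)+", text)

# regmatches() returns only the matched elements, so start from a
# vector of NAs and fill in the positions that matched.
num <- rep(NA_character_, length(text))
num[m != -1] <- regmatches(text, m)
num
```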

OR:

library(tidyverse)

as_tibble(text) %>% 
  separate(value, c("name", "num"), sep = ' \\(') %>% 
  mutate(num = str_remove(num, '\\)'))
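
As for why the original attempt returned NA: the pattern (?<=[A-Za-z])(?=\\() only matches where a letter sits directly before "(", but in the sample text a space sits there, so separate() finds nothing to split on. Also, splitting on ' \\(' produces three pieces for rows with two pairs of parentheses, so num would not hold the number there. The base R sketch below is one possible alternative, assuming the number is always inside the last pair of parentheses. The sample strings are illustrative only.

```r
# Hypothetical sample rows with one and two pairs of parentheses.
text <- c("name name name (1235-23-532)",
          "name name name (name) (1235-23-532)")

# The original lookaround pattern: regexpr() returns -1 (no match),
# because the character before "(" is a space, not a letter.
original_hit <- regexpr("(?<=[A-Za-z])(?=\\()", text[1], perl = TRUE)

# name: drop everything from the first "(" onward, plus the spaces before it.
name <- sub("\\s*\\(.*$", "", text)

# num: the greedy ".*" runs up to the last "(", so the capture group
# holds the contents of the last pair of parentheses.
num <- sub("^.*\\(([^()]*)\\)\\s*$", "\\1", text)

data.frame(name = name, num = num)
```

Strings without any parentheses pass through sub() unchanged, so if some rows may lack them it is worth checking with grepl("\\(", text) first.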