Split a column into two, using parenthesis as separator in R

I have a weird data format and I need to split a column into two.

col=c("142343-2344343(+)", "546354-4775458(-)", "374637463")

I want to split col into col1 and col2, using the first parenthesis as the separator.

I want something like this:

col1              col2
142343-2344343    +
546354-4775458    -
374637463         NA

I'd love your help!

Solution:

In base R, we may turn each string into a comma-separated line and parse it with read.csv

# Drop the parentheses, put a comma before a trailing + or -, then parse as CSV
read.csv(text = sub("(.*)([+-])$", "\\1,\\2", gsub("\\(|\\)", "", col)),
         header = FALSE, na.strings = "",
         col.names = c("col1", "col2"))

-output

             col1 col2
1 142343-2344343    +
2 546354-4775458    -
3      374637463 <NA>
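
To see why the one-liner works, the two regex steps can be run on their own before handing the result to read.csv. This is only an illustrative breakdown of the call above; the names no_parens and as_csv are made up for this sketch.

```r
col <- c("142343-2344343(+)", "546354-4775458(-)", "374637463")

# Step 1: remove every "(" and ")"
no_parens <- gsub("\\(|\\)", "", col)
no_parens
# "142343-2344343+" "546354-4775458-" "374637463"

# Step 2: insert a comma before a trailing "+" or "-".
# The "$" anchor means the hyphen inside the number range is left alone,
# and strings without a trailing sign are returned unchanged.
as_csv <- sub("(.*)([+-])$", "\\1,\\2", no_parens)
as_csv
# "142343-2344343,+" "546354-4775458,-" "374637463"
```

The third string ends up with a single field, so read.csv fills the missing second column, and na.strings = "" turns that blank into NA.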

With tidyr (version 1.3.0 or later, which provides separate_wider_regex), an option is

library(tidyr)
library(dplyr)
library(tibble)
tibble(col) %>% 
  separate_wider_regex(col, c(col1 = ".*", "\\(", col2 = "[^)]", "\\)"),
    too_few = "align_start")

-output

# A tibble: 3 × 2
  col1           col2 
  <chr>          <chr>
1 142343-2344343 +    
2 546354-4775458 -    
3 374637463      <NA> 
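
If neither read.csv nor tidyr is convenient, the same split can be sketched with sub and grepl alone. This is an alternative under the assumption that the sign is always a single "+" or "-" wrapped in parentheses at the end of the string; it is not part of the answer above, and the names has_sign and result are made up for illustration.

```r
col <- c("142343-2344343(+)", "546354-4775458(-)", "374637463")

# TRUE where the string ends in "(+)" or "(-)"
has_sign <- grepl("\\([+-]\\)$", col)

result <- data.frame(
  # Everything before the first "(" (the whole string if there is none)
  col1 = sub("\\(.*$", "", col),
  # The single character between the parentheses, or NA when absent
  col2 = ifelse(has_sign, sub("^.*\\((.)\\)$", "\\1", col), NA_character_),
  stringsAsFactors = FALSE
)
result
```

Here col2 is built explicitly with ifelse, so rows without a parenthesised sign get NA rather than relying on how the parser fills short lines.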