Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

R, split data frame column into 2 on last occurrence of either a + or a – sign

We are attempting to split 2 dataframe columns using the last occurrence of either a + or – sign, splitting the string immediately before the sign.

What we have

zed = data.frame(
  value = c('+23-25', '-23.5+25.5', '-24+24', '-3-6'),
  label = c('p34-150', 'p 243-130', 'p215 +115', 'p215 +115'),
  stringsAsFactors = FALSE
)

What we are going for

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

output = data.frame(
  value1 = c('+23', '-23.5', '-24', '-3'),
  value2 = c('-25', '+25.5', '+24', '-6'),
  label1 = c('p34', 'p 243', 'p215', 'p215'),
  label2 = c('-150', '-130', '+115', '+115'),
  stringsAsFactors = FALSE
)

separate(value, into = c("value1", "value2"), sep = "+|-") doesn’t quite work, +|- is an invalid regular expression. "(?=[+-][^+-]*$)[+-]" not working as intended either, returns empty columns. At the surface, it seemed like this would be simple but we are struggling.

Edit: perhaps this requires a separate posting, but we have a weird issue with an irregular space, causing the good solution below to not work:

> zed$Total[1]
[1] "O 239.5−110"
> head(zed %>% separate(Total, into = c("label1", "label2"), sep = "(?=[\\+-][^\\+-]*$)"), 1)
        label1  label2
1  O 239.5−110    <NA>
Warning message:
Expected 2 pieces. Missing pieces filled with `NA` in 12 rows [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]. 
> gsub(' ', '', zed$Total[1])
[1] "O 239.5−110"

Cannot gsub away the space after the O, for some reason this causes the separate() to not work.

>Solution :

You were close with your last attempt; you just need to escape the + and don’t need the [+-] at the end.

library(tidyr)

zed %>%
  separate(
    value, 
    into = c("value1", "value2"), 
    sep = "(?=[\\+-][^\\+-]*$)"
  ) %>%
  separate(
    label, 
    into = c("label1", "label2"), 
    sep = "(?=[\\+-][^\\+-]*$)"
  ) 
  value1 value2 label1 label2
1    +23    -25    p34   -150
2  -23.5  +25.5  p 243   -130
3    -24    +24  p215    +115
4     -3     -6  p215    +115
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading