We are attempting to split 2 dataframe columns using the last occurrence of either a + or – sign, splitting the string immediately before the sign.
What we have
zed = data.frame(
value = c('+23-25', '-23.5+25.5', '-24+24', '-3-6'),
label = c('p34-150', 'p 243-130', 'p215 +115', 'p215 +115'),
stringsAsFactors = FALSE
)
What we are going for
output = data.frame(
value1 = c('+23', '-23.5', '-24', '-3'),
value2 = c('-25', '+25.5', '+24', '-6'),
label1 = c('p34', 'p 243', 'p215', 'p215'),
label2 = c('-150', '-130', '+115', '+115'),
stringsAsFactors = FALSE
)
separate(value, into = c("value1", "value2"), sep = "+|-") doesn’t quite work, +|- is an invalid regular expression. "(?=[+-][^+-]*$)[+-]" not working as intended either, returns empty columns. At the surface, it seemed like this would be simple but we are struggling.
Edit: perhaps this requires a separate posting, but we have a weird issue with an irregular space, causing the good solution below to not work:
> zed$Total[1]
[1] "O 239.5−110"
> head(zed %>% separate(Total, into = c("label1", "label2"), sep = "(?=[\\+-][^\\+-]*$)"), 1)
label1 label2
1 O 239.5−110 <NA>
Warning message:
Expected 2 pieces. Missing pieces filled with `NA` in 12 rows [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12].
> gsub(' ', '', zed$Total[1])
[1] "O 239.5−110"
Cannot gsub away the space after the O, for some reason this causes the separate() to not work.
>Solution :
You were close with your last attempt; you just need to escape the + and don’t need the [+-] at the end.
library(tidyr)
zed %>%
separate(
value,
into = c("value1", "value2"),
sep = "(?=[\\+-][^\\+-]*$)"
) %>%
separate(
label,
into = c("label1", "label2"),
sep = "(?=[\\+-][^\\+-]*$)"
)
value1 value2 label1 label2
1 +23 -25 p34 -150
2 -23.5 +25.5 p 243 -130
3 -24 +24 p215 +115
4 -3 -6 p215 +115