I have a dataset like this:
a <- c("a","b", "c", "d")
b <- c(7, 5, 4, 3)
c <- c("ABc","D", "EF", "BCEEF")
m <- data.frame(a, b, c)
My expected result is:
a1 <- c("a","a","a", "b", "c", "c", "d", "d", "d", "d")
b1 <- c(7, 7, 7,5, 4, 4, 3, 3, 3, 3)
c1 <- c("A","B", "C", "D", "E", "F", "B", "C", "EE", "F")
m1 <- data.frame(a1, b1, c1)
So far I have written this code:
library(tidyr)
separate_rows(m, c, sep = "(?<=.)(?=.)")
This creates a row for every letter in column c, but when a letter is doubled (such as "EE") I want both letters kept in the same row, as in the expected result above.
How can I solve this?
>Solution :
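You can probably try
library(dplyr)
library(tidyr)
# match one word character plus any immediate repeats of it, then expand to rows
m %>%
  mutate(c = regmatches(c, gregexpr("(\\w)(\\1+)?", c))) %>%
  unnest(c)
or, using stringr,
library(stringr)
# same pattern, extracted with str_extract_all()
m %>%
  mutate(c = str_extract_all(c, "(\\w)(\\1+)?")) %>%
  unnest(c)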
which gives
# A tibble: 10 × 3
   a         b c
   <chr> <dbl> <chr>
 1 a         7 A
 2 a         7 B
 3 a         7 c
 4 b         5 D
 5 c         4 E
 6 c         4 F
 7 d         3 B
 8 d         3 C
 9 d         3 EE
10 d         3 F
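
To see why the pattern keeps "EE" together, here is a minimal base R sketch of the same idea without dplyr or tidyr. It is an illustration under some assumptions, not the answer's own method: the helper name split_runs is hypothetical, the pattern is written as "(\\w)\\1*" (which should match the same runs as "(\\w)(\\1+)?"), and perl = TRUE is set so the backreference is handled by PCRE.

```r
# Rebuild the example data from the question
m <- data.frame(
  a = c("a", "b", "c", "d"),
  b = c(7, 5, 4, 3),
  c = c("ABc", "D", "EF", "BCEEF"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split each string into runs of one repeated character.
# (\\w) captures a single word character; \\1* extends the match for as long
# as that same character repeats, so "EE" comes back as one token.
split_runs <- function(x) {
  regmatches(x, gregexpr("(\\w)\\1*", x, perl = TRUE))
}

runs <- split_runs(m$c)   # list with one character vector per input row
n <- lengths(runs)        # number of tokens produced by each row

# Repeat the other columns once per token and flatten the tokens
out <- data.frame(
  a = rep(m$a, n),
  b = rep(m$b, n),
  c = unlist(runs),
  stringsAsFactors = FALSE
)
print(out)
```

The match is case-sensitive and keeps the original case, which is why row 3 shows "c" rather than the "C" listed in the expected result; wrapping the column in toupper() would be one way to get uppercase if that is what is wanted.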
