Trying to extract data between semicolons and put that data into new columns.
Here is some data
df <- data.frame(data = c("a;;c;d", "a;b;;d","a;;;d","a;b;;;"), num =c(1:4))
Here is what I have scraped together so far from S.O.
res <- df %>%
mutate(
colA = str_extract(data, "^[^;]*(?=;)"),
colB = str_extract(data, "(?<=;)[^;]*(?=;)"),
colC = str_extract(data, "(?<=;)(?<=;)[^;]*(?=;)"),
colD = str_extract(data, "(?<=;)[^;]*$")
)
It nearly does what I want but colC is the same as colB. I dont really understand regex so a solution and a explanation would be gratefully received.
>Solution :
base R
cbind(df, read.csv2(text = df$data, header = FALSE))
# data num V1 V2 V3 V4 V5
# 1 a;;c;d 1 a c d NA
# 2 a;b;;d 2 a b d NA
# 3 a;;;d 3 a d NA
# 4 a;b;;; 4 a b NA
dplyr
library(dplyr)
df %>%
mutate(read.csv2(text = data, header = FALSE))
# data num V1 V2 V3 V4 V5
# 1 a;;c;d 1 a c d NA
# 2 a;b;;d 2 a b d NA
# 3 a;;;d 3 a d NA
# 4 a;b;;; 4 a b NA
This works without explicit assignment because mutate (and summarize) will happily take a named-list (of which data.frame is a special — and compatible — case).