Extract characters between semicolons in R

December 30, 2021

I'm trying to extract the data between semicolons and put each piece into its own column.

Here is some data

df <- data.frame(data = c("a;;c;d", "a;b;;d","a;;;d","a;b;;;"), num =c(1:4))

Here is what I have scraped together so far from S.O.

library(dplyr)
library(stringr)

res <- df %>% 
  mutate(
    colA = str_extract(data, "^[^;]*(?=;)"),
    colB = str_extract(data, "(?<=;)[^;]*(?=;)"),
    colC = str_extract(data, "(?<=;)(?<=;)[^;]*(?=;)"),
    colD = str_extract(data, "(?<=;)[^;]*$")
  )

It nearly does what I want, but colC comes out the same as colB. I don't really understand regex, so a solution and an explanation would be gratefully received.
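A sketch of what is going on, plus one possible regex fix (this is not taken from the answer below). `(?<=;)` is a zero-width lookbehind: it only asserts that the character just before the current position is `;`, without consuming anything. Writing it twice in a row, as in colC's pattern, tests that same single character twice, so `(?<=;)(?<=;)` behaves exactly like `(?<=;)` and colC effectively uses colB's pattern. To reach the third field, the pattern has to actually consume the first two fields, for example by anchoring at the start and skipping `n - 1` copies of "field followed by `;`" before capturing. `nth_field()` below is a hypothetical helper written for illustration; it uses base R `sub()` with `perl = TRUE` so it does not need stringr.

```r
# Hypothetical helper: return the n-th ';'-separated field of each string,
# or NA when a string has fewer than n fields.
nth_field <- function(x, n) {
  # ^              anchor at the start of the string
  # (?:[^;]*;){k}  skip exactly k = n - 1 "field;" pairs
  # ([^;]*)        capture the next field (may be empty)
  # .*$            consume the rest so sub() replaces the whole string
  pattern <- sprintf("^(?:[^;]*;){%d}([^;]*).*$", n - 1)
  out <- sub(pattern, "\\1", x, perl = TRUE)
  # sub() leaves non-matching strings unchanged; mark those as missing
  out[!grepl(pattern, x, perl = TRUE)] <- NA_character_
  out
}

df <- data.frame(data = c("a;;c;d", "a;b;;d", "a;;;d", "a;b;;;"), num = 1:4,
                 stringsAsFactors = FALSE)

df$colA <- nth_field(df$data, 1)
df$colB <- nth_field(df$data, 2)
df$colC <- nth_field(df$data, 3)
df$colD <- nth_field(df$data, 4)
df
```

With stringr, the same idea should work for colC as `str_match(data, "^(?:[^;]*;){2}([^;]*)")[, 2]`, which returns the first capture group.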

Solution:

base R

cbind(df, read.csv2(text = df$data, header = FALSE))
#     data num V1 V2 V3 V4 V5
# 1 a;;c;d   1  a     c  d NA
# 2 a;b;;d   2  a  b     d NA
# 3  a;;;d   3  a        d NA
# 4 a;b;;;   4  a  b       NA
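A possible refinement of the base R version, not part of the original answer: `col.names` gives the new columns names closer to the question's colA, colB, …, and `colClasses = "character"` keeps every field as text, so an entirely empty column is not read as logical `NA`. Note that `"a;b;;;"` contains four semicolons and therefore five fields, which is why a fifth column appears at all; five names are supplied for that reason.

```r
df <- data.frame(data = c("a;;c;d", "a;b;;d", "a;;;d", "a;b;;;"), num = 1:4,
                 stringsAsFactors = FALSE)

# Each element of df$data is read as one line of ';'-separated text.
fields <- read.csv2(
  text = df$data,
  header = FALSE,
  colClasses = "character",                # keep empty fields as "" rather than NA
  col.names = paste0("col", LETTERS[1:5])  # five names: row 4 has five fields
)
res <- cbind(df, fields)
res
```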

dplyr

library(dplyr)
df %>%
  mutate(read.csv2(text = data, header = FALSE))
#     data num V1 V2 V3 V4 V5
# 1 a;;c;d   1  a     c  d NA
# 2 a;b;;d   2  a  b     d NA
# 3  a;;;d   3  a        d NA
# 4 a;b;;;   4  a  b       NA

This works without explicit assignment because, since dplyr 1.0.0, mutate (and summarize) unpacks an unnamed data-frame result into separate columns. read.csv2 returns a data frame, so its columns V1 to V5 are added to df directly.
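A small sketch of that unpacking behaviour, assuming dplyr 1.0.0 or later: when the data-frame-returning expression is left unnamed, its columns are spliced into the result; when it is given a name (here the illustrative name `fields`), the whole data frame is kept as a single data-frame column instead.

```r
library(dplyr)

df <- data.frame(data = c("a;;c;d", "a;b;;d", "a;;;d", "a;b;;;"), num = 1:4,
                 stringsAsFactors = FALSE)

# Unnamed data-frame result: its columns V1..V5 are added alongside data and num.
unpacked <- df %>%
  mutate(read.csv2(text = data, header = FALSE, colClasses = "character"))

# Named data-frame result: stored as one data-frame column called `fields`.
packed <- df %>%
  mutate(fields = read.csv2(text = data, header = FALSE, colClasses = "character"))

names(unpacked)  # "data" "num" "V1" "V2" "V3" "V4" "V5"
names(packed)    # "data" "num" "fields"
```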