
Extract characters between semicolons in R

I am trying to extract the values between semicolons and put them into new columns.

Here is some sample data:

df <- data.frame(data = c("a;;c;d", "a;b;;d", "a;;;d", "a;b;;;"), num = 1:4)

Here is what I have pieced together so far from Stack Overflow answers:

library(dplyr)
library(stringr)

res <- df %>%
  mutate(
    colA = str_extract(data, "^[^;]*(?=;)"),
    colB = str_extract(data, "(?<=;)[^;]*(?=;)"),
    colC = str_extract(data, "(?<=;)(?<=;)[^;]*(?=;)"),
    colD = str_extract(data, "(?<=;)[^;]*$")
  ) 

It nearly does what I want, but colC is the same as colB. I don't really understand regex, so a solution and an explanation would be gratefully received.

Solution:

base R

cbind(df, read.csv2(text = df$data, header = FALSE))
#     data num V1 V2 V3 V4 V5
# 1 a;;c;d   1  a     c  d NA
# 2 a;b;;d   2  a  b     d NA
# 3  a;;;d   3  a        d NA
# 4 a;b;;;   4  a  b       NA
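
If the new columns should carry the names from the question (colA to colD) rather than V1 to V5, one option is to pass `col.names` to `read.csv2`. The following is a minimal sketch, not part of the original answer. The name `extra` for the trailing fifth field is an assumption made here, and `colClasses = "character"` is added so that no column is type-guessed.

```r
# Sample data from the question
df <- data.frame(data = c("a;;c;d", "a;b;;d", "a;;;d", "a;b;;;"),
                 num = 1:4, stringsAsFactors = FALSE)

# Parse each string as one line of semicolon-separated text.
# The row "a;b;;;" has five fields, so five names are supplied;
# "extra" is a placeholder name chosen for this sketch.
parts <- read.csv2(text = df$data, header = FALSE,
                   col.names = c("colA", "colB", "colC", "colD", "extra"),
                   colClasses = "character")

# Keep only the four fields of interest and bind them to the original data
res <- cbind(df, parts[c("colA", "colB", "colC", "colD")])
```

Empty fields come back as empty strings here, matching the blank cells in the output above.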

dplyr

library(dplyr)
df %>%
  mutate(read.csv2(text = data, header = FALSE))
#     data num V1 V2 V3 V4 V5
# 1 a;;c;d   1  a     c  d NA
# 2 a;b;;d   2  a  b     d NA
# 3  a;;;d   3  a        d NA
# 4 a;b;;;   4  a  b       NA

This works without an explicit assignment because mutate (and summarize) accepts an unnamed argument that evaluates to a data frame and adds each of its columns to the result. A data.frame is a special, compatible case of a named list.
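
As for why colC matched colB in the question: a lookbehind such as `(?<=;)` only checks what precedes the current position and consumes no characters, so writing it twice tests the same position twice and changes nothing. The sketch below illustrates this in base R and shows one possible pattern for picking out the third field directly. The helper name `first_match` is hypothetical, introduced only for this example.

```r
x <- c("a;;c;d", "a;b;;d", "a;;;d", "a;b;;;")

# Hypothetical helper: first match of a Perl-style pattern in each string.
# Note that regmatches() drops strings with no match; every string in x matches.
first_match <- function(pattern, x) {
  regmatches(x, regexpr(pattern, x, perl = TRUE))
}

# One lookbehind versus the same lookbehind repeated
one_lookbehind  <- first_match("(?<=;)[^;]*(?=;)", x)
two_lookbehinds <- first_match("(?<=;)(?<=;)[^;]*(?=;)", x)
same_result <- identical(one_lookbehind, two_lookbehinds)

# To reach the third field, consume the first two "field;" groups
# and capture what follows up to the next semicolon.
third_field <- sub("^(?:[^;]*;){2}([^;]*).*$", "\\1", x, perl = TRUE)
```

Both lookbehind patterns return the second field, while the counted-group pattern returns the third. For more than a couple of fields, splitting on the delimiter, as the `read.csv2` approach does, is usually easier to maintain than one regex per column.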
