Split string by vector of numbers

I have strings of unequal length such as:

  x <- c("11333333444A", "3aaa0085hb", "&ffvyß")

I want to break the strings in x into substrings based on the numeric information stored in a vector y:

  y <- c(2, 8, 11, 12)

to obtain the first 2 characters, then the characters up to position 8, then up to position 11, and finally up to position 12:

 11, 333333, 444, A
 3a, aa0085, hb
 &f, fvyß

I’ve tried str_locate_all, str_which, and other functions from stringr but could not figure out a solution.

Solution:

Since the split positions are given, an easier option is separate() from tidyr, whose sep argument accepts a numeric vector of character positions to split at:

library(tibble)
library(stringr)
library(tidyr)
tibble(x) %>%
   separate(x, into = str_c('col', seq_along(y)), sep = y)

Output:

# A tibble: 3 × 4
  col1  col2   col3  col4 
  <chr> <chr>  <chr> <chr>
1 11    333333 "444" "A"  
2 3a    aa0085 "hb"  ""   
3 &f    fvyß   ""    ""   

Or use base R's read.fwf and specify the column widths as the differences between successive positions (here 2, 6, 3, 1):

read.fwf(textConnection(x), widths = c(y[1], diff(y)))
  V1     V2   V3   V4
1 11 333333  444    A
2 3a aa0085   hb <NA>
3 &f   fvyß <NA> <NA>
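
If a list of character vectors is preferred over a data frame, the same positions can also be fed to base R's substring(). This is only a sketch and not part of the original answer; the names starts and pieces are made up for illustration, and it assumes y is sorted in increasing order.

```r
x <- c("11333333444A", "3aaa0085hb", "&ffvyß")
y <- c(2, 8, 11, 12)

# Each piece starts one character after the previous end position
starts <- c(1, head(y, -1) + 1)

# substring() is vectorised over start/stop and returns "" past the end of a string
pieces <- lapply(x, function(s) {
  p <- substring(s, starts, y)
  p[nzchar(p)]  # drop the empty pieces produced by short strings
})
pieces
```

Dropping the empty pieces with nzchar() gives vectors of different lengths, matching the desired output in the question; leave that line out to keep four pieces per string.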