Faster version of strsplit in R

I have data consisting of strings in which text and number segments alternate, e.g. VID22CAS05, TEL21XSE12. I need to count the segments after parsing, e.g. VID22CAS05 -> VID 22 CAS 05 => length of 4.

data<-c("VID22CAS05", "TEL21XSE12")

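string_lengths <- purrr::map(data, function(x) {
    # Put a space after every run of digits or letters, then trim the trailing space
    x_sep <- trimws(x = gsub("(\\d+|[A-Za-z]+)", "\\1 ", x), which = "both")
    # Split on spaces and count the resulting pieces
    length(strsplit(x_sep, " ")[[1]])
})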

This works, but it is very slow on a huge dataset.

Is there a way to speed this up?

Solution:

Will this do?

lengths(gregexpr('\\d+|[a-zA-Z]+', data))
# [1] 4 4
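
The one-liner is fast because gregexpr is vectorised over the whole input and lengths simply counts the match positions returned for each string. One edge case may matter on real data: for a string with no match, gregexpr returns a single -1, so lengths reports 1 rather than 0. A minimal sketch that handles this, assuming empty or symbol-only strings can occur in the data (count_tokens is a hypothetical helper name, not part of the original answer):

```r
# Count alternating letter/digit runs per string, returning 0 when nothing matches.
count_tokens <- function(x) {
  m <- gregexpr("\\d+|[a-zA-Z]+", x)
  # Match positions are >= 1; gregexpr signals "no match" with a single -1
  vapply(m, function(pos) sum(pos > 0L), integer(1))
}

data <- c("VID22CAS05", "TEL21XSE12", "")

lengths(gregexpr("\\d+|[a-zA-Z]+", data))
# [1] 4 4 1

count_tokens(data)
# [1] 4 4 0
```

If every string is guaranteed to contain at least one letter or digit, the plain lengths(gregexpr(...)) call is enough and avoids the extra pass.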