In R, how to match a string that contains an irregular space

We are loading data from an Excel file and have run into the following issue:

> dput(names_col[54])
" Calvin Ridley SUS"
> dput(substr(names_col[54], 15, 18))
" SUS"
> substr(names_col[54], 15, 18) == " SUS"
[1] FALSE
> zed = " Calvin Ridley SUS"
> substr(zed, 15, 18) == " SUS"
[1] TRUE

Our hypothesis is that the spaces in the first code block are some kind of irregular whitespace introduced when the data was loaded from Excel. How can we fix this so that the substring in the first code block matches?


Solution:

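It seems your string contains non-breaking spaces (Unicode U+00A0) rather than regular spaces (U+0020). They look identical when printed, so the comparison fails.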
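One way to confirm this is to inspect the code points directly. The sketch below uses a hypothetical string `x` as a stand-in for `names_col[54]`, assuming the non-breaking spaces sit at positions 1 and 15 (consistent with the question's `substr(..., 15, 18)` call):

```r
# Hypothetical stand-in for names_col[54]: U+00A0 at positions 1 and 15,
# an ordinary space (U+0020) between "Calvin" and "Ridley".
x <- "\u00a0Calvin Ridley\u00a0SUS"

# Code point of the character just before "SUS":
# 160 means non-breaking space, 32 would be a regular space.
utf8ToInt(substr(x, 15, 15))

# Code point of the space between the first and last name, for comparison.
utf8ToInt(substr(x, 8, 8))

# The UTF-8 bytes of the suspicious character (c2 a0 for U+00A0).
charToRaw(enc2utf8(substr(x, 15, 15)))
```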

You can match it using the Unicode escape \u00a0:

target <- "\u00a0Calvin Ridley\u00a0SUS"
grepl("\u00a0SUS",target)
[1] TRUE

As user2554330 mentions in the comments, you can also build the character from its raw UTF-8 bytes (0xC2 0xA0), but this is more convoluted:

grepl(paste0(rawToChar(as.raw(c(0xc2, 0xa0))),"SUS"),target)
[1] TRUE
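If the goal is to make ordinary comparisons such as `substr(...) == " SUS"` work, another option is to normalize the data once after loading. The sketch below is an assumption-based example, not part of the original answer: it replaces every U+00A0 with a regular space, and alternatively matches any horizontal whitespace with PCRE's `\h`, which covers both U+0020 and U+00A0. The variable `x` is again a hypothetical stand-in for `names_col[54]`; for the whole column you would apply the same `gsub()` call to `names_col`.

```r
# Hypothetical value as read from Excel, with non-breaking spaces.
x <- "\u00a0Calvin Ridley\u00a0SUS"

# Option 1: replace non-breaking spaces with regular spaces up front,
# e.g. names_col <- gsub("\u00a0", " ", names_col, fixed = TRUE)
x_clean <- gsub("\u00a0", " ", x, fixed = TRUE)
substr(x_clean, 15, 18) == " SUS"   # TRUE

# Option 2: leave the data alone and match any horizontal whitespace.
# With perl = TRUE, \h matches both U+0020 and U+00A0.
grepl("\\hSUS", x, perl = TRUE)                      # TRUE
grepl("\\hSUS", " Calvin Ridley SUS", perl = TRUE)   # TRUE
```

Normalizing up front keeps every later comparison simple; the `\h` pattern is useful when you would rather not modify the imported values.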