Extracting information between special characters in a column in R

I’m sorry because I feel like versions of this question have been asked many times, but I simply cannot find code from other examples that works in this case. I have a column where all the information I want is stored between two sets of "%%", and I want to extract the text between those two "%%" delimiters and put it into a new column, in this case called df$empty.

This is a long column, but in all cases I just want the text between the two "%%" delimiters. Is there a way to do this across the whole column?

To be specific, in this example I want a new column containing "information" and "wanted".


empty <- c('NA', 'NA')
information <- c('notimportant%%information%%morenotimportant', 'ignorethis%%wanted%%notthiseither')

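# stringsAsFactors = FALSE keeps the columns as character in R < 4.0, where strsplit() would fail on a factor
df <- data.frame(information, empty, stringsAsFactors = FALSE)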

Solution:

In this case you can do:

df$empty <- sapply(strsplit(df$information, '%%'), '[', 2)

#                                   information       empty
# 1 notimportant%%information%%morenotimportant information
# 2           ignorethis%%wanted%%notthiseither      wanted

That is, split each string on '%%' and take the second element of each resulting vector.
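To see why '[' with index 2 works, it may help to look at the intermediate result. The following is only a sketch on the same example strings; the names parts and middle are introduced here for illustration, and fixed = TRUE is an optional addition that treats '%%' as a literal string rather than a regular expression.

```r
information <- c('notimportant%%information%%morenotimportant',
                 'ignorethis%%wanted%%notthiseither')

# strsplit() returns a list with one character vector per input string
parts <- strsplit(information, '%%', fixed = TRUE)
# parts[[1]] is c("notimportant", "information", "morenotimportant")

# '[' is the subsetting function, so this takes element 2 of each vector
middle <- sapply(parts, '[', 2)
middle
# [1] "information" "wanted"

# A string with no '%%' splits into a single piece, so element 2 is NA
no_delim <- sapply(strsplit('nodelimiters', '%%', fixed = TRUE), '[', 2)
no_delim
# [1] NA
```

Rows without any '%%' therefore end up as NA in the new column rather than raising an error.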

Or you can get the same result using sub():

df$empty <- sub('.*%%(.+)%%.*', '\\1', df$information)
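One difference from the strsplit() approach is worth knowing: when the pattern does not match, sub() returns the input string unchanged rather than NA. A minimal sketch of one way to guard against that, assuming some rows may lack the delimiters; the third example string and the names pattern, raw and extracted are made up for illustration.

```r
information <- c('notimportant%%information%%morenotimportant',
                 'ignorethis%%wanted%%notthiseither',
                 'nodelimiters')

pattern <- '.*%%(.+)%%.*'

# sub() leaves non-matching strings untouched,
# so the third element comes back as 'nodelimiters'
raw <- sub(pattern, '\\1', information)

# Guard with grepl() so non-matching rows become NA instead
extracted <- ifelse(grepl(pattern, information), raw, NA_character_)
extracted
# [1] "information" "wanted"      NA
```

Note also that the leading '.*' is greedy, so if a string contains more than two '%%' delimiters the two approaches can pick different pieces.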