Splitting a comma- and semicolon-delimited string in R

I’m trying to split a string that contains two entries, where each entry has a specific format:

  • Category (e.g. active site/region) which is followed by a :
  • Term (e.g. His, Glu/nucleotide-binding motif A) which is followed by a ,

Here’s the string that I want to split:

string <- "active site: His, Glu,region: nucleotide-binding motif A,"

This is what I have tried so far. Except for the two empty substrings, it produces the desired output.

library(stringr)
unlist(str_extract_all(string, ".*?(?=,(?:\\w+|$))"))

[1] "active site: His, Glu"              ""                                   "region: nucleotide-binding motif A"
[4] "" 

How do I get rid of the empty substrings?

Solution:

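You get the empty strings because .*? can also match an empty string at any position where the lookahead (?=,(?:\\w+|$)) is true. Here that happens twice: directly before the comma that precedes region, and directly before the final comma.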
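
As a quick workaround, the original pattern can be kept and the zero-length matches dropped afterwards. The sketch below is an assumption-laden illustration rather than the answer's method: it swaps stringr for base R (regmatches and gregexpr with perl = TRUE) so that it runs without extra packages, and the variable names are made up for the example.

```r
string <- "active site: His, Glu,region: nucleotide-binding motif A,"

# Original pattern from the question; perl = TRUE enables the lookahead in base R
pattern <- ".*?(?=,(?:\\w+|$))"
matches <- regmatches(string, gregexpr(pattern, string, perl = TRUE))[[1]]

# nzchar() is TRUE for non-empty strings, so this drops the zero-length matches
non_empty <- matches[nzchar(matches)]
print(non_empty)
```

This only hides the symptom; the pattern still produces the empty matches before they are filtered out.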

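You can prevent the empty matches altogether by requiring each match to begin with a category: use a negated character class to match one or more characters other than a colon or comma, followed by the colon itself: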

[^:,\n]+:.*?(?=,(?:\w|$))

Explanation

  • [^:,\n]+ Match 1+ chars other than : , or a newline
  • : Match the colon
  • .*? Match any character, as few times as possible
  • (?= Positive lookahead, assert that what is directly to the right of the current position is:
    • , Match literally
    • (?:\w|$) Match either a single word char, or assert the end of the string
  • ) Close the lookahead

library(stringr)
string <- "active site: His, Glu,region: nucleotide-binding motif A,"
unlist(str_extract_all(string, "[^:,\\n]+:.*?(?=,(?:\\w|$))"))

Output

[1] "active site: His, Glu"              "region: nucleotide-binding motif A"
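
If stringr is not available, the same pattern should also work in base R. This is a minimal sketch under that assumption: it relies on gregexpr with perl = TRUE for the lookahead and on regmatches to pull out the matched text, and the names pattern and entries are illustrative only.

```r
string <- "active site: His, Glu,region: nucleotide-binding motif A,"

# Same pattern as in the answer: a category, a colon, then as few characters
# as possible up to a comma that is followed by a word character or the end
pattern <- "[^:,\\n]+:.*?(?=,(?:\\w|$))"

# gregexpr() finds all matches; regmatches() extracts the matched substrings
entries <- regmatches(string, gregexpr(pattern, string, perl = TRUE))[[1]]
print(entries)
```

Because every match must contain at least one category character and a colon, no zero-length match is possible and no filtering step is needed.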