R: How to remove consecutive duplicate words?

How can I write a function that removes one of any two consecutive identical words (in my case separated by an underscore) without specifying the words in advance?

## Some examples
c("ethnicity_ethnicity_selected_choice",
  "child_1_child_child_pid")
#> [1] "ethnicity_ethnicity_selected_choice" "child_1_child_child_pid"

## Output needed
c("ethnicity_selected_choice",
  "child_1_child_pid")
#> [1] "ethnicity_selected_choice" "child_1_child_pid"

Created on 2022-07-08 by the reprex package (v2.0.1)

Solution:

You could search for this pattern:

([^_]+)(?:_\1(?=_|$))*

Replace each match with \1 (the text of the first capture group).


  • ([^_]+) – A capture group matching 1+ non-underscore characters;
  • (?:_\1 – A non-capturing group matching an underscore followed by a backreference to the 1st capture group;
    • (?=_|$) – A nested positive lookahead asserting that either an underscore or the end of the line follows;
    • )* – Closes the non-capturing group and matches it 0+ times.
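
To see what the lookahead contributes, here is a small sketch comparing the pattern with and without `(?=_|$)`. It uses base R's `gsub()` with `perl = TRUE` instead of stringr, and the input `"child_children"` is a made-up example that does not appear in the question.

```r
x <- "child_children"  # hypothetical input, not from the question

# Without the lookahead, "_child" matches the start of "_children",
# so part of a longer word is removed.
without_lookahead <- gsub("([^_]+)(?:_\\1)*", "\\1", x, perl = TRUE)

# With the lookahead, a repeat only counts when it is a whole word,
# i.e. when it is followed by an underscore or the end of the string.
with_lookahead <- gsub("([^_]+)(?:_\\1(?=_|$))*", "\\1", x, perl = TRUE)

without_lookahead
#> [1] "children"
with_lookahead
#> [1] "child_children"
```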

library(stringr)
v <- c("ethnicity_ethnicity_selected_choice",
  "child_1_child_child_pid")
# Backslashes are doubled because the pattern is written as an R string literal
v <- str_replace_all(v, "([^_]+)(?:_\\1(?=_|$))*", "\\1")
v

Prints:

#> [1] "ethnicity_selected_choice" "child_1_child_pid"
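
Since the question asks for a function, one possible regex-free alternative is sketched below: split each string on the separator, collapse runs of identical neighbours with `rle()`, and paste the pieces back together. The helper name `drop_repeated_words` and its `sep` argument are illustrative assumptions, not part of the original answer, and the sketch does not treat `NA` values or leading/trailing separators specially.

```r
# Hypothetical helper; the name and the `sep` argument are illustrative.
drop_repeated_words <- function(x, sep = "_") {
  # Split each string into its words (literal separator, no regex)
  parts <- strsplit(x, sep, fixed = TRUE)
  vapply(
    parts,
    # rle() groups runs of equal neighbours; $values keeps one word per run
    function(p) paste(rle(p)$values, collapse = sep),
    character(1)
  )
}

drop_repeated_words(c("ethnicity_ethnicity_selected_choice",
                      "child_1_child_child_pid"))
#> [1] "ethnicity_selected_choice" "child_1_child_pid"
```

Only adjacent repeats are collapsed, so the first and third "child" in the second string both survive, which matches the output the question asks for.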