Extract matching words from strings in order

If I have two strings that look like this:

x <- "Here is a test of words and stuff."
y <- "Here is a better test of words and stuff."

Is there an easy way to compare the words from left to right, build a new string from the words that match, and stop at the first word that differs? The output would look like:

> "Here is a"

I don’t want to find all matching words between the two strings, just the words that match in order from the start. So "words and stuff." is in both strings, but I don’t want that to be selected.


Solution:

You could write a helper function to do the check for you:

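common_start <- function(x, y) {
  i <- 1
  last <- NA  # position of the last matching space or punctuation character
  # Walk both strings in parallel, stopping at the end of the shorter one
  while (i <= nchar(x) && i <= nchar(y)) {
    if (substr(x, i, i) == substr(y, i, i)) {
      # Remember each matching non-word character as a candidate cut point
      if (grepl("[[:space:][:punct:]]", substr(x, i, i), perl = TRUE)) {
        last <- i
      }
    } else {
      break  # first mismatch: stop scanning
    }
    i <- i + 1
  }
  if (!is.na(last)) {
    substr(x, 1, last - 1)  # drop the trailing separator itself
  } else {
    NA
  }
}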

and use that with your sample strings:

common_start(x,y)
# [1] "Here is a"

The idea is to check every character, keeping track of the last non-word character that still matches.
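
As an alternative to scanning character by character, the same comparison can be sketched at the word level: split both strings on whitespace, compare the words position by position, and keep the leading run that matches. The helper name `common_words` is hypothetical, and the sketch assumes words are separated by whitespace and that punctuation attached to a word counts as part of it.

```r
# Hypothetical word-level variant; not part of the original answer.
common_words <- function(x, y) {
  # Split each string into words on runs of whitespace
  xw <- strsplit(x, "[[:space:]]+")[[1]]
  yw <- strsplit(y, "[[:space:]]+")[[1]]
  n <- min(length(xw), length(yw))
  if (n == 0) return(NA_character_)
  # TRUE wherever the words at the same position are identical
  same <- xw[seq_len(n)] == yw[seq_len(n)]
  # cumprod() stays 1 only while every word so far has matched,
  # so its sum is the length of the common leading run
  k <- sum(cumprod(same))
  if (k == 0) return(NA_character_)
  paste(xw[seq_len(k)], collapse = " ")
}

common_words("Here is a test of words and stuff.",
             "Here is a better test of words and stuff.")
# [1] "Here is a"
```

One difference from the character scan: when one string ends exactly at a word boundary, the word-level version keeps that final word. For example, `common_words("Here is a", "Here is a test")` gives "Here is a", whereas `common_start` on the same inputs gives "Here is". The word-level version also collapses any run of whitespace between words into a single space.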
