Subsetting text based on differences between two vectors in R

I have two sets of terms in R, each stored as a single-column data frame:

A <- data.frame(c("Absolute Value", "absolute deviation", "acceptance line ; acceptance boundary", "age-adjusted rate", "variance", "modified mean ; modified arithmetic mean ; trimmed mean ", "standard error (stdev)"))
B  <- data.frame(c("descriptive", "Acceptance Boundary", "deviation", "stdev", "modified arithmetic mean", "mutability"))

I want to compare the two and create a vector C containing the terms of B that are not in A. The comparison should ignore capitalisation, i.e. treat "Acceptance Boundary" and "acceptance boundary" as the same term. It should also recognise a term when an entry of A lists it in more than one form, either (a) separated by a semicolon, e.g. "acceptance line ; acceptance boundary", or (b) given in parentheses, e.g. "standard error (stdev)" should match "stdev".

I want the final result to be:

C <- data.frame(c("descriptive", "deviation", "mutability")) 

Chris provided a solution to a similar question; however, I couldn't adapt that code to make it work in this case.

Solution:

If A and B are character vectors (not data frames as in your example), you can use strsplit() together with helper functions such as tolower() and trimws() to break the entries of A into individual terms. Splitting on " ; " or " (" separates the alternative forms, and gsub() then removes the leftover closing parenthesis. Finally, setdiff() returns the lower-cased terms of B that do not appear in the cleaned set of terms from A:

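# Split each entry of A on " ; " or " (", then lower-case, trim, and drop the closing ")"
Avals = gsub("\\)", "", trimws(tolower(unlist(strsplit(A, "( ; )|( \\()")))))
# Terms of B (lower-cased and trimmed) that are not among the cleaned terms of A
setdiff(trimws(tolower(B)), Avals)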

Output:

"descriptive" "deviation"   "mutability" 
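
The one-liner above packs several steps together. As a rough illustration, the same cleaning can be written out step by step on two of the entries from A. The variable names `x`, `parts`, and `terms` are only for this sketch and are not part of the original answer.

```r
# Two illustrative entries taken from the question's A
x <- c("acceptance line ; acceptance boundary", "standard error (stdev)")

# 1. Split on " ; " or on " (" ; strsplit() returns a list, one element per input string
parts <- strsplit(x, "( ; )|( \\()")

# 2. Flatten the list, lower-case everything, and trim stray whitespace
terms <- trimws(tolower(unlist(parts)))

# 3. Remove the closing parenthesis left over from "(stdev)"
terms <- gsub("\\)", "", terms)

print(terms)
# [1] "acceptance line"     "acceptance boundary" "standard error"      "stdev"
```

Each alternative form becomes its own element, which is what lets setdiff() match "stdev" or "acceptance boundary" on its own.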

Input:

A = c("Absolute Value", "absolute deviation", "acceptance line ; acceptance boundary", 
"age-adjusted rate", "variance", "modified mean ; modified arithmetic mean ; trimmed mean ", 
"standard error (stdev)")

B = c("descriptive", "Acceptance Boundary", "deviation", "stdev", 
"modified arithmetic mean", "mutability")
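
Since the question stores the terms in single-column data frames rather than vectors, one possible adaptation is to pull out the first column before cleaning and wrap the result back into a data frame. This is a minimal sketch under that assumption: `normalise_terms`, `A_df`, `B_df`, and the column name `term` are hypothetical names introduced here, not part of the original answer.

```r
# Hypothetical helper: split multi-term entries and normalise case and whitespace
normalise_terms <- function(x) {
  x <- as.character(x)                              # guards against factor columns
  parts <- unlist(strsplit(x, "( ; )|( \\()"))      # split on " ; " or " ("
  gsub("\\)", "", trimws(tolower(parts)))           # lower-case, trim, drop ")"
}

# The question's data, stored as single-column data frames
A_df <- data.frame(term = c("Absolute Value", "absolute deviation",
                            "acceptance line ; acceptance boundary",
                            "age-adjusted rate", "variance",
                            "modified mean ; modified arithmetic mean ; trimmed mean ",
                            "standard error (stdev)"),
                   stringsAsFactors = FALSE)
B_df <- data.frame(term = c("descriptive", "Acceptance Boundary", "deviation",
                            "stdev", "modified arithmetic mean", "mutability"),
                   stringsAsFactors = FALSE)

# Terms of B (first column) that are absent from the cleaned terms of A
C <- data.frame(term = setdiff(normalise_terms(B_df[[1]]),
                               normalise_terms(A_df[[1]])),
                stringsAsFactors = FALSE)
print(C$term)
# [1] "descriptive" "deviation"   "mutability"
```

Note that the returned terms are lower-cased. If the original capitalisation of B is needed, one option is to index B with `!(normalise_terms(B_df[[1]]) %in% normalise_terms(A_df[[1]]))`, which works here because no entry of B contains a separator.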