I have two vectors of terms in R, as follows:
A <- data.frame(c("Absolute Value", "absolute deviation", "acceptance line ; acceptance boundary", "age-adjusted rate", "variance", "modified mean ; modified arithmetic mean ; trimmed mean ", "standard error (stdev)"))
B <- data.frame(c("descriptive", "Acceptance Boundary", "deviation", "stdev", "modified arithmetic mean", "mutability"))
I want to compare the two vectors and create a vector C containing the terms of B that are not in A. The comparison should ignore capitalisation, i.e. treat "Acceptance Boundary" and "acceptance boundary" as the same term. It should also recognise a term that appears in A as one of several alternative forms, e.g. (a) the ";"-separated alternatives in "acceptance line ; acceptance boundary", or (b) the parenthesised abbreviation in "standard error (stdev)", so that "stdev" in B counts as a match.
I want the final result to be:
C <- data.frame(c("descriptive", "deviation", "mutability"))
In a similar question, Chris provided a solution, but I couldn't adapt my code to make it work in this case.
Solution:
If A and B are vectors (not data frames as in your example; for the data frames, pass A[[1]] and B[[1]]), you can use strsplit() together with helper functions such as tolower() and trimws() to break each value of A into its separate terms. The pattern "( ; )|( \\()" splits on " ; " and on " (", and gsub() then removes the leftover ")". Then use setdiff() to find the terms of B that are not in this cleaned set:
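# Split A on " ; " or " (", lower-case, trim whitespace, and drop the leftover ")"
Avals = gsub("\\)", "", trimws(tolower(unlist(strsplit(A, "( ; )|( \\()")))))
# Normalise B the same way and keep the terms that are not in Avals
setdiff(trimws(tolower(B)), Avals)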
Output:
"descriptive" "deviation" "mutability"
Input:
A = c("Absolute Value", "absolute deviation", "acceptance line ; acceptance boundary",
"age-adjusted rate", "variance", "modified mean ; modified arithmetic mean ; trimmed mean ",
"standard error (stdev)")
B = c("descriptive", "Acceptance Boundary", "deviation", "stdev",
"modified arithmetic mean", "mutability")
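A possible variant, sketched under some assumptions: the question stores A and B as single-column data frames and asks for C as a data frame, so the version below extracts each column with [[1]] and wraps the result in data.frame(). It also uses a looser split pattern, "[;()]", which splits on any ";", "(" or ")" regardless of the surrounding spaces (an assumption: this would also break up any term that legitimately contains parentheses). Finally, it filters B with %in% instead of setdiff(), so B's original spelling is returned. The helper name terms_not_in() is hypothetical, not part of base R.

```r
# Sketch only: `terms_not_in()` is a hypothetical helper, not a base R function.
# stringsAsFactors = FALSE keeps the columns as character in R < 4.0 as well.
A <- data.frame(term = c("Absolute Value", "absolute deviation",
                         "acceptance line ; acceptance boundary",
                         "age-adjusted rate", "variance",
                         "modified mean ; modified arithmetic mean ; trimmed mean ",
                         "standard error (stdev)"),
                stringsAsFactors = FALSE)
B <- data.frame(term = c("descriptive", "Acceptance Boundary", "deviation",
                         "stdev", "modified arithmetic mean", "mutability"),
                stringsAsFactors = FALSE)

terms_not_in <- function(b, a) {
  # Assumed rule: split each entry of `a` on any ";", "(" or ")",
  # then lower-case and trim the pieces and drop empty strings.
  a_terms <- trimws(tolower(unlist(strsplit(a, "[;()]"))))
  a_terms <- a_terms[a_terms != ""]
  # Keep the entries of `b` whose normalised form is not among the pieces;
  # the entries themselves are returned with their original spelling.
  b[!(trimws(tolower(b)) %in% a_terms)]
}

C <- data.frame(term = terms_not_in(B[[1]], A[[1]]), stringsAsFactors = FALSE)
print(C)
# C$term is "descriptive" "deviation" "mutability"
```

Unlike setdiff(), this keeps any duplicates in B and does not lower-case the returned terms. For the data in the question, the result is the same three terms.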