Rowwise partial match in all columns of a tibble

Suppose I have the following tibble:

library(tibble)

df <- tibble(
  examform1 = c("Bla bla bla pass/fail", "Bla bla bla 7 point scale", "Bla bla pass fail"),
  examform2 = c("passfail bla", "7pointscale bla", "Bla bla")
)

# A tibble: 3 × 2
  examform1                 examform2      
  <chr>                     <chr>          
1 Bla bla bla pass/fail     passfail bla   
2 Bla bla bla 7 point scale 7pointscale bla
3 Bla bla pass fail         Bla bla     

I want to count the occurrences of the strings in the following two vectors. Specifically, I want to end up with two new columns: one that counts, per row, the occurrences of any string from the vector pass, and another that does the same for the vector scale.

pass <- c("pass/fail", "pass fail", "passfail")
scale <- c("7 point scale", "7pointscale")

I have a very large data frame and wish to carry out the operation across all variables, as I am not sure which variables hold the information I need. The result should look like this:

# A tibble: 3 × 4
  examform1                 examform2       occurence_pass pass_scale
  <chr>                     <chr>                    <dbl>      <dbl>
1 Bla bla bla pass/fail     passfail bla                 2          0
2 Bla bla bla 7 point scale 7pointscale bla              0          2
3 Bla bla pass fail         Bla bla                      1          0

I could potentially paste all the variables together and carry on from there – but I think that would be very slow, because my real strings are really long, and I am unsure how to continue after pasting.

Any help is greatly appreciated, I hope I made my question clear :-)!

>Solution :

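You can apply grepl to each row and count the cells that match any of the strings, i.e.,

cols <- names(df)  # search only the original columns, not the count columns added below
df$occurence_pass <- colSums(apply(df[cols], 1, function(i) grepl(paste(pass, collapse = '|'), i)))
df$pass_scale <- colSums(apply(df[cols], 1, function(i) grepl(paste(scale, collapse = '|'), i)))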

df
# A tibble: 3 x 4
  examform1                 examform2       occurence_pass pass_scale
  <chr>                     <chr>                    <dbl>      <dbl>
1 Bla bla bla pass/fail     passfail bla                 2          0
2 Bla bla bla 7 point scale 7pointscale bla              0          2
3 Bla bla pass fail         Bla bla                      1          0
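
Since the real data frame is large, a column-wise variant may be worth trying, because it avoids converting every row to a character vector via apply. The following is only a sketch under some assumptions: count_cells is a hypothetical helper (not part of any package), the data is rebuilt as a base data.frame so the example is self-contained, and only character columns are searched. Like the version above, it counts the cells per row that contain at least one of the strings, not repeated hits inside a single cell.

```r
# Example data mirroring the question (base data.frame; a tibble should work the same way)
df <- data.frame(
  examform1 = c("Bla bla bla pass/fail", "Bla bla bla 7 point scale", "Bla bla pass fail"),
  examform2 = c("passfail bla", "7pointscale bla", "Bla bla"),
  stringsAsFactors = FALSE
)
pass  <- c("pass/fail", "pass fail", "passfail")
scale <- c("7 point scale", "7pointscale")

# Hypothetical helper: for each row, count how many columns of `data`
# contain at least one of `terms` (terms are treated as regular expressions).
count_cells <- function(data, terms) {
  pattern <- paste(terms, collapse = "|")                  # e.g. "pass/fail|pass fail|passfail"
  hits <- lapply(data, function(col) grepl(pattern, col))  # one logical vector per column
  as.integer(Reduce(`+`, hits))                            # add the columns element-wise
}

# Restrict the search to character columns so that count columns are never scanned
text_cols <- df[vapply(df, is.character, logical(1))]
df$occurence_pass <- count_cells(text_cols, pass)
df$pass_scale     <- count_cells(text_cols, scale)
```

If the search strings could contain regex metacharacters such as . or (, they would need escaping before being pasted into one pattern; the strings in this example do not.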