Rowwise partial match in all columns of a tibble

Suppose I have the following tibble:

library(tibble)

df <- tibble(
  examform1 = c("Bla bla bla pass/fail", "Bla bla bla 7 point scale", "Bla bla pass fail"),
  examform2 = c("passfail bla", "7pointscale bla", "Bla bla")
)

# A tibble: 3 × 2
  examform1                 examform2      
  <chr>                     <chr>          
1 Bla bla bla pass/fail     passfail bla   
2 Bla bla bla 7 point scale 7pointscale bla
3 Bla bla pass fail         Bla bla     

I want to count the occurrences of the strings in the following two vectors. Specifically, I want to end up with two new columns: one that counts the occurrences of any string from the vector pass, and another that does the same for the vector scale.

pass <- c("pass/fail", "pass fail", "passfail")
scale <- c("7 point scale", "7pointscale")

I have a very large data frame and wish to carry out the operation across all variables, as I am not sure which variables hold the information I need. The result should look like this:

# A tibble: 3 × 4
  examform1                 examform2       occurence_pass pass_scale
  <chr>                     <chr>                    <dbl>      <dbl>
1 Bla bla bla pass/fail     passfail bla                 2          0
2 Bla bla bla 7 point scale 7pointscale bla              0          2
3 Bla bla pass fail         Bla bla                      1          0

I could potentially paste all the variables together and carry on from there, but I think that would be very slow because my real strings are very long, and I am unsure how to continue after pasting.

Any help is greatly appreciated. I hope I made my question clear :-)

Solution:

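You can collapse each vector into a single regular expression with paste(..., collapse = '|'), apply grepl to every row, and sum the matches per row, i.e.: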

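# Each vector becomes one alternation pattern, e.g. "pass/fail|pass fail|passfail".
# apply() returns one column of TRUE/FALSE values per row of df, so colSums() gives per-row counts.
df$occurence_pass <- colSums(apply(df, 1, function(i) grepl(paste(pass, collapse = '|'), i)))
df$pass_scale <- colSums(apply(df, 1, function(i) grepl(paste(scale, collapse = '|'), i)))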

df
# A tibble: 3 x 4
  examform1                 examform2       occurence_pass pass_scale
  <chr>                     <chr>                    <dbl>      <dbl>
1 Bla bla bla pass/fail     passfail bla                 2          0
2 Bla bla bla 7 point scale 7pointscale bla              0          2
3 Bla bla pass fail         Bla bla                      1          0
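
If the data frame is very large, a column-wise variant of the same idea may be worth trying, since it avoids converting every row to a character vector inside apply(). The following is only a sketch under stated assumptions: count_matches is a hypothetical helper name (not from any package), the data is rebuilt as a base data frame so that no packages are needed, and the search terms are treated as regular expressions, so terms containing special characters would need escaping. Like the answer above, it counts how many cells in a row contain a match, not how many times a term appears inside a single cell.

```r
# Example data from the question, as a base data frame.
df <- data.frame(
  examform1 = c("Bla bla bla pass/fail", "Bla bla bla 7 point scale", "Bla bla pass fail"),
  examform2 = c("passfail bla", "7pointscale bla", "Bla bla"),
  stringsAsFactors = FALSE
)
pass <- c("pass/fail", "pass fail", "passfail")
scale <- c("7 point scale", "7pointscale")

# Hypothetical helper: for each row, count the columns that contain any of `terms`.
count_matches <- function(data, terms) {
  pattern <- paste(terms, collapse = "|")  # e.g. "pass/fail|pass fail|passfail"
  # One logical vector per column: TRUE where the cell matches the pattern.
  hits <- lapply(data, function(col) grepl(pattern, as.character(col)))
  # Add the logical vectors together, starting from a vector of zeros.
  Reduce(`+`, hits, integer(nrow(data)))
}

# Compute both counts before adding either column,
# so the new count columns are never searched themselves.
pass_count <- count_matches(df, pass)    # 2 0 1
scale_count <- count_matches(df, scale)  # 0 2 0

df$occurence_pass <- pass_count
df$pass_scale <- scale_count
```

Working column by column means grepl is called once per column on a whole vector instead of once per row, which is usually the cheaper direction when there are many more rows than columns. Whether it is actually faster on the real data would need to be measured.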