Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Filter data frame in R based on column string match and another column condition

i have a data frame called df :




warning    = c("HAS","NO","HAS","HAS","HAS")
validation = c("OK","OK","WARNING","WARNING","WARNING") 
a_b_c_messages1 = c(NA,NA,"good catch",NA,NA)
D_E_f_messages2 = c(NA,NA,NA,"NOT BAD",NA)
g_h_I_messages3 = c(NA,NA,NA,NA,"BETTER")
j_k_l_messages4 = c(NA,NA,NA,NA,NA)

df = tibble(warning,validation,a_b_c_messages1,
            D_E_f_messages2,g_h_I_messages3,
            j_k_l_messages4)
 A tibble: 5 × 6
  warning validation a_b_c_messages1 D_E_f_messages2 g_h_I_messages3 j_k_l_messages4
  <chr>   <chr>      <chr>           <chr>           <chr>           <lgl>          
1 HAS     OK         NA              NA              NA              NA             
2 NO      OK         NA              NA              NA              NA             
3 HAS     WARNING    good catch      NA              NA              NA             
4 HAS     WARNING    NA              NOT BAD         NA              NA             
5 HAS     WARNING    NA              NA              BETTER          NA  

i want to filter this df and keep the rows that the column warning contain the word "HAS" and if any column that contain the string "Message" and is not NA.

ideally i wan the resulted df to be like :

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

  warning validation a_b_c_messages1 D_E_f_messages2 g_h_I_messages3 
  <chr>   <chr>      <chr>           <chr>           <chr>                       
3 HAS     WARNING    good catch      NA              NA                          
4 HAS     WARNING    NA              NOT BAD         NA                          
5 HAS     WARNING    NA              NA              BETTER                             

>Solution :

You can use filter with if_any and then select the columns that don’t contain all NA using where (which you didn’t specify in your question but is evident from your desired output).

library(dplyr)

df %>%
  filter(warning=="HAS" & if_any(contains("message"), ~!is.na(.))) %>%
  select(where(function(x) !all(is.na(x))))

Gives

# A tibble: 3 × 5
  warning validation a_b_c_messages1 D_E_f_messages2 g_h_I_messages3
  <chr>   <chr>      <chr>           <chr>           <chr>          
1 HAS     WARNING    good catch      NA              NA             
2 HAS     WARNING    NA              NOT BAD         NA             
3 HAS     WARNING    NA              NA              BETTER 
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading