Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

dplyr filter based on conditions across and within column

I’d like to validate survey responses, involving removing rows with NAs based on the condition within a column and across columns. Sample dataset below

col1 <- c("Yes", "Yes", "No", "No", NA)
col2 <- c("Yes", NA, "No", NA, NA)
col3 <- c("No", "Yes", "No", NA, NA)

dataset <- data.frame(col1, col2, col3)
dataset

The desired output involves filtering out all rows in col1, and then removing only the row with a Yes in col1 and NA in any other column. Desired output below
`

  col1 col2 col3
1  Yes  Yes   No
2   No   No   No
3   No <NA> <NA>

`
I’ve tried basic filtering operations like

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

dataset %>% filter(col1 == "Yes" | !is.na(.)) 

with other operators such as ‘& , |’ but with no luck and I’m not sure how to apply across or filter_if here to make it work. I recognize this is very similar to https://stackoverflow.com/questions/43938863/dplyr-filter-with-condition-on-multiple-columns, but different enough to warrant asking this question again.

What am I missing here?

>Solution :

Your logic is encapsulated with:

dataset %>%
  filter(!(is.na(col1) | (col1 == "Yes" & (is.na(col2) | is.na(col3)))))
#>   col1 col2 col3
#> 1  Yes  Yes   No
#> 2   No   No   No
#> 3   No <NA> <NA>

We can rewrite this with indentations and comments to make the logic clearer:

dataset %>%
  filter(!(                       # Remove any of the following cases:
      is.na(col1)                       # Column 1 is missing
      |                               # OR 
      (col1 == "Yes"                    # col1 is yes               
       &                                # AND
      (is.na(col2) | is.na(col3))       # Either col2 OR col3 are missing
      )
 ))
#>   col1 col2 col3
#> 1  Yes  Yes   No
#> 2   No   No   No
#> 3   No <NA> <NA>
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading