In dplyr how do you filter to remove NA values from columns in a character vector?

I’d like to remove rows with NA in any one of the columns in a vector of column names.

Here’s a simplified example with just a couple of columns.

data <- structure(list(sample_id = c("2023.01.12_2", "2023.01.12_27", 
"2023.01.12_27", "2023.01.12_3", "2023.01.12_27", "2023.01.12_27", 
"2023.01.12_4", "2023.01.12_27", "2023.01.12_27", "2023.01.12_5"
), group = c("Unedited", "Rob", "Rob", "Partial_promoter", "Rob", 
"Rob", "Promoter_and_ATG", "Rob", "Rob", "ATG"), day = c(6, NA, 
NA, 6, NA, NA, 6, NA, NA, 6), x = c(114.243333333333, 115.036666666667, 
115.073333333333, 114.41, 116.11, 116.163333333333, 113.426666666667, 
116.15, 117.253333333333, 113.46)), row.names = c(NA, -10L), class = "data.frame")

cols <- c("group", "day")

I’ve tried a few approaches. The one below seems to work:

data %>%
  filter(across(.cols = cols, .fns = ~ !is.na(.x)))

But when I try reversing it to select the rows that have NA in those columns (for QC purposes I want to keep them, just separately), I get nothing:

data %>%
  filter(across(.cols = cols, .fns = ~ is.na(.x)))

Any ideas?

Solution:

You could use drop_na() from tidyr together with any_of() on the columns you listed. Here is a reproducible example:

cols <- c("group", "day")
library(tidyr)
data |>
  drop_na(any_of(cols))
#>      sample_id            group day        x
#> 1 2023.01.12_2         Unedited   6 114.2433
#> 2 2023.01.12_3 Partial_promoter   6 114.4100
#> 3 2023.01.12_4 Promoter_and_ATG   6 113.4267
#> 4 2023.01.12_5              ATG   6 113.4600

Created on 2023-01-16 with reprex v2.0.2
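
As for the reversed filter returning nothing: inside filter(), the per-column results from across() are combined with AND, so the reversed version only keeps rows where every column in cols is NA, and group is never NA in the example. A minimal sketch of one way to get both subsets, assuming dplyr 1.0.4 or later (for if_any() and if_all()) and using a small stand-in data frame rather than the original data:

```r
library(dplyr)

# Stand-in for the question's data; the values here are illustrative only
data <- data.frame(
  sample_id = c("s1", "s2", "s3", "s4"),
  group     = c("Unedited", "Rob", "Rob", "ATG"),
  day       = c(6, NA, NA, 6)
)
cols <- c("group", "day")

# Rows with no NA in any of the listed columns
complete_rows <- data |>
  filter(if_all(all_of(cols), ~ !is.na(.x)))

# Rows with an NA in at least one of the listed columns (kept for QC)
na_rows <- data |>
  filter(if_any(all_of(cols), is.na))
```

Wrapping cols in all_of() makes it explicit that cols is an external character vector, and it errors if a listed column is missing, whereas any_of() silently skips missing names.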
