Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to filter rows with NA based on multiple conditions

How to drop rows with NA (missing values) based on multiple conditions in group column?

Here is the dummy data:

a <- data.frame(c('b1', 'b2', 'b3', 'b4'),
                c(NA, 1.5, 0.5, 1),
                c(0.4, NA, 0.3, NA),
                c(0.5, NA, NA,2),
                c(-0.5, -2.5, -0.2,NA),
                c(NA, NA, -0.4,NA),
                c(-0.5, NA, -0.4,NA),
                stringsAsFactors = FALSE)

colnames(a) <- c('id', 'group1_1', 'group1_2', 'group1_3', 'group2_1', 'group2_2', 'group2_3')
rownames(a) <- a$id

a_subset <- a[, 2:7]
a_subset

#    group1_1 group1_2 group1_3 group2_1 group2_2 group2_3
# b1       NA      0.4      0.5     -0.5       NA     -0.5
# b2      1.5       NA       NA     -2.5       NA       NA
# b3      0.5      0.3       NA     -0.2     -0.4     -0.4
# b4      1.0       NA      2.0       NA       NA       NA

As you can see from above dataframe, group_1 and group_2 contains some missing values, and each group has triplicates.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Expected output:

#    group1_1 group1_2 group1_3 group2_1 group2_2 group2_3
# b1       NA      0.4      0.5     -0.5       NA     -0.5
# b3      0.5      0.3       NA     -0.2     -0.4     -0.4
# b4      1.0       NA      2.0       NA       NA       NA

From above you can see if 1 group contains at least 2 values it will not be removed.

Is it possible with dplyr approach?

>Solution :

tidyverse


library(tidyverse)

a_subset %>% 
  filter(
    rowSums(!is.na(across(starts_with("group1_")))) >= 2 | 
      rowSums(!is.na(across(starts_with("group2_")))) >= 2)

#>    group1_1 group1_2 group1_3 group2_1 group2_2 group2_3
#> b1       NA      0.4      0.5     -0.5       NA     -0.5
#> b3      0.5      0.3       NA     -0.2     -0.4     -0.4
#> b4      1.0       NA      2.0       NA       NA       NA

data

a_subset <- data.frame(
   row.names = c("b1", "b2", "b3", "b4"),
    group1_1 = c(NA, 1.5, 0.5, 1),
    group1_2 = c(0.4, NA, 0.3, NA),
    group1_3 = c(0.5, NA, NA, 2),
    group2_1 = c(-0.5, -2.5, -0.2, NA),
    group2_2 = c(NA, NA, -0.4, NA),
    group2_3 = c(-0.5, NA, -0.4, NA)
)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading