Which and rowsums to select particular rows, however considering NAs

March 8, 2024

Below is my dummy data. I’d like keep only rows where value is greater or equal 5 in at least 3 (of 5) colums.

A <- c(2,4,6,23,8,3)
B <- c(NA,NA,34,5,6,NA)
C <- c(37,21,8, NA, 5,2)
D <- c(12, 67, 12, 4, 11,NA)
E <- c(11, 56, 66, 90, 2,23)
df <- data.frame(A,B,C,D,E)
df
   A  B  C  D  E
1  2 NA 37 12 11
2  4 NA 21 67 56
3  6 34  8 12 66
4 23  5 NA  4 90
5  8  6  5 11  2
6  3 NA  2 NA 23

So I decided to use which and rowsums:

df[which(rowSums(df[, c(1:5)] >= 5) >= 3),]

what gives me:

  A  B C  D  E
3 6 34 8 12 66
5 8  6 5 11  2

However I’d like the rows with NAs to be taken into account as well giving the following result:

   A  B  C  D  E
1  2 NA 37 12 11
2  4 NA 21 67 56
3  6 34  8 12 66
4 23  5 NA  4 90
5  8  6  5 11  2

In other words: row 1 has 1 NA value, but 3 values (of remaining 4) that are >= 5 so criteria are fulfilled.

How can I modify my command to achieve this?

>Solution :

Using na.rm=TRUE.

> df[which(rowSums(df[, c(1:5)] >= 5, na.rm=TRUE) >= 3),]
   A  B  C  D  E
1  2 NA 37 12 11
2  4 NA 21 67 56
3  6 34  8 12 66
4 23  5 NA  4 90
5  8  6  5 11  2

You can probably do without which now, since no longer NAs are thrown.

> df[rowSums(df[, c(1:5)] >= 5, na.rm=TRUE) >= 3,]
   A  B  C  D  E
1  2 NA 37 12 11
2  4 NA 21 67 56
3  6 34  8 12 66
4 23  5 NA  4 90
5  8  6  5 11  2

Data:

> dput(df)
structure(list(A = c(2, 4, 6, 23, 8, 3), B = c(NA, NA, 34, 5, 
6, NA), C = c(37, 21, 8, NA, 5, 2), D = c(12, 67, 12, 4, 11, 
NA), E = c(11, 56, 66, 90, 2, 23)), class = "data.frame", row.names = c(NA, 
-6L))