Below is my dummy data. I’d like keep only rows where value is greater or equal 5 in at least 3 (of 5) colums.
A <- c(2,4,6,23,8,3)
B <- c(NA,NA,34,5,6,NA)
C <- c(37,21,8, NA, 5,2)
D <- c(12, 67, 12, 4, 11,NA)
E <- c(11, 56, 66, 90, 2,23)
df <- data.frame(A,B,C,D,E)
df
A B C D E
1 2 NA 37 12 11
2 4 NA 21 67 56
3 6 34 8 12 66
4 23 5 NA 4 90
5 8 6 5 11 2
6 3 NA 2 NA 23
So I decided to use which and rowsums:
df[which(rowSums(df[, c(1:5)] >= 5) >= 3),]
what gives me:
A B C D E
3 6 34 8 12 66
5 8 6 5 11 2
However I’d like the rows with NAs to be taken into account as well giving the following result:
A B C D E
1 2 NA 37 12 11
2 4 NA 21 67 56
3 6 34 8 12 66
4 23 5 NA 4 90
5 8 6 5 11 2
In other words: row 1 has 1 NA value, but 3 values (of remaining 4) that are >= 5 so criteria are fulfilled.
How can I modify my command to achieve this?
>Solution :
Using na.rm=TRUE.
> df[which(rowSums(df[, c(1:5)] >= 5, na.rm=TRUE) >= 3),]
A B C D E
1 2 NA 37 12 11
2 4 NA 21 67 56
3 6 34 8 12 66
4 23 5 NA 4 90
5 8 6 5 11 2
You can probably do without which now, since no longer NAs are thrown.
> df[rowSums(df[, c(1:5)] >= 5, na.rm=TRUE) >= 3,]
A B C D E
1 2 NA 37 12 11
2 4 NA 21 67 56
3 6 34 8 12 66
4 23 5 NA 4 90
5 8 6 5 11 2
Data:
> dput(df)
structure(list(A = c(2, 4, 6, 23, 8, 3), B = c(NA, NA, 34, 5,
6, NA), C = c(37, 21, 8, NA, 5, 2), D = c(12, 67, 12, 4, 11,
NA), E = c(11, 56, 66, 90, 2, 23)), class = "data.frame", row.names = c(NA,
-6L))