I'm trying to clean up a dataset and I've run into a problem. Three columns are related, so before I continue with the cleanup, I would like to remove rows ONLY when all three of those columns are NA.
Example:
     A    B    C
1    NA   NA   NA   <- to be removed
2    NA   NA   10   <- not to be removed
3    NA   29   NA   <- not to be removed
4    NA   NA   NA   <- to be removed
I have tried so far with:
subset(data, data$A == NA & data$B == NA & data$C == NA)
data_new <- data[complete.cases(data$A) & (data$B) & (data$C), ]
but nothing seems to work.
Any help will be much appreciated.
>Solution :
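The complete.cases approach works if it is called on each column and the results are combined with | instead of &. complete.cases returns TRUE for a non-NA value and FALSE for NA, so combining the three results with OR keeps every row that has at least one non-NA value among the three columns and drops only the rows where all three are NA: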
data[complete.cases(data$A) | complete.cases(data$B) | complete.cases(data$C),]
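As a quick check, here is a minimal sketch on a small data frame rebuilt from the example in the question (the values and the names keep and data_new are only for illustration). It also shows why the first attempt returned nothing: comparing with == NA yields NA rather than TRUE, and subset treats an NA condition as FALSE, so no rows are selected. In the second attempt, complete.cases is applied to column A only.

```r
# Sample data mirroring the four rows shown in the question (assumed values)
data <- data.frame(
  A = rep(NA_real_, 4),
  B = c(NA, NA, 29, NA),
  C = c(NA, 10, NA, NA)
)

# == never returns TRUE for a missing value; is.na() is the right test
eq_result <- NA == NA      # NA, not TRUE
na_result <- is.na(NA)     # TRUE

# Keep rows where at least one of the three columns is non-NA
keep <- complete.cases(data$A) | complete.cases(data$B) | complete.cases(data$C)
data_new <- data[keep, ]
data_new
```

Only rows 2 and 3 of the sample remain, which matches the rows marked "not to be removed".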
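Or, more simply, with rowSums, which counts the NAs per row across the three columns and keeps the rows where that count is below 3: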
data[rowSums(is.na(data[, c("A", "B", "C")])) < 3,]
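If the set of related columns might change, the hard-coded 3 can be replaced by the length of the column vector. This is just a sketch of that variation; cols and n_missing are hypothetical helper names, and the sample data is again rebuilt from the question.

```r
# Sample data mirroring the four rows shown in the question (assumed values)
data <- data.frame(
  A = rep(NA_real_, 4),
  B = c(NA, NA, 29, NA),
  C = c(NA, 10, NA, NA)
)

# Columns that must ALL be NA for a row to be dropped (hypothetical helper)
cols <- c("A", "B", "C")

# Number of NAs per row across those columns;
# data[cols] keeps a data frame even if cols has a single element
n_missing <- rowSums(is.na(data[cols]))

# Keep rows where at least one of the columns is non-NA
data_new <- data[n_missing < length(cols), ]
data_new
```

Any other columns in the data frame are ignored by the test but kept in the result.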
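Or with dplyr, using if_all to drop the rows where all three columns are NA (if_any with the negated test, keeping rows where any column is non-NA, gives the same result):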
library(dplyr)
data %>%
filter(!if_all(c(A, B, C), is.na))
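For comparison, here is a sketch of the if_any form next to the if_all form on the sample data from the question. It assumes a dplyr version that provides if_any and if_all (1.0.4 or later); kept_all and kept_any are illustrative names.

```r
library(dplyr)

# Sample data mirroring the four rows shown in the question (assumed values)
data <- data.frame(
  A = rep(NA_real_, 4),
  B = c(NA, NA, 29, NA),
  C = c(NA, 10, NA, NA)
)

# Drop rows where all three columns are NA
kept_all <- data %>%
  filter(!if_all(c(A, B, C), is.na))

# Equivalent: keep rows where any of the three columns is non-NA
kept_any <- data %>%
  filter(if_any(c(A, B, C), ~ !is.na(.x)))

kept_any
```

Both forms express the same condition, so which one to use is mostly a matter of which reads more naturally.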