R: remove rows with NA values in 3 columns only when all 3 are NA

I'm trying to clean up a dataset and I've run into a problem. Three columns are related, so before I continue with the cleanup I would like to remove rows with NAs ONLY when all three columns are NA.

Example:

     A   B   C
1   NA  NA  NA  <- to be removed
2   NA  NA  10  <- not to be removed
3   NA  29  NA  <- not to be removed
4   NA  NA  NA  <- to be removed

I have tried so far with:

subset(data, data$A == NA & data$B == NA & data$C == NA)

data_new <- data[complete.cases(data$A) & (data$B) & (data$C), ]


but nothing seems to work

Any help will be much appreciated.

>Solution :

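The complete.cases approach works if each column gets its own complete.cases call and the results are combined with | instead of &. complete.cases returns TRUE for a non-NA value and FALSE for NA, so by using OR we keep every row that has at least one non-NA value. (The subset attempt fails for a different reason: x == NA always evaluates to NA, never TRUE, so missing values have to be tested with is.na.)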

data[complete.cases(data$A) | complete.cases(data$B) | complete.cases(data$C),]
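
To see what the comparison and the corrected call actually return, here is a minimal sketch. The data frame is an assumption: it is rebuilt from the example table in the question, and the helper name keep is only for illustration.

```r
# Example data rebuilt from the question's table (an assumption about the real dataset)
data <- data.frame(
  A = rep(NA_real_, 4),
  B = c(NA, NA, 29, NA),
  C = c(NA, 10, NA, NA)
)

# Comparing with == NA never gives TRUE or FALSE; every element comes back as NA
data$A == NA

# One complete.cases() call per column, combined with OR:
# TRUE wherever at least one of A, B, C is non-NA
keep <- complete.cases(data$A) | complete.cases(data$B) | complete.cases(data$C)
keep

# Rows 1 and 4 (all three NA) are dropped, rows 2 and 3 are kept
data_new <- data[keep, ]
data_new
```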

Or more easily with rowSums: count the NAs per row across the three columns and keep the rows where that count is below 3

data[rowSums(is.na(data[, c("A", "B", "C")])) < 3,]
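
The rowSums one-liner can be split into steps to make the counting visible. This is a sketch under the same assumption as the question's example table; the names cols and na_count are hypothetical helpers, not part of the original answer.

```r
# Example data rebuilt from the question's table (an assumption about the real dataset)
data <- data.frame(
  A = rep(NA_real_, 4),
  B = c(NA, NA, 29, NA),
  C = c(NA, 10, NA, NA)
)

cols <- c("A", "B", "C")  # the three related columns from the question

# is.na() on a data frame returns a logical matrix;
# rowSums() then counts the TRUE values (the NAs) in each row
na_count <- rowSums(is.na(data[, cols]))
na_count

# Keep a row unless every checked column is NA
data_new <- data[na_count < length(cols), ]
data_new
```

Comparing against length(cols) instead of a hard-coded 3 means the same line should keep working if the set of related columns changes.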

Or with dplyr, using if_all (or, equivalently, if_any with the negated condition)

library(dplyr)
data %>% 
  filter(!if_all(c(A, B, C), is.na))
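
For completeness, the if_any form might look like the sketch below. It assumes dplyr 1.0.4 or later (the release that introduced if_any and if_all) and uses a data frame rebuilt from the question's example table.

```r
library(dplyr)  # if_any()/if_all() need dplyr >= 1.0.4

# Example data rebuilt from the question's table (an assumption about the real dataset)
data <- data.frame(
  A = rep(NA_real_, 4),
  B = c(NA, NA, 29, NA),
  C = c(NA, 10, NA, NA)
)

# Keep rows where at least one of A, B, C is not NA.
# Logically the same as filter(!if_all(c(A, B, C), is.na))
data_new <- data %>%
  filter(if_any(c(A, B, C), ~ !is.na(.x)))
data_new
```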