Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

r retain rows only if non missing in specific column

My dataframe has columns and rows like this

   Id    Date    Col1    Col2    Col3    X1
   1     1/1/22  NA      1       0       
   1     1/1/22  0       0       1       6
   2     5/7/21  0       1       0       
   2     5/7/21  0       2       0      

I like to drop rows where the duplicate row (same Id, same date) where values for column X1 is missing or empty. If both the rows are missing X1 for that ID and date then dont drop. Only when one is missing and other is not missing then drop the missing row.

Expected output

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

  Id    Date    Col1    Col2    Col3    X1
   1     1/1/22  0       0       1       6
   2     5/7/21  0       1       0       
   2     5/7/21  0       2       0     

I tried this

  library(tidyr)
  df %>%
  group_by(Id, Date) %>%
  drop_na(X1)

This drops all rows with NA or missing and I am just left with one row, which is not what I want. Any suggestions much apricated. Thanks.

>Solution :

We can create a condition in filter to return all the rows if there are only missing values in ‘X1’ or just remove the missing rows

library(dplyr)
df %>% 
   group_by(Id, Date) %>% 
   filter(if(all(is.na(X1))) TRUE else complete.cases(X1)) %>%
   ungroup

-output

# A tibble: 3 × 6
     Id Date    Col1  Col2  Col3    X1
  <int> <chr>  <int> <int> <int> <int>
1     1 1/1/22     0     0     1     6
2     2 5/7/21     0     1     0    NA
3     2 5/7/21     0     2     0    NA

Or without the if/else, use | with & condition

df %>% 
  group_by(Id, Date) %>%
  filter(any(complete.cases(X1)) & complete.cases(X1) | 
       all(is.na(X1))) %>% 
  ungroup

data

df <- structure(list(Id = c(1L, 1L, 2L, 2L), Date = c("1/1/22", "1/1/22", 
"5/7/21", "5/7/21"), Col1 = c(NA, 0L, 0L, 0L), Col2 = c(1L, 0L, 
1L, 2L), Col3 = c(0L, 1L, 0L, 0L), X1 = c(NA, 6L, NA, NA)), 
class = "data.frame", row.names = c(NA, 
-4L))
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading