Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

r filter based on two columns and not missing

I have a dataset with multiple rows per id for some ids not all like this

    Id    Date       Col2   Col3    Col4
    1     2011-09-08 Red    NA      NA
    2     2010-07-13 Blue   NA      NA
    3     1997-04-23 Yellow NA      NA
    3     1997-04-23 Black  NA      NA
    3     1997-04-23 Green  NA      Green
    4     2020-01-27 Red    NA      NA
    4     2020-01-27 Blue   NA      NA 
    5     1999-08-03 Red    NA      NA
    5     1999-08-03 Blue   NA      Yellow

For ID with multiple rows I like to retain only rows where values in Col2 is equal to values in Col4. If the ID has multiple rows and the values in Col2 is NOT equal to values in Col4 then filter is not applicable. Expecting a final dataset like this below

   Id    Date       Col2   Col3    Col4
    1     2011-09-08 Red    NA      NA
    2     2010-07-13 Blue   NA      NA
    
    3     1997-04-23 Green  NA      Green
    
    4     2020-01-27 Red    NA      NA
    4     2020-01-27 Blue   NA      NA 
    5     1999-08-03 Red    NA      NA
    5     1999-08-03 Blue   NA      Yellow

What I did so far is this

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

    df_final <- df1 %>% 
                group_by(ID)
                filter(Col2 == Col4, negate(is.na))

but this is not working , any suggestions regarding this is much appreciated.

>Solution :

Since dplyr::filter joins multiple statements using &, that’s not what we want, and not sure what you’re trying to do with negate(is.na) anyway. However, we can do a comparison to see if Col2 == Col4 and if not, check for the entire group that in no rows that occurs. When doing the comparison, I’ve just replaced NA values in Col4 with "" so the comparison returns T and F, but you can also add in a more complex check for if either are NA.

I also created the column to filter on separately to avoid comparing between the columns twice in a single filter call.

df1 %>% 
  mutate(check = Col2 == replace_na(Col4, "")) %>%
  group_by(Id) %>%
  filter(check | !any(check))
#> # A tibble: 7 Ă— 6
#> # Groups:   Id [5]
#>      Id Date       Col2  Col3  Col4   check
#>   <dbl> <date>     <chr> <lgl> <chr>  <lgl>
#> 1     1 2011-09-08 Red   NA    <NA>   FALSE
#> 2     2 2010-07-13 Blue  NA    <NA>   FALSE
#> 3     3 1997-04-23 Green NA    Green  TRUE 
#> 4     4 2020-01-27 Red   NA    <NA>   FALSE
#> 5     4 2020-01-27 Blue  NA    <NA>   FALSE
#> 6     5 1999-08-03 Red   NA    <NA>   FALSE
#> 7     5 1999-08-03 Blue  NA    Yellow FALSE

Data

structure(list(Id = c(1, 2, 3, 3, 3, 4, 4, 5, 5), Date = structure(c(15225, 
14803, 9974, 9974, 9974, 18288, 18288, 10806, 10806), class = "Date"), 
    Col2 = c("Red", "Blue", "Yellow", "Black", "Green", "Red", 
    "Blue", "Red", "Blue"), Col3 = c(NA, NA, NA, NA, NA, NA, 
    NA, NA, NA), Col4 = c(NA, NA, NA, NA, "Green", NA, NA, NA, 
    "Yellow")), problems = structure(list(row = 7L, col = NA_character_, 
    expected = "5 columns", actual = "6 columns", file = "literal data"), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame")), class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -9L), spec = structure(list(
    cols = list(Id = structure(list(), class = c("collector_double", 
    "collector")), Date = structure(list(format = ""), class = c("collector_date", 
    "collector")), Col2 = structure(list(), class = c("collector_character", 
    "collector")), Col3 = structure(list(), class = c("collector_logical", 
    "collector")), Col4 = structure(list(), class = c("collector_character", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 1L), class = "col_spec"))
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading