Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

R group_by and filter with multiple conditions

I have a dataset like the following:

ID <- c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5)
LABEL <- c('TT', 'CH', 'AB', 'TT', 'CH', 'AB', 'TT', 'CH', 'TT', 'CH', 'AB')
VALUE <- c(2121, 32, 1, 2121, 32, 2, 2121, 99, 2222, 32, 9)

DATA <- data.frame(ID, LABEL, VALUE)

I’m trying to group the data by ID (it’s a long file where ID replicates) then filter where the same ID meets multiple conditions. I’ve tried the following:

X <- DATA %>% 
  group_by(ID) %>% 
  filter( 
    (LABEL == "TT" & VALUE == "2121") & 
      (LABEL == "CH" & VALUE == "32") )

I want to get a dataset with only the observations with ID 1 and 2 which meet both conditions simultaneously.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Any help is appreciated.

>Solution :

We can use any – after grouping by ‘ID’, if we want to check for presence of the strings in two columns, wrap with any as multiple expression cannot be TRUE for the same location. Then, we use & so that if only both are present it returns TRUE (any – returns a single TRUE/FALSE)

library(dplyr)
DATA %>% 
   group_by(ID) %>% 
   filter( 
    any((LABEL == "TT" & VALUE == "2121") &
      any((LABEL == "CH" & VALUE == "32") ))) %>%
   ungroup

-output

# A tibble: 6 × 3
     ID LABEL VALUE
  <dbl> <chr> <dbl>
1     1 TT     2121
2     1 CH       32
3     1 AB        1
4     2 TT     2121
5     2 CH       32
6     2 AB        2

Or slighly more compact option by pasteing the columns and then check if all the elements in the lhs of %in% are TRUE for filtering the groups

library(stringr)
DATA %>%
   group_by(ID) %>%
   filter(all(c("TT2121", "CH32") %in% str_c(LABEL, VALUE))) %>% 
   ungroup

-output

# A tibble: 6 × 3
     ID LABEL VALUE
  <dbl> <chr> <dbl>
1     1 TT     2121
2     1 CH       32
3     1 AB        1
4     2 TT     2121
5     2 CH       32
6     2 AB        2
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading