R dplyr get subset with certain condition

I have the following dataset:

ID = c('A','A','A','A','A','A','B','B','B','B')
B = c(1,2,1,3,2,2,1,2,3,4)
date = as.Date(c('1990-01-01','1990-01-01','1990-01-02','1990-01-02',
                 '1990-01-03','1990-01-04','1990-03-01','1990-03-01',
                 '1990-03-01','1990-03-01'))

dataset1 <- data.frame(ID,B,date)

   ID B       date
1   A 1 1990-01-01
2   A 2 1990-01-01
3   A 1 1990-01-02
4   A 3 1990-01-02
5   A 2 1990-01-03
6   A 2 1990-01-04
7   B 1 1990-03-01
8   B 2 1990-03-01
9   B 3 1990-03-01
10  B 4 1990-03-01

Desired output:

   ID B       date
2   A 2 1990-01-01
4   A 3 1990-01-02
8   B 2 1990-03-01
9   B 3 1990-03-01
10  B 4 1990-03-01

or

   ID B       date
1   A 1 1990-01-01
2   A 2 1990-01-01
3   A 1 1990-01-02
4   A 3 1990-01-02
7   B 1 1990-03-01
8   B 2 1990-03-01
9   B 3 1990-03-01
10  B 4 1990-03-01

I can simply drop rows with B=1 in the second case.

Within each ID, I want the rows whose date matches a date on which B=1. For example, ID ‘A’ has B=1 on 1990-01-01 and 1990-01-02, so I want every row of ID ‘A’ with one of those dates. In other words, I want to select rows based on date[B=1].

My attempt:

dataset1 %>% group_by(ID) %>% arrange(date) %>% 
  filter(date %in% date[B=1])

I got:

  ID        B date      
  <chr> <dbl> <date>    
1 A         1 1990-01-01
2 A         2 1990-01-01
3 B         1 1990-03-01
4 B         2 1990-03-01
5 B         3 1990-03-01
6 B         4 1990-03-01

These rows are missing:

  A         1 1990-01-02
  A         3 1990-01-02

I also tried adding any() or all() inside filter(), but that didn’t give me what I wanted.
I’d appreciate any help!

>Solution :

We can group by ‘ID’ and ‘date’, then keep the rows where ‘B’ is not 1, but only in groups that contain at least one ‘B’ value of 1:

library(dplyr)

dataset1 %>% 
   group_by(ID, date) %>%
   # keep non-1 rows, but only in (ID, date) groups that contain a B == 1
   filter(B != 1 & any(B == 1)) %>%
   ungroup()

-output

# A tibble: 5 × 3
  ID        B date      
  <chr> <dbl> <date>    
1 A         2 1990-01-01
2 A         3 1990-01-02
3 B         2 1990-03-01
4 B         3 1990-03-01
5 B         4 1990-03-01
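
As a side note on the original attempt: inside `[`, `B=1` is not a comparison. R appears to treat it as a named argument and index the first element, which would explain why only the earliest date per ID came back. A minimal sketch of the attempt with `B == 1` instead, assuming the same `dataset1` as in the question (the names `with_ones` and `without_ones` are only illustrative):

```r
library(dplyr)

# Same data as in the question
dataset1 <- data.frame(
  ID = c('A','A','A','A','A','A','B','B','B','B'),
  B = c(1,2,1,3,2,2,1,2,3,4),
  date = as.Date(c('1990-01-01','1990-01-01','1990-01-02','1990-01-02',
                   '1990-01-03','1990-01-04','1990-03-01','1990-03-01',
                   '1990-03-01','1990-03-01'))
)

# Within each ID, keep every row whose date is one of the dates where B equals 1.
# This corresponds to the second form of the desired output (B == 1 rows kept).
with_ones <- dataset1 %>%
  group_by(ID) %>%
  filter(date %in% date[B == 1]) %>%
  ungroup()

# Dropping the B == 1 rows afterwards gives the first form of the desired output.
without_ones <- with_ones %>%
  filter(B != 1)
```

Grouping by ‘ID’ only and matching on `date[B == 1]` keeps the B=1 rows, so it fits the case where those rows are wanted; the `group_by(ID, date)` version above gets to the first form in one step.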