How to subset a grouped dataframe when a date variable is repeated

Advertisements

I am working with a dataset which includes about 20000 study participants. Each study participant had multiple study visits, and each visit is a row of the dataset.
In each row there is one column for the date, and then multiple columns for medications on that date (I have just included the first medication column below).
There was an error in the way that some of the data was collected, meaning that certain study participants have multiple rows with the same date coded. What I want to be able to do is filter my dataset, such that I get just the rows where the dates are duplicated grouped within each participant


df <- data.frame(
  studynr = c(1,1,1,1,2,2,2,3,3,3,3,3,3,4,4),
  med_dt = as.Date(c('2018-01-01', '2018-04-02', '2018-04-02', '2018-08-31',
                   '2017-06-25', '2017-08-20', '2017-08-20',
                   '2018-01-01', '2018-04-02', '2018-08-31', '2019-04-05', '2019-04-05', '2019-04-05',
                   '2018-08-14', '2018-09-30')),
  medication1 = c('aspirin', 'aspirin', 'aspirin', 'aspirin',
                 'amlodipine', 'amlodipine', 'amlodipine',
                 'bisoprolol', 'bisoprolol', 'bisoprolol', 'bisoprolol', 'bisoprolol', 'bisoprolol',
                 'aspirin', 'aspirin')
)
df
studynr     med_dt medication1
1        1 2018-01-01     aspirin
2        1 2018-04-02     aspirin
3        1 2018-04-02     aspirin
4        1 2018-08-31     aspirin
5        2 2017-06-25  amlodipine
6        2 2017-08-20  amlodipine
7        2 2017-08-20  amlodipine
8        3 2018-01-01  bisoprolol
9        3 2018-04-02  bisoprolol
10       3 2018-08-31  bisoprolol
11       3 2019-04-05  bisoprolol
12       3 2019-04-05  bisoprolol
13       3 2019-04-05  bisoprolol
14       4 2018-08-14     aspirin
15       4 2018-09-30     aspirin

what I want is a new dataframe (df2):

studynr     med_dt medication1
1        1 2018-04-02     aspirin
2        1 2018-04-02     aspirin
3        2 2017-08-20  amlodipine
4        2 2017-08-20  amlodipine
5        3 2019-04-05  bisoprolol
6        3 2019-04-05  bisoprolol
7        3 2019-04-05  bisoprolol

I have tried various iteration of:

df2 <- df %>% group_by(studynr) %>% filter(duplicated(med_dt))

But this gets me:

studynr med_dt     medication1
\<dbl\> \<date\>     \<chr\>  
1       1 2018-04-02 aspirin  
2       2 2017-08-20 amlodipine
3       3 2019-04-05 bisoprolol
4       3 2019-04-05 bisoprolol

>Solution :

Group by participant and date and keep groups with more than 1 row:

df %>%
 group_by(studynr, med_dt) %>%
 filter(n() > 1)

Leave a ReplyCancel reply