Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to subset a grouped dataframe when a date variable is repeated

I am working with a dataset which includes about 20000 study participants. Each study participant had multiple study visits, and each visit is a row of the dataset.
In each row there is one column for the date, and then multiple columns for medications on that date (I have just included the first medication column below).
There was an error in the way that some of the data was collected, meaning that certain study participants have multiple rows with the same date coded. What I want to be able to do is filter my dataset, such that I get just the rows where the dates are duplicated grouped within each participant


df <- data.frame(
  studynr = c(1,1,1,1,2,2,2,3,3,3,3,3,3,4,4),
  med_dt = as.Date(c('2018-01-01', '2018-04-02', '2018-04-02', '2018-08-31',
                   '2017-06-25', '2017-08-20', '2017-08-20',
                   '2018-01-01', '2018-04-02', '2018-08-31', '2019-04-05', '2019-04-05', '2019-04-05',
                   '2018-08-14', '2018-09-30')),
  medication1 = c('aspirin', 'aspirin', 'aspirin', 'aspirin',
                 'amlodipine', 'amlodipine', 'amlodipine',
                 'bisoprolol', 'bisoprolol', 'bisoprolol', 'bisoprolol', 'bisoprolol', 'bisoprolol',
                 'aspirin', 'aspirin')
)
df
studynr     med_dt medication1
1        1 2018-01-01     aspirin
2        1 2018-04-02     aspirin
3        1 2018-04-02     aspirin
4        1 2018-08-31     aspirin
5        2 2017-06-25  amlodipine
6        2 2017-08-20  amlodipine
7        2 2017-08-20  amlodipine
8        3 2018-01-01  bisoprolol
9        3 2018-04-02  bisoprolol
10       3 2018-08-31  bisoprolol
11       3 2019-04-05  bisoprolol
12       3 2019-04-05  bisoprolol
13       3 2019-04-05  bisoprolol
14       4 2018-08-14     aspirin
15       4 2018-09-30     aspirin

what I want is a new dataframe (df2):

studynr     med_dt medication1
1        1 2018-04-02     aspirin
2        1 2018-04-02     aspirin
3        2 2017-08-20  amlodipine
4        2 2017-08-20  amlodipine
5        3 2019-04-05  bisoprolol
6        3 2019-04-05  bisoprolol
7        3 2019-04-05  bisoprolol

I have tried various iteration of:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df2 <- df %>% group_by(studynr) %>% filter(duplicated(med_dt))

But this gets me:

studynr med_dt     medication1
\<dbl\> \<date\>     \<chr\>  
1       1 2018-04-02 aspirin  
2       2 2017-08-20 amlodipine
3       3 2019-04-05 bisoprolol
4       3 2019-04-05 bisoprolol

>Solution :

Group by participant and date and keep groups with more than 1 row:

df %>%
 group_by(studynr, med_dt) %>%
 filter(n() > 1)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading