Home How to subset a grouped dataframe when a date variable is repeated

Questions

How to subset a grouped dataframe when a date variable is repeated

March 13, 2023

I am working with a dataset which includes about 20000 study participants. Each study participant had multiple study visits, and each visit is a row of the dataset.
In each row there is one column for the date, and then multiple columns for medications on that date (I have just included the first medication column below).
There was an error in the way that some of the data was collected, meaning that certain study participants have multiple rows with the same date coded. What I want to be able to do is filter my dataset, such that I get just the rows where the dates are duplicated grouped within each participant


df <- data.frame(
  studynr = c(1,1,1,1,2,2,2,3,3,3,3,3,3,4,4),
  med_dt = as.Date(c('2018-01-01', '2018-04-02', '2018-04-02', '2018-08-31',
                   '2017-06-25', '2017-08-20', '2017-08-20',
                   '2018-01-01', '2018-04-02', '2018-08-31', '2019-04-05', '2019-04-05', '2019-04-05',
                   '2018-08-14', '2018-09-30')),
  medication1 = c('aspirin', 'aspirin', 'aspirin', 'aspirin',
                 'amlodipine', 'amlodipine', 'amlodipine',
                 'bisoprolol', 'bisoprolol', 'bisoprolol', 'bisoprolol', 'bisoprolol', 'bisoprolol',
                 'aspirin', 'aspirin')
)
df

studynr     med_dt medication1
1        1 2018-01-01     aspirin
2        1 2018-04-02     aspirin
3        1 2018-04-02     aspirin
4        1 2018-08-31     aspirin
5        2 2017-06-25  amlodipine
6        2 2017-08-20  amlodipine
7        2 2017-08-20  amlodipine
8        3 2018-01-01  bisoprolol
9        3 2018-04-02  bisoprolol
10       3 2018-08-31  bisoprolol
11       3 2019-04-05  bisoprolol
12       3 2019-04-05  bisoprolol
13       3 2019-04-05  bisoprolol
14       4 2018-08-14     aspirin
15       4 2018-09-30     aspirin

what I want is a new dataframe (df2):

studynr     med_dt medication1
1        1 2018-04-02     aspirin
2        1 2018-04-02     aspirin
3        2 2017-08-20  amlodipine
4        2 2017-08-20  amlodipine
5        3 2019-04-05  bisoprolol
6        3 2019-04-05  bisoprolol
7        3 2019-04-05  bisoprolol

I have tried various iteration of:

df2 <- df %>% group_by(studynr) %>% filter(duplicated(med_dt))

But this gets me:

studynr med_dt     medication1
\<dbl\> \<date\>     \<chr\>  
1       1 2018-04-02 aspirin  
2       2 2017-08-20 amlodipine
3       3 2019-04-05 bisoprolol
4       3 2019-04-05 bisoprolol

>Solution :

Group by participant and date and keep groups with more than 1 row:

df %>%
 group_by(studynr, med_dt) %>%
 filter(n() > 1)

duplicates

byMR

Published March 13, 2023

Add a comment

Trigger An Event Listener

byMR

March 13, 2023

Questions

Data Cleaning Badly Designed JSON Data — Need advice on best strategy

byMR

March 13, 2023

Questions

Ignite messaging: Is there a new style for async messaging?

byMR

March 13, 2023

Questions

CMake not finding header files

byMR

March 13, 2023

Questions

How to set a state variable to an empty array in React Typescript

byMR

March 13, 2023

Questions

Error looping the ANOVA test in R in dataframe

byMR

March 13, 2023

How to subset a grouped dataframe when a date variable is repeated

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Like this:

Leave a ReplyCancel reply

Read more

Trigger An Event Listener

Data Cleaning Badly Designed JSON Data — Need advice on best strategy

Ignite messaging: Is there a new style for async messaging?

CMake not finding header files

How to set a state variable to an empty array in React Typescript

Error looping the ANOVA test in R in dataframe

Keep Up to Date with the Most Important News

How to subset a grouped dataframe when a date variable is repeated

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Share this:

Like this:

Leave a ReplyCancel reply

Keep Up to Date with the Most Important News

Read more

Trigger An Event Listener

Data Cleaning Badly Designed JSON Data — Need advice on best strategy

Ignite messaging: Is there a new style for async messaging?

CMake not finding header files

How to set a state variable to an empty array in React Typescript

Error looping the ANOVA test in R in dataframe

Discover more from Dev solutions