code to create df: I have the repeat data that looks like the table below.
df <- structure(list(patid = c("1", "1", "1", "1", "2", "2", "3", "3",
"3", "4", "4", "4", "4"), observation_date = c("07/07/2016",
"07/08/2016", "07/11/2016", "07/07/2019", "07/05/2015", "02/12/2016",
"07/05/2015", "07/06/2015", "16/06/2015", "07/05/2015", "02/12/2016",
"18/12/2016", "15/01/2017"),
registration = c("0","0","1","1","0","1","0","0","0","0","1","1","1")), class = "data.frame", row.names = c(NA,
-13L))
| patid | observation_date | registration |
|---|---|---|
| 1 | 07/07/2016 | 0 |
| 1 | 07/08/2016 | 0 |
| 1 | 07/11/2016 | 1 |
| 1 | 07/07/2019 | 1 |
| 2 | 07/05/2015 | 0 |
| 2 | 02/12/2016 | 1 |
| 3 | 07/05/2015 | 0 |
| 3 | 07/06/2015 | 0 |
| 3 | 16/06/2015 | 0 |
| 4 | 07/05/2015 | 0 |
| 4 | 02/12/2016 | 1 |
| 4 | 18/12/2016 | 1 |
| 4 | 15/01/2017 | 1 |
I would like to summarize data in this way: I want two rows only for each patid, one row will have a count of the observation_date when value of registration=0 and one row will have a count of the observation_date when value of registration=1. This is the code I tried but it is only giving a count of all observation_date per patid
cons_sum <- df%>%
group_by(patid) %>%
arrange(observation_date)%>%
tally()
The final table should look like this:
| patid | registration | count |
|---|---|---|
| 1 | 0 | 2 |
| 1 | 1 | 2 |
| 2 | 0 | 1 |
| 2 | 1 | 1 |
| 3 | 0 | 4 |
| 3 | 1 | 0 |
| 4 | 0 | 1 |
| 4 | 1 | 3 |
>Solution :
Use count. To make every possible value appear in the final table, you should convert your registration column to a factor:
df %>%
count(patid, registration = factor(registration), .drop = FALSE)
output
patid registration n
1 1 0 2
2 1 1 2
3 2 0 1
4 2 1 1
5 3 0 3
6 3 1 0
7 4 0 1
8 4 1 3