Home how to create a count variable that counts based on the value of a binary variable in R using dplyr

Questions

how to create a count variable that counts based on the value of a binary variable in R using dplyr

September 23, 2022

code to create df: I have the repeat data that looks like the table below.

df <- structure(list(patid = c("1", "1", "1", "1", "2", "2", "3", "3", 
                         "3", "4", "4", "4", "4"), observation_date = c("07/07/2016", 
                                                                        "07/08/2016", "07/11/2016", "07/07/2019", "07/05/2015", "02/12/2016", 
                                                                        "07/05/2015", "07/06/2015", "16/06/2015", "07/05/2015", "02/12/2016", 
                                                                        "18/12/2016", "15/01/2017"),
                      registration = c("0","0","1","1","0","1","0","0","0","0","1","1","1")), class = "data.frame", row.names = c(NA, 
                                                                                                                                          -13L))

patid	observation_date	registration
1	07/07/2016	0
1	07/08/2016	0
1	07/11/2016	1
1	07/07/2019	1
2	07/05/2015	0
2	02/12/2016	1
3	07/05/2015	0
3	07/06/2015	0
3	16/06/2015	0
4	07/05/2015	0
4	02/12/2016	1
4	18/12/2016	1
4	15/01/2017	1

I would like to summarize data in this way: I want two rows only for each patid, one row will have a count of the observation_date when value of registration=0 and one row will have a count of the observation_date when value of registration=1. This is the code I tried but it is only giving a count of all observation_date per patid

cons_sum <- df%>%
         group_by(patid) %>%
         arrange(observation_date)%>%
         tally()

The final table should look like this:

patid	registration	count
1	0	2
1	1	2
2	0	1
2	1	1
3	0	4
3	1	0
4	0	1
4	1	3

>Solution :

Use count. To make every possible value appear in the final table, you should convert your registration column to a factor:

df %>% 
  count(patid, registration = factor(registration), .drop = FALSE)

output

  patid registration n
1     1            0 2
2     1            1 2
3     2            0 1
4     2            1 1
5     3            0 3
6     3            1 0
7     4            0 1
8     4            1 3