Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

how to create a count variable that counts based on the value of a binary variable in R using dplyr

code to create df: I have the repeat data that looks like the table below.

df <- structure(list(patid = c("1", "1", "1", "1", "2", "2", "3", "3", 
                         "3", "4", "4", "4", "4"), observation_date = c("07/07/2016", 
                                                                        "07/08/2016", "07/11/2016", "07/07/2019", "07/05/2015", "02/12/2016", 
                                                                        "07/05/2015", "07/06/2015", "16/06/2015", "07/05/2015", "02/12/2016", 
                                                                        "18/12/2016", "15/01/2017"),
                      registration = c("0","0","1","1","0","1","0","0","0","0","1","1","1")), class = "data.frame", row.names = c(NA, 
                                                                                                                                          -13L))
patid observation_date registration
1 07/07/2016 0
1 07/08/2016 0
1 07/11/2016 1
1 07/07/2019 1
2 07/05/2015 0
2 02/12/2016 1
3 07/05/2015 0
3 07/06/2015 0
3 16/06/2015 0
4 07/05/2015 0
4 02/12/2016 1
4 18/12/2016 1
4 15/01/2017 1

I would like to summarize data in this way: I want two rows only for each patid, one row will have a count of the observation_date when value of registration=0 and one row will have a count of the observation_date when value of registration=1. This is the code I tried but it is only giving a count of all observation_date per patid

cons_sum <- df%>%
         group_by(patid) %>%
         arrange(observation_date)%>%
         tally()

The final table should look like this:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

patid registration count
1 0 2
1 1 2
2 0 1
2 1 1
3 0 4
3 1 0
4 0 1
4 1 3

>Solution :

Use count. To make every possible value appear in the final table, you should convert your registration column to a factor:

df %>% 
  count(patid, registration = factor(registration), .drop = FALSE)

output

  patid registration n
1     1            0 2
2     1            1 2
3     2            0 1
4     2            1 1
5     3            0 3
6     3            1 0
7     4            0 1
8     4            1 3
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading