Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

calculating counts and percentages of a variable

I am trying to create a count of the number of observations and extract percentages from those counts.

I have data that looks like this:

UserID              substance_use                     
43124                       0 
43124                       1
43124                       0 
43124                       0
43124                       1
215                         1
215                         1

I want a count of the total observations of substance use like this:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

UserID             substance_use                                 count  
43124                       0                                        5
43124                       1                                        5
43124                       0                                        5
43124                       0                                        5
43124                       1                                        5
215                         1                                        2
215                         1                                        2

And then, I want the % of visits that participants said yes and no, like this:

UserID      substance_use      count      percent_yes       percent_no
43124        0                  5            40%               60%
43124        1                  5            40%               60%
43124        0                  5            40%               60%
43124        0                  5            40%               60%         
43124        1                  5            40%              0%  
215          1                  2            100%              0
215          1                  2            100%              0%

I tried using the count function for the first part, but it’s not taking into account the 0’s. Any help would be greatly appreciated.

>Solution :

We may use add_count to create the ‘count’ column and then get the mean of the binary column by group to return the percent_yes and subtract 100 from the percent_yes to return percent_no

library(dplyr)
df1 %>%
  add_count(UserID, name = 'count') %>%
  group_by(UserID) %>%
  mutate(percent_yes = 100 * mean(substance_use), 
       percent_no = 100 - percent_yes) %>% 
  ungroup

-output

# A tibble: 7 × 5
  UserID substance_use count percent_yes percent_no
   <int>         <int> <int>       <dbl>      <dbl>
1  43124             0     5          40         60
2  43124             1     5          40         60
3  43124             0     5          40         60
4  43124             0     5          40         60
5  43124             1     5          40         60
6    215             1     2         100          0
7    215             1     2         100          0

NOTE: Here, we assumed no missing values in ‘substance_use’ column

data

df1 <- structure(list(UserID = c(43124L, 43124L, 43124L, 43124L, 43124L, 
215L, 215L), substance_use = c(0L, 1L, 0L, 0L, 1L, 1L, 1L)), 
class = "data.frame", row.names = c(NA, 
-7L))
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading