Calculate count (n) and percentages with dplyr in R

Sorry for the basic question, but I couldn't figure out one thing in my code.

My data frame is like this:

ID <- c("a","b","c","d","e")
age <- c(22,34,55,55,45)
gender <- c("female","male","female","female", "male")
df <- data.frame(ID, age, gender)
df
  ID age gender
1  a  22 female
2  b  34   male
3  c  55 female
4  d  55 female
5  e  45   male

I simply want to count gender both as frequencies and as percentages.

When I write my code as below, every frequency comes out as 100%. I guess the percentage is not being calculated over the whole gender distribution but within each gender, and that is why it gives 100%:

library(dplyr)

df %>% group_by(gender) %>%
  summarise(n = n(), freq = paste0(round(100 * n / sum(n), 0), "%"))
  gender     n freq 
  <chr>  <int> <chr>
1 female     3 100% 
2 male       2 100% 
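One way to check that guess is to look at what sum(n) actually sees inside the grouped summarise(). The sketch below rebuilds the example df and adds a hypothetical diagnostic column, total_seen (the name is only for illustration). Because summarise() lets later arguments refer to columns created earlier in the same call, and evaluates them one group at a time, sum(n) only sums the current group's single count.

```r
library(dplyr)

# Rebuild the example data from the question
df <- data.frame(
  ID = c("a", "b", "c", "d", "e"),
  age = c(22, 34, 55, 55, 45),
  gender = c("female", "male", "female", "female", "male")
)

# total_seen is a hypothetical diagnostic column: inside the grouped
# summarise(), n is the count for the current group only, so sum(n)
# equals n in every row instead of the overall total of 5
diag <- df %>%
  group_by(gender) %>%
  summarise(n = n(), total_seen = sum(n))

print(diag)
```

If total_seen matches n on every row, the denominator is the per-group count, which explains the 100% values.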

What am I doing wrong?

Thank you so much!

Solution:

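Try breaking it into separate steps. Inside summarise(), n refers to the count you just created for the current group, so sum(n) is that same count and every row comes out as 100%. If you count in summarise() first and compute the percentage in a following mutate(), it works: with a single grouping variable, summarise() drops the grouping, so sum(n) in mutate() is taken over all genders: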

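library(dplyr)

df %>% group_by(gender) %>%
  summarise(n = n()) %>%   # one row per gender; the grouping is dropped afterwards
  mutate(freq = paste0(round(n / sum(n) * 100, 0), "%"))   # sum(n) is now the overall total (5)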

Output:

#   gender     n freq 
#   <chr>  <int> <chr>
# 1 female     3 60%  
# 2 male       2 40%  
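The same result can also be sketched with dplyr's count(), which counts rows per value and returns an ungrouped result with a column named n, so the following mutate() divides by the overall total. This sketch also keeps the percentage as a number (pct, a column name chosen here for illustration) alongside the formatted string, which can be handy if you later want to sort or plot by it.

```r
library(dplyr)

df <- data.frame(
  ID = c("a", "b", "c", "d", "e"),
  age = c(22, 34, 55, 55, 45),
  gender = c("female", "male", "female", "female", "male")
)

# count() returns one row per gender with a column n; the result is not
# grouped, so sum(n) in mutate() is the total across all genders (5)
res <- df %>%
  count(gender) %>%
  mutate(
    pct = 100 * n / sum(n),             # numeric percentage
    freq = paste0(round(pct, 0), "%")   # formatted label, as in the answer
  )

print(res)
```

Keeping pct numeric avoids having to parse the "%" strings back into numbers later.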
