I’m trying to count the number of students of each gender within each class, and also the number of students of each gender overall. The desired output is a single object that contains both the overall and the by-class gender breakdowns.
I have working code (below) that does this, but I'm wondering whether there is a more streamlined way to do it without creating an intermediate object and joining the two together.
```r
library(dplyr)

# Sample dataset
test_data <- tibble(id = c(1, 1, 2, 2, 2, 3, 3, 3),
                    class = c("h", "h", "m", "h", "s", "m", "h", "h"),
                    gender = c("m", "m", "f", "f", "f", "m", "m", "m"))

# My current code: produces the desired output, but is there a more efficient method?
gender_by_class <- test_data %>%
  distinct(id, class, gender) %>%
  group_by(class) %>%
  count(gender) %>%
  ungroup()

gender_overall <- test_data %>%
  distinct(id, gender) %>%
  count(gender) %>%
  mutate(class = "overall") %>%
  # with no `by`, this joins on all shared columns (gender, n, class)
  full_join(gender_by_class)
```
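For comparison, here is a minimal sketch showing that the `full_join()` above is effectively just stacking the two tables. When `by` is omitted, dplyr joins on every shared column (`gender`, `n` and `class`), and since a row with `class = "overall"` can never match a real class, no rows are merged. The names `joined` and `stacked` are only for illustration.

```r
library(dplyr)

# Sample data from the question
test_data <- tibble(id = c(1, 1, 2, 2, 2, 3, 3, 3),
                    class = c("h", "h", "m", "h", "s", "m", "h", "h"),
                    gender = c("m", "m", "f", "f", "f", "m", "m", "m"))

gender_by_class <- test_data %>%
  distinct(id, class, gender) %>%
  group_by(class) %>%
  count(gender) %>%
  ungroup()

gender_overall <- test_data %>%
  distinct(id, gender) %>%
  count(gender) %>%
  mutate(class = "overall")

# Join on every shared column, spelled out explicitly: "overall" never
# matches a real class, so every row from both tables is kept unchanged
joined <- full_join(gender_overall, gender_by_class,
                    by = c("gender", "n", "class"))

# Stacking the two tables yields the same rows without any join
stacked <- bind_rows(gender_overall, gender_by_class)
```

Because nothing is actually matched, `bind_rows()` expresses the intent more directly than a join.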
Solution:
You could use `bind_rows()` to do it all in a single pipe, like this:
```r
library(dplyr)

test_data %>%
  distinct(id, class, gender) %>%
  group_by(class) %>%
  count(gender) %>%
  ungroup() %>%
  bind_rows(., test_data %>%
              distinct(id, gender) %>%
              count(gender) %>%
              mutate(class = "overall"))
#> # A tibble: 7 × 3
#>   class   gender     n
#>   <chr>   <chr>  <int>
#> 1 h       f          1
#> 2 h       m          2
#> 3 m       f          1
#> 4 m       m          1
#> 5 s       f          1
#> 6 overall f          1
#> 7 overall m          2
```
Created on 2023-01-29 with reprex v2.0.2
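The overall counts need their own `distinct(id, gender)` step rather than being derived from the by-class table, because a student enrolled in several classes appears once per class. The sketch below, using the question's sample data, makes that concrete: summing the per-class counts gives 3 for each gender, while counting distinct students gives 1 for `f` and 2 for `m`. The names `by_class`, `summed` and `overall` are illustrative only.

```r
library(dplyr)

# Sample data from the question
test_data <- tibble(id = c(1, 1, 2, 2, 2, 3, 3, 3),
                    class = c("h", "h", "m", "h", "s", "m", "h", "h"),
                    gender = c("m", "m", "f", "f", "f", "m", "m", "m"))

by_class <- test_data %>%
  distinct(id, class, gender) %>%
  count(class, gender)

# Summing per-class counts double-counts students who take several classes
summed <- by_class %>%
  group_by(gender) %>%
  summarise(n = sum(n))
# summed: f = 3, m = 3

# Counting each student once gives the true overall totals
overall <- test_data %>%
  distinct(id, gender) %>%
  count(gender)
# overall: f = 1, m = 2
```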
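Thanks to @stefan, here is an even better option. `count()` accepts several grouping columns and can create a new one with `name = value`, so the `group_by()`/`ungroup()` and `mutate()` steps can be dropped: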
```r
library(dplyr)

test_data %>%
  distinct(id, class, gender) %>%
  count(class, gender) %>%
  bind_rows(., test_data %>%
              distinct(id, gender) %>%
              count(class = "overall", gender))
#> # A tibble: 7 × 3
#>   class   gender     n
#>   <chr>   <chr>  <int>
#> 1 h       f          1
#> 2 h       m          2
#> 3 m       f          1
#> 4 m       m          1
#> 5 s       f          1
#> 6 overall f          1
#> 7 overall m          2
```
Created on 2023-01-29 with reprex v2.0.2
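If you prefer R's native pipe `|>` (R 4.1 or later), note that the `.` placeholder in `bind_rows(., ...)` belongs to magrittr's `%>%` and does not work with `|>`. One option, sketched below, is to pass both pieces to `bind_rows()` directly inside a small function. The name `count_gender_with_overall()` is hypothetical; the logic is the same as @stefan's version above.

```r
library(dplyr)

# Sample data from the question
test_data <- tibble(id = c(1, 1, 2, 2, 2, 3, 3, 3),
                    class = c("h", "h", "m", "h", "s", "m", "h", "h"),
                    gender = c("m", "m", "f", "f", "f", "m", "m", "m"))

# Hypothetical helper: by-class counts followed by an "overall" block
count_gender_with_overall <- function(data) {
  bind_rows(
    # One row per student per class, then counts per class and gender
    data |> distinct(id, class, gender) |> count(class, gender),
    # One row per student, so each student is counted once overall
    data |> distinct(id, gender) |> count(class = "overall", gender)
  )
}

result <- count_gender_with_overall(test_data)
```

`result` should contain the same seven rows as the reprex output above, with the two `overall` rows last.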