The code I’ve got so far works fine, but I want the output to include certain Majors with a count of zero. From what I’ve read, the solution seems to be to pass .drop = FALSE to count(), but I can’t get it to work.
#DATA
library(dplyr)  # select(), filter(), count(), %>%
library(tidyr)  # pivot_longer()
Sum22_Graduation_MajorsALL <- data.frame(
  Sum_2022_Graduation.Major_1 = c('CRJS', 'CRJS', 'ENGL', 'ENGL', 'JOE', 'DAN', 'HIST', 'PPE'),
  Sum_2022_Graduation.Major_2 = c('JOE', 'DAN', 'ENGL', 'HIST', 'PPE', 'CRJS', 'CRJS', 'PPE'))
#CODE SO FAR
Sum22_selectCOHSSgrad <- Sum22_Graduation_MajorsALL %>%
  select(Sum_2022_Graduation.Major_1, Sum_2022_Graduation.Major_2) %>%
  pivot_longer(cols = everything(), names_to = NULL, values_to = 'Majors') %>%
  filter(Majors=='CRJS' | Majors=='ENGL' | Majors=='HIST' | Majors=='POLS' | Majors=='PPE') %>%
  count(Majors, name = "Count")
But because POLS does not occur in Sum22_Graduation_MajorsALL, the output doesn’t include POLS at all, whereas I would like it to include a POLS row with a Count of 0. The documentation for dplyr seems to say that count(Majors, name = "Count", .drop = FALSE) should accomplish this. I’m obviously using it incorrectly, but can someone kindly point out where my error is?
Thanks and happy thanksgiving!
>Solution :
The docs for dplyr::count() note that its .drop argument is passed on to dplyr::group_by(), which describes the parameter as follows:
.drop: Drop groups formed by factor levels that don’t appear in the data? The default is TRUE
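So .drop only affects factor levels. In your case Majors is a character column, which has no unused levels for .drop = FALSE to keep. Make Majors a factor, specifying the levels explicitly. Then set .drop = FALSE in your count() call.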
subjects_of_interest <- c("CRJS", "ENGL", "HIST", "POLS", "PPE")
Sum22_Graduation_MajorsALL |>
  select(Sum_2022_Graduation.Major_1, Sum_2022_Graduation.Major_2) |>
  pivot_longer(cols = everything(), names_to = NULL, values_to = "Majors") |>
  filter(Majors %in% subjects_of_interest) |>
  mutate(Majors = factor(Majors, levels = subjects_of_interest)) |>
  count(Majors, name = "Count", .drop = FALSE)
# # A tibble: 5 × 2
#   Majors Count
#   <fct>  <int>
# 1 CRJS       4
# 2 ENGL       3
# 3 HIST       2
# 4 POLS       0
# 5 PPE        3
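If you would rather keep Majors as a character column, a different technique (not the .drop approach above) is to count first and then fill in the missing rows with tidyr::complete(). The following is only a minimal sketch: the small majors tibble is made-up stand-in data for the pivoted and filtered column, and result is a hypothetical name.

```r
library(dplyr)
library(tidyr)

subjects_of_interest <- c("CRJS", "ENGL", "HIST", "POLS", "PPE")

# Stand-in for the pivoted, filtered data (hypothetical values)
majors <- tibble(Majors = c("CRJS", "ENGL", "CRJS", "PPE"))

result <- majors |>
  count(Majors, name = "Count") |>
  # Add a row for every subject of interest; absent ones get Count = 0
  complete(Majors = subjects_of_interest, fill = list(Count = 0L))

result
```

Here Majors stays a character column, and any subject missing from the data (HIST and POLS in this stand-in) appears with a Count of 0. The factor approach is arguably tidier if you also want the levels in a fixed order for plotting.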