In this MRE, I want to create a third column called age_group, where every remaining value that does not meet any of the conditions keeps the actual age. Even if I make age a string (‘age’), the age_group variable does not fill these values with "age". What’s going on here?
new_df = df %>%
  mutate(age_group = ifelse(
    age %in% c(0:35), '0 to 35 years', ifelse(
      age_options == '-66', 'Prefer not to answer', ifelse(
        age_options == '-99', 'Missing', age))
    )
  )
Data:
df = structure(list(age = c(NA, NA, 38, 33, 35, 44, 33, 26, 51, 42
), age_options = c(-99, -99, NA, NA, NA, NA, NA, NA, NA, NA)), row.names = c(NA,
-10L), class = c("tbl_df", "tbl", "data.frame"))
>Solution :
When age_options is NA, the test age_options == '-66' evaluates to NA rather than FALSE, and ifelse returns NA for that element instead of moving on to the following test(s). So you need to deal with the NAs in the age_options variable first, e.g.
library(dplyr)

df %>%
  mutate(age_group = ifelse(
    age %in% c(0:35), '0 to 35 years', ifelse(
      # handle missing age_options before the == tests, which would return NA
      is.na(age_options), age, ifelse(
        age_options == '-66', 'Prefer not to answer', ifelse(
          age_options == '-99', 'Missing', age)))))
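This would be clearer using case_when, which is easier to read and doesn’t have the same issue with propagating NAs: case_when gives the specified output only when the test is TRUE, and treats an NA test as not matching, so evaluation falls through to the next condition (ifelse, by contrast, returns NA). One adjustment, though, is that with if_else or case_when, all outputs must have the same type, so we need to explicitly coerce age to character. Note that the .default argument requires dplyr 1.1.0 or later; with older versions, use TRUE ~ as.character(age) as the last case.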
df %>%
  mutate(age_group = case_when(
    age %in% c(0:35) ~ '0 to 35 years',
    age_options == '-66' ~ 'Prefer not to answer',
    age_options == '-99' ~ 'Missing',
    .default = as.character(age)))
     age age_options age_group
   <dbl>       <dbl> <chr>
 1    NA         -99 Missing
 2    NA         -99 Missing
 3    38          NA 38
 4    33          NA 0 to 35 years
 5    35          NA 0 to 35 years
 6    44          NA 44
 7    33          NA 0 to 35 years
 8    26          NA 0 to 35 years
 9    51          NA 51
10    42          NA 42
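If you want to check the NA behaviour in isolation, here is a minimal base R sketch (no dplyr needed). The vectors age and age_options simply copy the two columns from the question's data; the names na_equal, na_ifelse and na_in are made up for the illustration. It also shows one possible alternative that was not part of the answer above: since %in% returns FALSE rather than NA for a missing value, swapping == for %in% lets the original nested ifelse fall through as intended. That is also why the first test, age %in% c(0:35), never caused trouble for the rows where age is NA.

```r
# Base R only; these vectors mirror the columns of `df` from the question.
age         <- c(NA, NA, 38, 33, 35, 44, 33, 26, 51, 42)
age_options <- c(-99, -99, NA, NA, NA, NA, NA, NA, NA, NA)

# `==` propagates NA, so ifelse() returns NA and never uses the `no` branch.
na_equal  <- NA == -66                                            # NA
na_ifelse <- ifelse(NA == -66, "Prefer not to answer", "fallback") # NA

# `%in%` never returns NA: a missing value simply does not match.
na_in <- NA %in% -66                                              # FALSE

# Same nesting as in the question, with `%in%` in place of `==`
# and the fallback coerced to character explicitly.
age_group <- ifelse(
  age %in% 0:35, "0 to 35 years",
  ifelse(
    age_options %in% -66, "Prefer not to answer",
    ifelse(age_options %in% -99, "Missing", as.character(age))
  )
)
print(age_group)
```

This gives the same age_group values as the case_when version. I would still lean towards case_when for readability, but the %in% trick can be handy when you are stuck with base R.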