I am trying to create a new column (age_clean) based on data in two columns: age (numeric) and age_unit (days, weeks, months, years).
age_clean should be 0 if any of these holds: age is at least 0 and less than 1, OR age_unit is weeks or days, OR age is between 0 and 11 with age_unit months.
Here is my code:
database %>%
  mutate(age_clean =
    case_when(!age_unit %in% c("years", "days", "months", "weeks") ~ "Other",
              (age >= 0 & age < 1) | (age_unit == "weeks" | age_unit == "days") | (age >= 0 & age <= 11 & age_unit == "months") ~ '0',
              TRUE ~ as.numeric(age)))
Error in `mutate()`:
! Problem while computing `age_clean = case_when(...)`.
Caused by error in `` names(message) <- `*vtmp*` ``:
! 'names' attribute [1] must be the same length as the vector [0]
Run `rlang::last_error()` to see where the error occurred.
Note that for some of the rows, there is no data in age or age_unit. Maybe I am missing a clause for NA?
>Solution :
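The branches of your case_when() return mixed types: "Other" and '0' are character, while as.numeric(age) is numeric. case_when() needs every right-hand side to have a compatible type, which is the likely cause of the error. It also looks like you are overcomplicating your cases, though I am not 100% sure I understand you correctly. I assume you want age_clean to be the age in whole years? If so, you can do it this way: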
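library(dplyr)
database %>% mutate(
  # Every branch returns a number, so the types are consistent.
  # %/% is integer division, so partial years are dropped.
  # Rows that match no branch (e.g. missing age_unit) become NA.
  age_clean = case_when(
    age_unit == "years" ~ age,
    age_unit == "months" ~ age %/% 12,
    age_unit == "weeks" ~ age %/% 52,
    age_unit == "days" ~ age %/% 365
  )
)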
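Since you mention rows with no data in age or age_unit, here is a minimal sketch of the same idea with the missing-value and unknown-unit cases spelled out. The `ages` data frame and the "hours" unit are made up for illustration, and flooring fractional years is my assumption about what you want:

```r
library(dplyr)

# Hypothetical sample data with missing values and an unrecognised unit
ages <- data.frame(
  age      = c(0.5, 3, 10, 14, 40, NA, 7, 2),
  age_unit = c("years", "weeks", "months", "months", "years", "years", NA, "hours")
)

ages_clean <- ages %>%
  mutate(
    age_clean = case_when(
      # Missing inputs: return a numeric NA so all branches share one type
      is.na(age) | is.na(age_unit) ~ NA_real_,
      # Assumption: fractional years are truncated to whole years
      age_unit == "years"  ~ floor(age),
      age_unit == "months" ~ age %/% 12,
      age_unit == "weeks"  ~ age %/% 52,
      age_unit == "days"   ~ age %/% 365,
      # Any unit not listed above also becomes a numeric NA
      TRUE ~ NA_real_
    )
  )

print(ages_clean)
```

Every branch returns a double here, so there is no character/numeric mix. If you really need an "Other" label, it would have to live in a separate character column.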
Or you can skip the cases entirely and convert through a lubridate duration:
database %>%
  mutate(age_clean = floor(as.numeric(lubridate::duration(age, units = age_unit)) / (60 * 60 * 24 * 365)))
Results:
# age age_unit age_clean
# 1 2 years 2
# 2 18 years 18
# 3 100 days 0
# 4 380 days 1
# 5 10 months 0
# 6 26 months 2
# 7 25 weeks 0
# 8 54 weeks 1
Data:
database <- data.frame(
age = c(2, 18, 100, 380, 10, 26, 25, 54),
age_unit = c("years", "years", "days", "days", "months", "months", "weeks", "weeks")
)
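If you would rather avoid extra packages, the same conversion can be sketched in base R with a named lookup vector. The divisors are the ones from the case_when() version above (12 months, 52 weeks, 365 days per year); the `units_per_year` and `ages` names and the sample rows are mine, for illustration only:

```r
# How many of each unit make up one year (same divisors as the case_when version)
units_per_year <- c(years = 1, months = 12, weeks = 52, days = 365)

# Hypothetical sample data, including rows with missing values
ages <- data.frame(
  age      = c(2, 380, 26, 54, 5, NA),
  age_unit = c("years", "days", "months", "weeks", NA, "years")
)

# Look up the divisor by unit name; as.character() guards against factor columns.
# A missing or unrecognised unit yields an NA divisor, hence an NA result.
divisor <- unname(units_per_year[as.character(ages$age_unit)])

# Integer division drops the partial year
ages$age_clean <- ages$age %/% divisor

print(ages)
```

Adding a new unit then only means adding one entry to the lookup vector rather than another case.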