I am trying to create a function that will add columns that order multiple columns and allocate data into four groups with equal numbers of observations (these will constitute four risk levels). I understand that where there are multiple zeroes this may not be possible, but would like this to work otherwise.
I need to be able to do this with mutate_at as I have 200 different score columns
Sample dataset below. I have used the cut function, the cut function produces groups of unequal sizes even where there are not multiple 0 values
library(tidyverse)
dat <- data.frame(Suburb = c("Werribee", "Hoppers", "Carlton", "North", "Serrano", "Upwell","Market", "Poynton", "Stawell"),
scores_1 = c(0,0.0001, 230, 340, 340, 10, 3490, 5,21),
scores_2 = c(12, 0,0,0,0,0,0, 45,13),
scores_3 = c(4,5,6,7,8,9,10, 45, 45))
allscores <- c("scores_1", "scores_2", "scores_3")
myresult <- dat %>%
mutate_at(.vars = allscores,
.funs = list(risk_levels = ~ as.integer(cut(as.numeric(.), breaks = 4))))
I have also tried to use cut_numberfunction but have gotten the following error:
risk.data2 <- dat %>%
mutate_at(.vars = allscores,
.funs = list(risk_levels = ~ as.integer(cut_number(as.numeric(.), n = 4))))
Error in `mutate_cols()`:
! Problem with `mutate()` column `scores_2_risk_levels`.
i `scores_2_risk_levels = (structure(function (..., .x = ..1, .y = ..2, . = ..1) ...`.
x Insufficient data values to produce 4 bins.
Caused by error in `cut_number()`:
! Insufficient data values to produce 4 bins.
How can I use tidyverse to order and then group my many score values into four roughly equal size groups?
>Solution :
You can convert each column into ranks before cut().
dat %>%
mutate(across(starts_with("scores"),
~ cut(dense_rank(.x), 4, labels = FALSE),
.names = "{sub('scores', 'risk_levels', .col)}"))
# Suburb scores_1 scores_2 scores_3 risk_levels_1 risk_levels_2 risk_levels_3
# 1 Werribee 0.00e+00 12 4 1 2 1
# 2 Hoppers 1.00e-04 0 5 1 1 1
# 3 Carlton 2.30e+02 0 6 3 1 2
# 4 North 3.40e+02 0 7 4 1 2
# 5 Serrano 3.40e+02 0 8 4 1 3
# 6 Upwell 1.00e+01 0 9 2 1 3
# 7 Market 3.49e+03 0 10 4 1 4
# 8 Poynton 5.00e+00 45 45 2 4 4
# 9 Stawell 2.10e+01 13 45 3 3 4