Function to cut multiple columns into roughly equal ordered groups in R

August 1, 2023

I am trying to create a function that will add columns that order multiple columns and allocate data into four groups with equal numbers of observations (these will constitute four risk levels). I understand that where there are multiple zeroes this may not be possible, but would like this to work otherwise.

I need to be able to do this with mutate_at as I have 200 different score columns

Sample dataset below. I have used the cut function, the cut function produces groups of unequal sizes even where there are not multiple 0 values

library(tidyverse)
dat <- data.frame(Suburb = c("Werribee", "Hoppers", "Carlton", "North", "Serrano", "Upwell","Market", "Poynton", "Stawell"),
                    scores_1 = c(0,0.0001, 230, 340, 340, 10, 3490, 5,21),
                    scores_2 = c(12, 0,0,0,0,0,0, 45,13),
                    scores_3 = c(4,5,6,7,8,9,10, 45, 45))

allscores <- c("scores_1", "scores_2", "scores_3")

myresult <- dat %>% 
  mutate_at(.vars = allscores, 
          .funs = list(risk_levels = ~ as.integer(cut(as.numeric(.), breaks = 4))))

I have also tried to use cut_numberfunction but have gotten the following error:

risk.data2 <- dat %>% 
  mutate_at(.vars = allscores, 
            .funs = list(risk_levels = ~ as.integer(cut_number(as.numeric(.), n = 4))))

Error in `mutate_cols()`:
! Problem with `mutate()` column `scores_2_risk_levels`.
i `scores_2_risk_levels = (structure(function (..., .x = ..1, .y = ..2, . = ..1) ...`.
x Insufficient data values to produce 4 bins.
Caused by error in `cut_number()`:
! Insufficient data values to produce 4 bins.

How can I use tidyverse to order and then group my many score values into four roughly equal size groups?

>Solution :

You can convert each column into ranks before cut().

dat %>% 
  mutate(across(starts_with("scores"),
                ~ cut(dense_rank(.x), 4, labels = FALSE),
                .names = "{sub('scores', 'risk_levels', .col)}"))

#     Suburb scores_1 scores_2 scores_3 risk_levels_1 risk_levels_2 risk_levels_3
# 1 Werribee 0.00e+00       12        4             1             2             1
# 2  Hoppers 1.00e-04        0        5             1             1             1
# 3  Carlton 2.30e+02        0        6             3             1             2
# 4    North 3.40e+02        0        7             4             1             2
# 5  Serrano 3.40e+02        0        8             4             1             3
# 6   Upwell 1.00e+01        0        9             2             1             3
# 7   Market 3.49e+03        0       10             4             1             4
# 8  Poynton 5.00e+00       45       45             2             4             4
# 9  Stawell 2.10e+01       13       45             3             3             4