Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Function to cut multiple columns into roughly equal ordered groups in R

I am trying to create a function that will add columns that order multiple columns and allocate data into four groups with equal numbers of observations (these will constitute four risk levels). I understand that where there are multiple zeroes this may not be possible, but would like this to work otherwise.

I need to be able to do this with mutate_at as I have 200 different score columns

Sample dataset below. I have used the cut function, the cut function produces groups of unequal sizes even where there are not multiple 0 values

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

library(tidyverse)
dat <- data.frame(Suburb = c("Werribee", "Hoppers", "Carlton", "North", "Serrano", "Upwell","Market", "Poynton", "Stawell"),
                    scores_1 = c(0,0.0001, 230, 340, 340, 10, 3490, 5,21),
                    scores_2 = c(12, 0,0,0,0,0,0, 45,13),
                    scores_3 = c(4,5,6,7,8,9,10, 45, 45))

allscores <- c("scores_1", "scores_2", "scores_3")

myresult <- dat %>% 
  mutate_at(.vars = allscores, 
          .funs = list(risk_levels = ~ as.integer(cut(as.numeric(.), breaks = 4))))

I have also tried to use cut_numberfunction but have gotten the following error:

risk.data2 <- dat %>% 
  mutate_at(.vars = allscores, 
            .funs = list(risk_levels = ~ as.integer(cut_number(as.numeric(.), n = 4))))
Error in `mutate_cols()`:
! Problem with `mutate()` column `scores_2_risk_levels`.
i `scores_2_risk_levels = (structure(function (..., .x = ..1, .y = ..2, . = ..1) ...`.
x Insufficient data values to produce 4 bins.
Caused by error in `cut_number()`:
! Insufficient data values to produce 4 bins.

How can I use tidyverse to order and then group my many score values into four roughly equal size groups?

>Solution :

You can convert each column into ranks before cut().

dat %>% 
  mutate(across(starts_with("scores"),
                ~ cut(dense_rank(.x), 4, labels = FALSE),
                .names = "{sub('scores', 'risk_levels', .col)}"))

#     Suburb scores_1 scores_2 scores_3 risk_levels_1 risk_levels_2 risk_levels_3
# 1 Werribee 0.00e+00       12        4             1             2             1
# 2  Hoppers 1.00e-04        0        5             1             1             1
# 3  Carlton 2.30e+02        0        6             3             1             2
# 4    North 3.40e+02        0        7             4             1             2
# 5  Serrano 3.40e+02        0        8             4             1             3
# 6   Upwell 1.00e+01        0        9             2             1             3
# 7   Market 3.49e+03        0       10             4             1             4
# 8  Poynton 5.00e+00       45       45             2             4             4
# 9  Stawell 2.10e+01       13       45             3             3             4
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading