Replace range of values in a column with a factor using dplyr

I have a question similar to "Replace range of values for factor with levels", but instead of two factor levels, I would like to create four.

How do I do that? I don’t know how to share my own data table so I will use the iris dataset.

library(datasets)
data(iris)

Let’s say I want to categorize Sepal.Length into four ranges (4.3-4.9, 5-6, 6.1-7, 7.1-7.9) and label them A, B, C, D as a factor in a new column. Can this be done using the dplyr package?
I came across several similar questions that use the "cut" function but I was not able to use it without getting an error message.

>Solution :

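You can use cut inside mutate. Pass Sepal.Length as the first argument, a vector of cut points as the breaks argument (four categories need five break points), and the labels you want to assign via the labels argument. By default each interval is open on the left and closed on the right, so a value equal to a break point falls into the lower category. The break points in the example below are placeholders; substitute the boundaries you need.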

library(tidyverse)

iris %>%
  as_tibble() %>%
  mutate(newcol = cut(Sepal.Length, breaks = c(0, 1.9, 3.9, 5.9, 8), 
                      labels = LETTERS[1:4]))
#> # A tibble: 150 x 6
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species newcol
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>   <fct> 
#>  1          5.1         3.5          1.4         0.2 setosa  C     
#>  2          4.9         3            1.4         0.2 setosa  C     
#>  3          4.7         3.2          1.3         0.2 setosa  C     
#>  4          4.6         3.1          1.5         0.2 setosa  C     
#>  5          5           3.6          1.4         0.2 setosa  C     
#>  6          5.4         3.9          1.7         0.4 setosa  C     
#>  7          4.6         3.4          1.4         0.3 setosa  C     
#>  8          5           3.4          1.5         0.2 setosa  C     
#>  9          4.4         2.9          1.4         0.2 setosa  C     
#> 10          4.9         3.1          1.5         0.1 setosa  C     
#> # ... with 140 more rows

Created on 2023-01-30 with reprex v2.0.2
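
The break points in the answer do not line up with the ranges asked about in the question, so here is a minimal sketch of how the same cut call might look for 4.3-4.9, 5-6, 6.1-7 and 7.1-7.9. It assumes the upper edge of each range is used as a break and that 4.2 is an acceptable lower bound (any value below the minimum of 4.3 would do). The name result is only for illustration.

```r
library(dplyr)

# Breaks are the upper edge of each range from the question.
# cut() builds right-closed intervals by default:
# (4.2, 4.9], (4.9, 6], (6, 7], (7, 7.9]
result <- iris %>%
  as_tibble() %>%
  mutate(newcol = cut(Sepal.Length,
                      breaks = c(4.2, 4.9, 6, 7, 7.9),
                      labels = LETTERS[1:4]))

# Count the rows in each category to check that nothing became NA
count(result, newcol)
```

Values outside the outermost breaks become NA, which is why the first break has to sit below the smallest observation. Alternatively, put the first break at 4.3 itself and add include.lowest = TRUE so that the minimum value is still included.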
