

Convert a factor to consecutive integers from 0 to (number of unique values - 1), preserving the factor's order

I can convert the new_target column to numeric, but because the factor labels are already numbers, I end up with the raw bin numbers, which have gaps. I start with a numeric target and quantize it into 20 bins with cut(). The resulting new_target column contains only the unique values 0, 1, 3, 14, 16, 18 and 19. Instead, I need these values renumbered in order as consecutive integers from 0 to (number of unique values - 1), i.e. c(0, 1, 2, 3, 4, 5, 6). The expected result is shown in the new_target_expected column. How can I create new_target_expected automatically? My real data frame is much larger, so building the column by hand is not an option.

require(stringr)
require(data.table)

cat_var <- c("rock", "indie", "rock", "rock", "pop", "indie", "pop", "rock", "pop")
cat_var_2 <- c("blue", "green", "red", "red", "blue", "red", "green", "blue", "green")
target_var <- c(30, 10, 27, 14, 29, 25, 27, 12, 10)
df <- data.table("categorical_variable" = cat_var, "categorical_variable_2" = cat_var_2, "target_variable" =  target_var)

targetVariable <- "target_variable"

number_of_buckets = 20
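# Intended: each bucket should contain an equal number of objects.
# Note: a single number for breaks actually gives 20 equal-width intervals, not equal-count buckets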
a <- cut(df[[targetVariable]] , breaks = number_of_buckets, labels = 0:(number_of_buckets - 1)) 

df[["new_target"]] <- a
df[["new_target"]] <- as.numeric(as.character(df[["new_target"]]))
df[["new_target_expected"]] <- c(6, 0, 4, 2, 5, 3, 4, 1, 0)

Solution:


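We can remove the unused levels with droplevels() and then coerce the factor to integer. as.integer() on a factor returns each element's level code; after droplevels() those codes run from 1 to the number of used levels, in the original level order. Because R indexing starts at 1, subtract 1 so the values start at 0.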
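The same idea can be checked in base R on a plain vector, without data.table, using the question's target values. This is a minimal sketch; a_used is an illustrative name, and new_target_expected here is a standalone vector rather than a column.

```r
target_var <- c(30, 10, 27, 14, 29, 25, 27, 12, 10)
a <- cut(target_var, breaks = 20, labels = 0:19)

# droplevels() removes the empty bins, keeping the used levels in their original order
a_used <- droplevels(a)

# as.integer() gives the 1-based level codes; subtracting 1L shifts them to start at 0
new_target_expected <- as.integer(a_used) - 1L

print(new_target_expected)
```

This reproduces the new_target_expected values from the question: 6 0 4 2 5 3 4 1 0.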

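library(data.table)
# Note: assigning to (targetVariable) overwrites the target_variable column;
# use a different name, e.g. "new_target_expected", to keep the original values
df[, (targetVariable) := as.integer(droplevels(a)) - 1L]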

Output:

> df
   categorical_variable categorical_variable_2 target_variable
1:                 rock                   blue               6
2:                indie                  green               0
3:                 rock                    red               4
4:                 rock                    red               2
5:                  pop                   blue               5
6:                indie                    red               3
7:                  pop                  green               4
8:                 rock                   blue               1
9:                  pop                  green               0
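If only the numeric new_target column is available (as after the as.numeric(as.character(...)) step in the question), the same renumbering can be done directly on the numbers. This is an alternative sketch, not the approach from the answer above: it computes a dense rank, i.e. each value's position among the sorted distinct values, minus 1. The variable names are illustrative.

```r
target_var <- c(30, 10, 27, 14, 29, 25, 27, 12, 10)

# Numeric bin labels, built the same way as new_target in the question
new_target <- as.numeric(as.character(cut(target_var, breaks = 20, labels = 0:19)))

# Approach 1: position of each value among the sorted distinct values (dense rank)
dense_rank_match <- match(new_target, sort(unique(new_target))) - 1L

# Approach 2: factor() keeps only the observed values and sorts numeric levels increasingly
dense_rank_factor <- as.integer(factor(new_target)) - 1L

print(dense_rank_match)
print(dense_rank_factor)
```

Both give 6 0 4 2 5 3 4 1 0 for this data. If you are already using data.table, frank(new_target, ties.method = "dense") - 1L should produce the same dense ranking.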