Why is levels() in R assigning the wrong level to my data?

I’m creating a function that requires users to upload a dataset containing a character vector with a fixed set of values. Under the hood, I need one column that keeps the vector as character, but I also need a separate column that is identical except that it is a factor with specific levels.

When I try using levels() to assign the levels, I assumed R would match up the strings, but the labels end up attached to the wrong rows. How do I correct this behavior? Though the specific character values will always be the same, I won’t know the order in which users will upload them.

#Data to recreate the issue (note: The group and count columns are not relevant, but I kept them in case they may be related to the issue for some reason)

library(dplyr)

data <- tibble(group = factor(c(rep("A", 10), rep("B", 10), rep("C", 10), rep("D", 10)), levels = c("A", "B", "C", "D")),
               state = c(rep(c("Not Started", "Just Beginning",
                               "25% Complete", "40% Complete", "Halfway Done",
                               "75% Complete", "Mostly Done", "Completed",
                               "Follow Up", "Final Follow Up"), 4)),
               count = c(100, 5, 4, 445, 67, 44, 25, 877, 240, 353,
                         48, 51, 48, 40, 141, 34, 50, 45, 34, 35,
                         140, 5, 8, 0, 17, 42, 0, 5, 3, 75,
                         477, 20, 59, 13, 1065, 1, 50, 353, 73, 104))

data$state_factor <- as.factor(data$state)

levels(data$state_factor) <- c("Not Started", "Just Beginning",
                               "25% Complete", "40% Complete", "Halfway Done",
                               "75% Complete", "Mostly Done", "Completed",
                               "Follow Up", "Final Follow Up")

head(data, 20) #Note how the state and state_factor columns are not identical

I’m flexible about how I accomplish this (e.g., is there a function in forcats I’m missing?), but the factor needs to have these levels in this order.

Solution:

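Assigning with levels() does not match strings: it renames the existing levels by position. as.factor() sorts the levels alphabetically, so the first alphabetical level ("25% Complete") gets relabeled as "Not Started", the second as "Just Beginning", and so on.

Instead, use factor() rather than as.factor() and set the levels directly: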

data$state_factor <- factor(data$state, levels=c("Not Started", "Just Beginning",
                                                    "25% Complete", "40% Complete", "Halfway Done",
                                                    "75% Complete", "Mostly Done", "Completed",
                                                    "Follow Up", "Final Follow Up"))

Output:

> head(data, 20)  
# A tibble: 20 × 4
   group state           count state_factor   
   <fct> <chr>           <dbl> <fct>          
 1 A     Not Started       100 Not Started    
 2 A     Just Beginning      5 Just Beginning 
 3 A     25% Complete        4 25% Complete   
 4 A     40% Complete      445 40% Complete   
 5 A     Halfway Done       67 Halfway Done   
 6 A     75% Complete       44 75% Complete   
 7 A     Mostly Done        25 Mostly Done    
 8 A     Completed         877 Completed      
 9 A     Follow Up         240 Follow Up      
10 A     Final Follow Up   353 Final Follow Up
11 B     Not Started        48 Not Started    
12 B     Just Beginning     51 Just Beginning 
13 B     25% Complete       48 25% Complete   
14 B     40% Complete       40 40% Complete   
15 B     Halfway Done      141 Halfway Done   
16 B     75% Complete       34 75% Complete   
17 B     Mostly Done        50 Mostly Done    
18 B     Completed          45 Completed      
19 B     Follow Up          34 Follow Up      
20 B     Final Follow Up    35 Final Follow Up
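
The same comparison can be made programmatically instead of by eye. The following is a minimal sketch on a small made-up vector (the names state, wanted, good, and bad are hypothetical and not part of the question's data) that contrasts factor(levels = ...) with assigning through levels():

```r
# Hypothetical toy input: a few of the labels, in arbitrary upload order
state  <- c("Mostly Done", "Not Started", "Completed", "Not Started")
wanted <- c("Not Started", "Mostly Done", "Completed")

# factor() matches each string to the level with the same name,
# so the values are unchanged and only the level order is set
good <- factor(state, levels = wanted)
identical(as.character(good), state)   # TRUE

# as.factor() sorts the levels alphabetically; levels<- then renames
# those sorted levels by position, which changes the underlying values
bad <- as.factor(state)
levels(bad) <- wanted
identical(as.character(bad), state)    # FALSE
```

A check like the first identical() call could be run inside the function to confirm that the character and factor columns still agree.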

The levels of data$state_factor are now in the specified order rather than alphabetical order:

> levels(data$state_factor)
 [1] "Not Started"     "Just Beginning"  "25% Complete"    "40% Complete"    "Halfway Done"    "75% Complete"    "Mostly Done"     "Completed"      
 [9] "Follow Up"       "Final Follow Up"
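
Because the data comes from user uploads, it may be worth guarding against unexpected values: factor() silently converts any value that is not listed in levels to NA. Below is a rough sketch of how the conversion could be wrapped; the helper name add_state_factor, the state_levels vector, and the error message are illustrative assumptions, not part of the original question or answer.

```r
# Required levels, in the required order (copied from the question)
state_levels <- c("Not Started", "Just Beginning", "25% Complete", "40% Complete",
                  "Halfway Done", "75% Complete", "Mostly Done", "Completed",
                  "Follow Up", "Final Follow Up")

# Hypothetical helper: adds a state_factor column next to the character column
add_state_factor <- function(df, levels = state_levels) {
  # Collect values that are not among the expected levels (ignoring NA)
  unknown <- setdiff(unique(df$state), levels)
  unknown <- unknown[!is.na(unknown)]
  if (length(unknown) > 0) {
    stop("Unexpected state value(s): ", paste(unknown, collapse = ", "))
  }
  # Match by string, not by position; the upload order does not matter
  df$state_factor <- factor(df$state, levels = levels)
  df
}

# Usage with a made-up upload in arbitrary order
uploaded <- data.frame(state = c("Completed", "Not Started", "Follow Up"),
                       stringsAsFactors = FALSE)
result <- add_state_factor(uploaded)
levels(result$state_factor)
```

Regarding forcats: forcats::fct_relevel() can also reorder levels by name, but for this case base factor() with an explicit levels argument should be sufficient.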