Using R dplyr mutate to create a factor column using pre-declared levels

I've written R code to produce a periodic report that requires re-ordering the Week numbers so that I can filter and order by the most recent 10 weeks. To prevent errors and minimize hard-coded values, I prefer to declare this week order at the top of the script that sources the several other scripts used. Thus, I would like to define an ordered factor and then use it to order the week number column later. Reprex below, but generally I am reordering all 52 weeks so that the most recent 10-week period is last/largest, e.g. new_levels <- factor(1:52, levels = c(29:52, 1:28), ordered=TRUE).
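A minimal sketch, assuming every year has 52 weeks, of how the rotated order could be computed from a single current-week value instead of being typed out by hand. `week_levels()` and `current_week` are hypothetical names, not from the original post; the result is a plain integer vector meant to be passed as factor levels.

```r
# Hypothetical helper (not from the original post): week numbers 1..n_weeks
# rotated so that `current_week` comes last. Assumes every year has n_weeks
# weeks (52 by default).
week_levels <- function(current_week, n_weeks = 52) {
  # current_week + 1, ..., current_week + n_weeks, wrapped back into 1..n_weeks
  as.integer((current_week + seq_len(n_weeks) - 1) %% n_weeks + 1)
}

# Declared once at the top of the driver script; 28 is only an example value.
week_order <- week_levels(current_week = 28)    # same as c(29:52, 1:28)

# Same idea for the 10-week reprex below, giving c(8:10, 1:7).
small_order <- week_levels(current_week = 7, n_weeks = 10)
```

Only `current_week` has to change from one report run to the next; the rotation itself is derived with modular arithmetic.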

Side note: any advice on how to better handle grabbing the most recent (not necessarily greatest) 10-week period is welcome. My struggle in the past has been the roll-over near the end of the year (51, 52, 1, 2, 3, …).

Example:


library(dplyr)

# Desired order: weeks 8, 9, 10 first, then weeks 1-7
new_levels <- factor(1:10, levels = c(8:10, 1:7), ordered=TRUE)

data <- tibble(Week = 1:10, ID = c("A","A","B","B","C","A","D","B","D","A"))

data <- data %>%
  mutate(Week2 = factor(Week, levels = new_levels, ordered = TRUE)) %>%
  arrange(Week2)

The ordered factor (new_levels) appears to be correct, but the output of arrange() and str() shows that the ordering I want is not being applied:

> new_levels
 [1] 1  2  3  4  5  6  7  8  9  10
Levels: 8 < 9 < 10 < 1 < 2 < 3 < 4 < 5 < 6 < 7
> data
# A tibble: 10 × 3
    Week ID    Week2
   <int> <chr> <ord>
 1     1 A     1    
 2     2 A     2    
 3     3 B     3    
 4     4 B     4    
 5     5 C     5    
 6     6 A     6    
 7     7 D     7    
 8     8 B     8    
 9     9 D     9    
10    10 A     10   
> str(data)
tibble [10 × 3] (S3: tbl_df/tbl/data.frame)
 $ Week : int [1:10] 1 2 3 4 5 6 7 8 9 10
 $ ID   : chr [1:10] "A" "A" "B" "B" ...
 $ Week2: Ord.factor w/ 10 levels "1"<"2"<"3"<"4"<..: 1 2 3 4 5 6 7 8 9 10

Thank you!

Solution:

If you look more closely at your output, you will see that the code is not doing what you expect:

data %>% 
  mutate(Week2 = factor(Week, levels = new_levels, ordered = TRUE)) %>% 
  pull(Week2)
#  [1] 1  2  3  4  5  6  7  8  9  10
# Levels: 1 < 2 < 3 < 4 < 5 < 6 < 7 < 8 < 9 < 10

This shows that arrange() is working as expected: Week2 simply has the levels 1 < 2 < … < 10. The issue comes from the fact that you are assigning levels = new_levels. What is the value of new_levels?

new_levels
#  [1] 1  2  3  4  5  6  7  8  9  10
# Levels: 8 < 9 < 10 < 1 < 2 < 3 < 4 < 5 < 6 < 7
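A minimal side-by-side check in base R, reusing the new_levels definition from the question, that compares the two calls directly (the names `by_value` and `by_level` are only for illustration):

```r
new_levels <- factor(1:10, levels = c(8:10, 1:7), ordered = TRUE)

# Passing the factor itself: its values are used in element order,
# so the levels come out as "1" < "2" < ... < "10".
by_value <- factor(1:10, levels = new_levels, ordered = TRUE)
levels(by_value)

# Passing levels(new_levels): the intended order "8" < "9" < "10" < "1" < ... < "7".
by_level <- factor(1:10, levels = levels(new_levels), ordered = TRUE)
levels(by_level)
```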

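The values of new_levels are just the sequence 1:10; your custom order is stored only in its levels. When a factor is passed as levels, factor() uses its values in element order, so Week2 ends up with levels 1 < 2 < … < 10. What you want is to pass the levels of new_levels as the levels of your new variable: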

data %>% 
  mutate(Week2 = factor(Week, levels = levels(new_levels), ordered = TRUE)) %>% 
  arrange(Week2)
#     Week ID    Week2
#    <int> <chr> <ord>
#  1     8 B     8    
#  2     9 D     9    
#  3    10 A     10   
#  4     1 A     1    
#  5     2 A     2    
#  6     3 B     3    
#  7     4 B     4    
#  8     5 C     5    
#  9     6 A     6    
# 10     7 D     7    
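On the side note about the year-end roll-over: one possible approach, sketched under the assumption of 52-week years (ISO years occasionally have 53), is to compute the ten most recent week numbers with modular arithmetic and use them both to filter and as the factor levels. `recent_weeks()`, `current_week` and the `weekly` data are hypothetical and not part of the original answer.

```r
library(dplyr)

# Hypothetical helper: the n most recent week numbers ending at current_week,
# oldest first, wrapping from week 1 back to week n_weeks (assumes 52-week years).
recent_weeks <- function(current_week, n = 10, n_weeks = 52) {
  as.integer((current_week - seq(n - 1, 0) - 1) %% n_weeks + 1)
}

# For current_week = 3 this gives 46, 47, ..., 52, 1, 2, 3.
window <- recent_weeks(current_week = 3)

# Made-up example data spanning the year boundary.
weekly <- tibble(Week = c(44:52, 1:5), n = 1:14)

recent <- weekly %>%
  filter(Week %in% window) %>%                                  # keep only the 10-week window
  mutate(Week2 = factor(Week, levels = window, ordered = TRUE)) %>%
  arrange(Week2)                                                # chronological, across the roll-over
recent
```

If the data also carry a year column, ordering by year and week together avoids the 52/53-week assumption entirely.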