Why doesn't R dplyr arrange sort properly using a vector element within a for loop

I’m having trouble getting R’s dplyr::arrange() to sort properly when used in a for loop. I found many posts discussing this issue (like ex.1 with .by_group=TRUE and using desc() better, ex.2 with lists, and ex.3 with filter_all() and %in%). Yet I still don’t understand why arrange() works when I use the column name directly but not when I refer to the name by its index position within a vector, which will later be used in a loop to aid data extraction from a larger data frame.

Here is a reproducible toy dataset to demonstrate:

set.seed(1) 
toy <- data.frame(a=rep(sample(letters[1:5], 4, TRUE)), tf=sample(c("T","F"), 100, TRUE), n1=sample(1:100, 100, TRUE), n2=1:100)
get_it <- colnames(toy)[3:4]

My initial approach works with the indexed vector in the select() step, but fails to sort in arrange(), even with the .by_group option. I also tried calling it explicitly as dplyr::arrange(), but nothing changed.

j=1  # pretending this is the 1st pass in the loop
toy %>% 
  select(a, tf, get_it[j]) %>% 
  group_by(a) %>% 
  arrange(desc(get_it[j]), .by_group=TRUE)

   a     tf     n1
<chr>  <chr>  <int>
   a      T     21
   a      T     17
   a      F     87
   a      T     90
   a      T     64  

example output truncated

However, I get the intended sorted results when I replace the indexed vector in arrange() with the column name itself (select() still works fine):

j=1  # pretending this is the 1st pass through the loop
toy %>% 
  select(a, tf, get_it[j]) %>% 
  group_by(a) %>% 
  arrange(desc(n1), .by_group=TRUE)

   a     tf     n1
<chr>  <chr>  <int>
   a      F     99
   a      F     98
   a      F     96
   a      F     95
   a      T     93  

example output truncated

Why does the second version work, but not the first? What should I change so that I can loop this through many columns?
Thanks in advance! I appreciate your time!

(minor edit to correct a typo.)

>Solution :

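This falls under "programming with dplyr". arrange() uses data masking, so get_it[j] is evaluated to the string "n1" rather than to the column n1. desc() of a single constant string gives every row the same sort key, so the rows are ordered only by group. Use the .data pronoun to reference a column by a string: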

toy %>% 
  select(a, tf, get_it[j]) %>% 
  group_by(a) %>% 
  arrange(desc(.data[[ get_it[j] ]]), .by_group=TRUE)
# # A tibble: 100 x 3
# # Groups:   a [3]
#    a     tf       n1
#    <chr> <chr> <int>
#  1 a     F        99
#  2 a     F        98
#  3 a     F        96
#  4 a     F        95
#  5 a     T        93
#  6 a     T        92
#  7 a     T        92
#  8 a     T        90
#  9 a     F        87
# 10 a     F        86
# # ... with 90 more rows
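
To loop this over many columns, as the question asks, the same .data pattern can be wrapped in a for loop. The following is only a minimal sketch under a few assumptions: it reuses toy and get_it from the question, while the names results and col are hypothetical and introduced purely for illustration. It also swaps the bare get_it[j] in select() for all_of(), which is the tidyselect helper for selecting columns from a character vector.

```r
library(dplyr)

# Toy data from the question
set.seed(1)
toy <- data.frame(a=rep(sample(letters[1:5], 4, TRUE)), tf=sample(c("T","F"), 100, TRUE), n1=sample(1:100, 100, TRUE), n2=1:100)
get_it <- colnames(toy)[3:4]

# `results` is a hypothetical container: one sorted tibble per column name
results <- list()

for (col in get_it) {
  results[[col]] <- toy %>%
    select(a, tf, all_of(col)) %>%   # all_of() selects by a character vector
    group_by(a) %>%
    arrange(desc(.data[[col]]), .by_group = TRUE)  # .data[[col]] looks up the column named by the string
}

# Each element can then be inspected by name, e.g. results[["n1"]]
```

Iterating over the column names directly, rather than over an index j, avoids the get_it[j] lookup entirely; an index-based loop would work the same way with .data[[ get_it[j] ]].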