Why does this dplyr group function give strange results?

When I run the reproducible code below, I get the desired grouping results in the GroupRank column, shown immediately beneath it:

library(dplyr)

myData <- 
  data.frame(
    Element = c("A","A","B","A","C","C"),
    Group = c(0,0,0,0,1,1)
  )

myDataGroups <- myData %>%
  mutate(origOrder = row_number()) %>%  
  group_by(Element) %>% 
  mutate(ElementCnt = row_number()) %>%
  ungroup() %>%  
  mutate(Group = factor(Group, unique(Group))) %>% 
  arrange(Group) %>% 
  mutate(groupCt = cumsum(Group != lag(Group, 1, Group[[1]])) - 1L) %>%  
  group_by(Group) %>%  
  mutate(GroupRank = ElementCnt - max(0L,groupCt),
         GroupRank = if_else(as.character(Group) == "0", ElementCnt, min(GroupRank))
  )%>%  
  ungroup() %>%
  arrange(origOrder)
myDataGroups

> myDataGroups
# A tibble: 6 x 6
  Element Group origOrder ElementCnt groupCt GroupRank
  <chr>   <fct>     <int>      <int>   <int>     <int>
1 A       0             1          1      -1         1
2 A       0             2          2      -1         2
3 B       0             3          1      -1         1
4 A       0             4          3      -1         3
5 C       1             5          1       0         1
6 C       1             6          2       0         1

However, when I take the line GroupRank = if_else(as.character(Group) == "0", ElementCnt, min(GroupRank)) from the above code and simply wrap it in max(), like this: GroupRank = max(1L, if_else(as.character(Group) == "0", ElementCnt, min(GroupRank))), I get the strange output shown below. (I tried both 1 and 1L and got the same results.) GroupRank shouldn't have changed from the above output:

  Element Group origOrder ElementCnt groupCt GroupRank
  <chr>   <fct>     <int>      <int>   <int>     <int>
1 A       0             1          1      -1         3
2 A       0             2          2      -1         3
3 B       0             3          1      -1         3
4 A       0             4          3      -1         3
5 C       1             5          1       0         1
6 C       1             6          2       0         1

What am I doing wrong here? Am I using max() incorrectly?


Solution:

Note the difference between max() and pmax().

max(1:5, 5:1)
#> [1] 5
pmax(1:5, 5:1)
#> [1] 5 4 3 4 5

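max() pools all of its arguments and returns a single value, and mutate() then recycles that value across every row of the group. That is why GroupRank is constant within each group: in group 0 the ElementCnt values are 1, 2, 1, 3, so max(1L, ...) evaluates to 3 for all four rows. pmax() does what you apparently expect: it returns the element-wise (parallel) maximum, one value per row.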
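Applied to the pipeline from the question, the change would presumably look like the sketch below. It assumes dplyr is installed and reuses the question's own data and column names; the only edit is swapping max() for pmax() on the second GroupRank line.

```r
library(dplyr)

myData <-
  data.frame(
    Element = c("A", "A", "B", "A", "C", "C"),
    Group = c(0, 0, 0, 0, 1, 1)
  )

myDataGroups <- myData %>%
  mutate(origOrder = row_number()) %>%
  group_by(Element) %>%
  mutate(ElementCnt = row_number()) %>%
  ungroup() %>%
  mutate(Group = factor(Group, unique(Group))) %>%
  arrange(Group) %>%
  mutate(groupCt = cumsum(Group != lag(Group, 1, Group[[1]])) - 1L) %>%
  group_by(Group) %>%
  mutate(
    GroupRank = ElementCnt - max(0L, groupCt),
    # pmax() compares 1L against each row's value, so the result keeps
    # one value per row instead of collapsing to a single group maximum
    GroupRank = pmax(
      1L,
      if_else(as.character(Group) == "0", ElementCnt, min(GroupRank))
    )
  ) %>%
  ungroup() %>%
  arrange(origOrder)

myDataGroups$GroupRank
```

With this data every value is already at least 1, so GroupRank should match the first output in the question (1, 2, 1, 3, 1, 1); the floor of 1 only matters if a computed rank drops below 1.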
