using ntile within a group

I’m trying to calculate the quintiles within each group of a data frame. If I do:

mtcars %>%
  group_by(gear,carb) %>%
  summarise(total = sum(wt), .groups = "keep") %>%
  mutate(rank = ntile(total,5))

all entries within the rank column are equal to 1. What am I doing wrong here?

>Solution :

Because you called group_by(gear, carb), each unique combination of these two variables is treated as a group. Since you used summarise(..., .groups = "keep"), all grouping variables from the input are preserved. After summarising, each combination of gear and carb appears in exactly one row, so every row is in its own group (note # Groups: gear, carb [11] in the tibble output). You are therefore calculating ntile() over a single element in every group, and the result is always 1.
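One way to confirm this is to count the groups and the rows per group after summarise(). This is only a minimal sketch, assuming dplyr is attached; the name kept is just for illustration and is not part of the original code.

```r
library(dplyr)

# `kept` is a hypothetical name for the summarised, still fully grouped result
kept <- mtcars %>%
  group_by(gear, carb) %>%
  summarise(total = sum(wt), .groups = "keep")

n_groups(kept)    # how many groups remain after summarise()
group_size(kept)  # how many rows each of those groups contains
```

If every group size is 1, any ranking function applied inside mutate() only ever sees one value at a time.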

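If you omit the .groups = "keep" argument, summarise() drops the last grouping variable (carb), so the result is grouped by gear only and ntile() ranks the totals within each gear (note # Groups: gear [3]). Each gear has at most four gear/carb totals here, so the ranks never reach 5.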

library(dplyr)

mtcars %>%
  group_by(gear,carb) %>%
  summarise(total = sum(wt)) %>%
  mutate(rank = ntile(total, 5))

# A tibble: 11 × 4
# Groups:   gear [3]
    gear  carb total  rank
   <dbl> <dbl> <dbl> <int>
 1     3     1  9.14     1
 2     3     2 14.2      3
 3     3     3 11.6      2
 4     3     4 23.4      4
 5     4     1  8.29     1
 6     4     2 10.7      2
 7     4     4 12.4      3
 8     5     2  3.65     4
 9     5     4  3.17     2
10     5     6  2.77     1
11     5     8  3.57     3
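If you would rather state the grouping explicitly than rely on the default, the .groups argument can do that. The sketch below is an illustration rather than the only way to do it: "drop_last" should reproduce the per-gear ranking above without the regrouping message that summarise() prints, and "drop" removes all grouping so the 11 totals are ranked against each other. The names by_gear and overall are made up for this example.

```r
library(dplyr)

# Rank totals within each gear: same grouping as the default,
# but requested explicitly.
by_gear <- mtcars %>%
  group_by(gear, carb) %>%
  summarise(total = sum(wt), .groups = "drop_last") %>%
  mutate(rank = ntile(total, 5))

# Rank all gear/carb totals against each other: no grouping left.
overall <- mtcars %>%
  group_by(gear, carb) %>%
  summarise(total = sum(wt), .groups = "drop") %>%
  mutate(rank = ntile(total, 5))
```

Which of the two you want depends on whether "quintile" should be relative to the other carb values in the same gear or to every gear/carb combination in the data.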
