I am working in R.
I have some colleges which operate from multiple sites. We know how many students there are at each site.
data <- data.frame(provider_id_num = c("1", "1", "2", "3", "3"),
postcode = c("S2 3EH", "S2 3ET", "S2 34h", "S2 rty", "B1 2eh"),
number_students = c(1, 3, 5, 2, 2))
provider_id | postcode | number_students |
---|---|---|
1 | S2 3EH | 1 |
1 | S2 3ET | 3 |
2 | S2 34h | 5 |
3 | S2 rty | 2 |
3 | B1 2eh | 2 |
For each provider, I want to keep the row with the largest number of students.
If there is a tie, I don't mind which row is kept, but I want exactly one row per provider.
Desired outcome:
provider_id | postcode | number_students |
---|---|---|
1 | S2 3ET | 3 |
2 | S2 34h | 5 |
3 | S2 rty | 2 |
OR:
provider_id | postcode | number_students |
---|---|---|
1 | S2 3ET | 3 |
2 | S2 34h | 5 |
3 | B1 2eh | 2 |
Does anyone have any thoughts?
>Solution :
dplyr's `slice_max()` sounds like it could be useful for you. Here is an example. It uses `slice_min()`, but the `with_ties` argument is also available for `slice_max()`.
library(dplyr)

# Use with_ties = FALSE to return exactly n matches
mtcars %>% slice_min(cyl, n = 1, with_ties = FALSE)
#>             mpg cyl disp hp drat   wt  qsec vs am gear carb
#> Datsun 710 22.8   4  108 93 3.85 2.32 18.61  1  1    4    1
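Applied to the data frame from the question, the same idea might look like the sketch below. It assumes dplyr 1.0.0 or later is installed and that grouping by `provider_id_num` is what is wanted; the name `result` is only illustrative. Which of the tied rows survives for provider 3 is not something to rely on.

```r
library(dplyr)

# Data from the question
data <- data.frame(
  provider_id_num = c("1", "1", "2", "3", "3"),
  postcode = c("S2 3EH", "S2 3ET", "S2 34h", "S2 rty", "B1 2eh"),
  number_students = c(1, 3, 5, 2, 2)
)

# For each provider, keep one row with the highest number_students.
# with_ties = FALSE guarantees a single row per group even when counts tie.
result <- data %>%
  group_by(provider_id_num) %>%
  slice_max(number_students, n = 1, with_ties = FALSE) %>%
  ungroup()

print(result)
```

Without `group_by()`, `slice_max()` would return a single row for the whole data frame, so the grouping step is what produces one row per provider.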