How can I rank the first 4 group of my dataframe associated to the highest value in the count column and create a 5th group summing up the remaining groups and their associated values ?
What I did so far:
dummy_dataframe <- data.frame(group = c("A", "B", "A", "A", "C", "C", "D", "E", "F", "D","G"))
df_aggregate <- aggregate(cbind(count = group) ~ group,
data = dumy_dataframe,
FUN = function(x){NROW(x)})
df_sliced <- df_aggregate %>%
arrange(desc(count)) %>%
slice(1:4)
With the code above I get a dataframe with the 4 groups associated to the highest value but how I could have a fith group summing up the value of the missing group (E, F and G) ? For instance something like this:
group count
1 A 3
2 B 1
3 C 2
4 D 2
5 others 3
>Solution :
You can run some tidyverse operations directly on your original dataframe:
library(tidyverse)
dummy_dataframe %>%
count(group) %>%
mutate(id = if_else(row_number() < 5, 1L, 2L)) %>%
group_by(id) %>%
arrange(id, -n) %>%
mutate(group = if_else(id == 2L, "others", group),
n = if_else(group == "others", sum(n), n)) %>%
ungroup() %>%
distinct() %>%
select(-id)
which gives:
# A tibble: 5 x 2
group n
<chr> <int>
1 A 3
2 C 2
3 D 2
4 B 1
5 others 3