I need to summarise some data.frame containing both numeric and character values. I need to sum the numerical features and to keep the most frequent character value per group at the same time. Ideally, it should be with dplyr but that’s not mandatory.
Here is a reproducible example :
have <- data.frame("id" = c(1,1,1,2,2,2),"a" = c("dog","dog","cat","duck","duck","horse"),"value" = c(1,2,3,4,5,6))
need <- data.frame("id" = c(1,2), "a" = c("dog","duck"), "value" = c(6,15))
Any idea ?
>Solution :
You could group_by
and sum
for total value and use table
with which.max
and names
to select the character values that frequent the most like this:
library(dplyr)
have %>%
group_by(id) %>%
summarise(a = names(which.max(table(a))),
value = sum(value))
#> # A tibble: 2 × 3
#> id a value
#> <dbl> <chr> <dbl>
#> 1 1 dog 6
#> 2 2 duck 15
Created on 2023-03-10 with reprex v2.0.2