mydat <- data.frame(name = c("James", "James", "James", "James", "Leila", "Leila", "Cici", "Bob", "Bob", "Holly", "Topaz", "Topaz"),
code = c(123, 928, 981, 333, 981, 928, 463, 123, 928, 981, 333, 444))
> mydat
name code
1 James 123
2 James 928
3 James 981
4 James 333
5 Leila 981
6 Leila 928
7 Cici 463
8 Bob 123
9 Bob 928
10 Holly 981
11 Topaz 333
12 Topaz 444
In mydat, each person can have multiple codes associated with it. I want to tabulate the code column and assign the person the code that is most common.
> rbind(table(mydat), Total = colSums(table(mydat)))
123 333 444 463 928 981
Bob 1 0 0 0 1 0
Cici 0 0 0 1 0 0
Holly 0 0 0 0 0 1
James 1 1 0 0 1 1
Leila 0 0 0 0 1 1
Topaz 0 1 1 0 0 0
Total 2 2 1 1 3 3
Bob has codes 123 and 928. Since 928 appears more often across mydat than 123, Bob will be assigned a code of 928.
Cici has code 463, so Cici will be assigned a code of 463.
Holly has code 981, so Holly will be assigned a code of 981.
James has codes 123, 333, 928, and 981. Since 928 and 981 both appear more frequently than 123 and 333, James will be assigned codes 928 and 981.
The final output should be:
> final_mydat
name final_code
1 Bob 928
2 Cici 463
3 Holly 981
4 James 928
5 James 981
6 Leila 928
7 Leila 981
8 Topaz 333
Is there a quick way to do this in R?
>Solution :
library(dplyr)
mydat %>%
add_count(code) %>%
group_by(name) %>%
slice_max(n) %>%
ungroup()
# # A tibble: 8 × 3
# name code n
# <chr> <dbl> <int>
# 1 Bob 928 3
# 2 Cici 463 1
# 3 Holly 981 3
# 4 James 928 3
# 5 James 981 3
# 6 Leila 981 3
# 7 Leila 928 3
# 8 Topaz 333 2
Add ... %>% select(-n) if you want to get rid of the n column.