R: Count percentage of observations that switch value

July 7, 2022

I have a dataset with two columns. The first column indicates the group, and each group has exactly two rows. The second column holds the category. I would like to calculate the percentage of groups whose two rows have different categories. For example, in rows 1 and 2 the Category differs, while in rows 3 and 4 it is the same. For the data below, I expect 66.67%, because the Category changes in four of the six groups and stays the same in the other two.

This is my data:

df <- structure(list(Group = c("A", "A", "B", "B", "C", "C", "D", "D", 
"E", "E", "F", "F"), Category = c(1L, 2L, 3L, 3L, 5L, 6L, 7L, 
7L, 7L, 6L, 5L, 4L)), class = "data.frame", row.names = c(NA, 
-12L))

I have tried the following so far:

Data <- Data %>%
  group_by(Group) %>%
  count(n())

But I don’t know how to write the last line to get the desired percentage. Could someone help me?

Solution:

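A base R solution with tapply(), which counts the unique Category values in each group and then takes the share of groups with more than one (the \(x) lambda syntax requires R 4.1 or later):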

mean(with(df, tapply(Category, Group, \(x) length(unique(x)))) > 1)

# [1] 0.6666667
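Because the question states that each group has exactly two rows, another base R option is to compare each group's first and last values directly. This is a minimal sketch that assumes the two-row structure holds. It rebuilds the question's data with data.frame() so it runs on its own, and it scales by 100 to report a percentage instead of a proportion:

```r
# Data from the question, assigned to `df` (the name used in this answer)
df <- data.frame(
  Group    = rep(c("A", "B", "C", "D", "E", "F"), each = 2),
  Category = c(1L, 2L, 3L, 3L, 5L, 6L, 7L, 7L, 7L, 6L, 5L, 4L)
)

# For each group, TRUE when its first and last Category differ.
# Assumes every group has exactly two rows, as stated in the question.
switched <- tapply(df$Category, df$Group, function(x) x[1] != x[length(x)])

# mean() of a logical vector is the proportion of TRUEs; multiply by 100 for a percentage
pct <- 100 * mean(switched)
pct
# [1] 66.66667
```

With more than two rows per group, this only compares the endpoints, so the n_distinct() or length(unique()) approach is the safer general choice.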

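With dplyr, you could use n_distinct() to count the number of unique Category values per group. A group switches if it has more than one, and mean(N > 1) then gives the proportion of such groups (multiply by 100 for a percentage).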

library(dplyr)

df %>%
  group_by(Group) %>%
  summarise(N = n_distinct(Category)) %>%
  summarise(Percent = mean(N > 1))

# # A tibble: 1 × 1
#   Percent
#     <dbl>
# 1   0.667
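If you want the result expressed as a percentage and prefer to phrase the check as "did the value switch", here is a dplyr sketch that compares each group's first() and last() Category. It assumes exactly two rows per group, as in the question, and rounds to two decimals:

```r
library(dplyr)

# Data from the question
df <- data.frame(
  Group    = rep(c("A", "B", "C", "D", "E", "F"), each = 2),
  Category = c(1L, 2L, 3L, 3L, 5L, 6L, 7L, 7L, 7L, 6L, 5L, 4L)
)

res <- df %>%
  group_by(Group) %>%
  # TRUE when the group's first and last Category differ (assumes two rows per group)
  summarise(Switched = first(Category) != last(Category)) %>%
  # Share of groups that switch, scaled to a percentage and rounded
  summarise(Percent = round(100 * mean(Switched), 2))

res$Percent
# [1] 66.67
```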