R: Count percentage of observations that switch value

I have a dataset with two columns. The first column indicates the group, and each group has only two rows. The second column gives the category. I would like to compute the percentage of groups whose two rows do not share the same category. For example, in rows 1 and 2 (group A) the Category differs, while in rows 3 and 4 (group B) it is the same. For the provided data, the result should be 66.67%, because the Category changes in four groups and stays the same in two.

This is my data:

structure(list(Group = c("A", "A", "B", "B", "C", "C", "D", "D", 
"E", "E", "F", "F"), Category = c(1L, 2L, 3L, 3L, 5L, 6L, 7L, 
7L, 7L, 6L, 5L, 4L)), class = "data.frame", row.names = c(NA, 
-12L))

I have tried the following so far:

Data <- Data %>%
  group_by(Group) %>%
  count(n())

But I don’t know how to write the last line to get the desired percentage. Could someone help me here?

Solution:

A base solution with tapply():

mean(with(df, tapply(Category, Group, \(x) length(unique(x)))) > 1)

# [1] 0.6666667

With dplyr, you could use n_distinct() to count the number of unique values.

library(dplyr)

df %>%
  group_by(Group) %>%
  summarise(N = n_distinct(Category)) %>%
  summarise(Percent = mean(N > 1))

# # A tibble: 1 × 1
#   Percent
#     <dbl>
# 1   0.667
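
For completeness, here is a self-contained sketch that rebuilds the data from the question under the name df (the name the answer assumes, whereas the question's own attempt calls it Data) and reports the result as a percentage rather than a proportion. The helper names switched and percent_switched are illustrative only and do not appear in the original answer.

```r
# Data from the question, assigned to `df` (the name the answer assumes).
df <- data.frame(
  Group = rep(c("A", "B", "C", "D", "E", "F"), each = 2),
  Category = c(1L, 2L, 3L, 3L, 5L, 6L, 7L, 7L, 7L, 6L, 5L, 4L)
)

# One logical per group: TRUE when the group holds more than one
# distinct Category value, i.e. the category "switches".
switched <- tapply(df$Category, df$Group, function(x) length(unique(x)) > 1)

# mean() of a logical vector is the share of TRUE values;
# multiply by 100 to express it as a percentage.
percent_switched <- 100 * mean(switched)
round(percent_switched, 2)
# [1] 66.67
```

Counting distinct values rather than comparing the two rows directly means the same code still works if a group ever has more than two rows.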