Count unique strings that only occur in a single group based on all possible groups

I have the following data frame:

a = data.frame(PA = c("A", "A", "A", "B", "B"), Family = c("aa", "ab", "ac", "aa", "ad"))

For each PA (A or B), I want to count the ‘Family’ strings (aa, ab, ac, ad) that occur in that PA and in no other PA. For example, aa occurs in both A and B, so I don’t want to count it. On the other hand, ab and ac occur only in PA A: those are the ones I want.
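To make the requirement concrete, here is a short base R sketch (not part of the original question; the names `n_pa` and `exclusive` are only illustrative) that counts how many different PAs each Family appears in. A Family qualifies only when that count is 1.

```r
# Example data from the question
a <- data.frame(PA = c("A", "A", "A", "B", "B"),
                Family = c("aa", "ab", "ac", "aa", "ad"))

# For each Family, count the distinct PAs it occurs in
n_pa <- tapply(a$PA, a$Family, function(x) length(unique(x)))
print(n_pa)  # aa occurs in 2 PAs; ab, ac and ad in 1 each

# Families confined to a single PA are the ones to count
exclusive <- names(n_pa)[n_pa == 1]
print(exclusive)  # "ab" "ac" "ad"
```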

Using dplyr, I was doing something like:


a %>% group_by(PA) %>%
  summarise(count_family = n_distinct(Family))

But this only counts the distinct Families within each PA, whereas I want the Families that occur in exactly one PA when all PAs are considered.
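For comparison, the attempt above counts distinct Families inside each PA without looking at the other PAs. A base R equivalent (an illustrative translation, not the asker's code) shows why aa ends up counted for both A and B:

```r
# Example data from the question
a <- data.frame(PA = c("A", "A", "A", "B", "B"),
                Family = c("aa", "ab", "ac", "aa", "ad"))

# Distinct Families within each PA, ignoring all other PAs
within_pa <- tapply(a$Family, a$PA, function(x) length(unique(x)))
print(within_pa)  # A = 3 (aa, ab, ac), B = 2 (aa, ad)

# The desired result is A = 2 (ab, ac) and B = 1 (ad), because aa is shared
```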

Solution:

Here’s a tidyverse approach.

First keep only the Family values that appear exactly once in the data (dropping every Family that is duplicated), then group_by(PA) and count the remaining rows.

library(tidyverse)

a %>% group_by(Family) %>% 
  filter(n() == 1) %>% 
  group_by(PA) %>%  
  summarize(count_family = n())

Output

# A tibble: 2 x 2
  PA    count_family
  <chr>        <int>
1 A                2
2 B                1

Output before summarise()

# A tibble: 3 x 2
# Groups:   Family [3]
  PA    Family
  <chr> <chr> 
1 A     ab    
2 A     ac    
3 B     ad    
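One caveat, not raised in the original answer: `filter(n() == 1)` keeps a Family only if it occupies exactly one row, so a Family repeated within the same PA would also be dropped. If the real data can contain such repeats, a variant that filters on `n_distinct(PA) == 1` and counts with `n_distinct(Family)` may be safer. This sketch assumes dplyr is installed and uses a hypothetical data frame `a2` that adds an extra ("A", "ab") row to the question's data.

```r
library(dplyr)

# Question's data plus a hypothetical repeated row ("A", "ab")
a2 <- data.frame(PA = c("A", "A", "A", "B", "B", "A"),
                 Family = c("aa", "ab", "ac", "aa", "ad", "ab"))

# Original answer: ab now has n() == 2, so it is dropped even though it only occurs in A
orig_res <- a2 %>%
  group_by(Family) %>%
  filter(n() == 1) %>%
  group_by(PA) %>%
  summarize(count_family = n())
print(orig_res)  # A = 1 (ac), B = 1 (ad)

# Variant: keep Families seen in exactly one PA, then count distinct Families per PA
res <- a2 %>%
  group_by(Family) %>%
  filter(n_distinct(PA) == 1) %>%
  group_by(PA) %>%
  summarize(count_family = n_distinct(Family))
print(res)  # A = 2 (ab, ac), B = 1 (ad)
```

On the question's original data, where no Family repeats inside a PA, both pipelines give the same counts.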