How do I match rows within a group and filter the rows that do not agree?

June 6, 2023

In this example dataset, I have 3 groups with 2 different views and 2 different types. Here is what I want to achieve:

Determine whether the two types within a group match (Y or N)
If the types do not match, I want to select the row where Type = Y
If the types DO match, remove the duplicate so that I end up with a data frame with only one entry per group (Type = Y or Type = N)

Here is my example data:

structure(list(grp = c("1", "1", "2", "2", "3", "3"), view = c("A", 
"B", "A", "B", "A", "B"), type = c("Y", "N", "Y", "Y", "N", "N"
)), class = "data.frame", row.names = c(NA, -6L))

I want the resulting dataset to look like this:

structure(list(grp = c("1", "2", "3"), view = c("A", "A", "A"
), type = c("Y", "Y", "N")), class = "data.frame", row.names = c(NA, 
-3L))

Any help would be greatly appreciated.

Thank you so so much!

Solution:

Using dplyr:

library(dplyr)

# d1 is the example data frame from the question
d1 %>% 
  group_by(grp) %>% 
  filter(
    (n_distinct(type) == 1 & row_number() == 1) | 
    (n_distinct(type) == 2 & type == 'Y')
  ) %>% 
  ungroup()

Gives:

# A tibble: 3 × 3
  grp   view  type 
  <chr> <chr> <chr>
1 1     A     Y    
2 2     A     Y    
3 3     A     N

Read this as: within each group, keep the rows that satisfy either of these criteria:

There is only one distinct type in this group, and this row is the first row (picking the first row is arbitrary, since you didn’t specify which row to keep in this case).
There are two distinct types in this group, and this row has type Y.
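
If you would rather not depend on dplyr, the same two rules can be sketched in base R. This is only an illustration, not part of the answer above: `pick_row` is a hypothetical helper name, and the data frame is rebuilt by hand from the `dput` output in the question.

```r
# Example data, rebuilt from the dput() output in the question
d1 <- data.frame(
  grp  = c("1", "1", "2", "2", "3", "3"),
  view = c("A", "B", "A", "B", "A", "B"),
  type = c("Y", "N", "Y", "Y", "N", "N")
)

# Hypothetical helper: choose the row(s) to keep from one group's rows
pick_row <- function(g) {
  if (length(unique(g$type)) == 1) {
    # All types agree: keep the first row (an arbitrary choice)
    g[1, , drop = FALSE]
  } else {
    # Types differ: keep the row(s) with type Y
    g[g$type == "Y", , drop = FALSE]
  }
}

# Split by group, apply the helper to each piece, and stack the results
result <- do.call(rbind, lapply(split(d1, d1$grp), pick_row))
rownames(result) <- NULL  # drop the row names that rbind() builds from the list
```

As with the dplyr version, a group whose types differ and that contains more than one Y row would keep all of its Y rows, so add a further tie-break if you need exactly one row per group in that situation.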