In this example dataset, I have 3 groups, each with 2 rows (one per view, A and B), and each row has one of 2 types (Y or N). Here is what I want to achieve:
- Determine whether the two types within a group match (Y or N)
- If the types do not match, I want to select the row where Type = Y
- If the types DO match, remove the duplicate so that I end up with a data frame with only one entry per group (Type = Y or Type = N)
Here is my example data:
```r
structure(list(grp = c("1", "1", "2", "2", "3", "3"),
               view = c("A", "B", "A", "B", "A", "B"),
               type = c("Y", "N", "Y", "Y", "N", "N")),
          class = "data.frame", row.names = c(NA, -6L))
```
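For reference, the first requirement (do the two types in a group match?) can be checked on its own in base R. This is a sketch that assumes the data above is assigned to `d1`, the name the answer below uses; `types_match` is an illustrative name, not part of the original post.

```r
# Example data from the question, stored as d1 (assumed name, matching the answer)
d1 <- structure(list(grp = c("1", "1", "2", "2", "3", "3"),
                     view = c("A", "B", "A", "B", "A", "B"),
                     type = c("Y", "N", "Y", "Y", "N", "N")),
                class = "data.frame", row.names = c(NA, -6L))

# TRUE when every row in the group has the same type, FALSE otherwise
types_match <- tapply(d1$type, d1$grp, function(t) length(unique(t)) == 1)
print(types_match)
```

This gives FALSE for group 1 (Y vs N) and TRUE for groups 2 and 3.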
I want the resulting dataset to look like this:
```r
structure(list(grp = c("1", "2", "3"),
               view = c("A", "A", "A"),
               type = c("Y", "Y", "N")),
          class = "data.frame", row.names = c(NA, -3L))
```
Any help would be greatly appreciated.
Thank you so so much!
>Solution :
Using dplyr:
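```r
library(dplyr)

# d1 is the example data frame from the question
d1 %>%
  group_by(grp) %>%
  filter(
    (n_distinct(type) == 1 & row_number() == 1) |  # types match: keep the first row
    (n_distinct(type) == 2 & type == 'Y')          # types differ: keep the Y row
  ) %>%
  ungroup()
```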
Gives:
```
# A tibble: 3 × 3
  grp   view  type
  <chr> <chr> <chr>
1 1     A     Y
2 2     A     Y
3 3     A     N
```
Read this as: within each group, keep the rows that satisfy either of these criteria:
- There is only one distinct type in this group, and this row is the group's first row (which row to keep is arbitrary here, since you didn't specify one)
- There are two distinct types in this group, and this row has type Y
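If dplyr isn't available, the same two criteria can be written in base R. This is a sketch, not part of the original answer: `ave()` stands in for `n_distinct()` and `row_number()` within each group, and the helper names `n_types`, `pos` and `keep` are illustrative.

```r
# Example data from the question (assumed to be called d1, as in the answer)
d1 <- structure(list(grp = c("1", "1", "2", "2", "3", "3"),
                     view = c("A", "B", "A", "B", "A", "B"),
                     type = c("Y", "N", "Y", "Y", "N", "N")),
                class = "data.frame", row.names = c(NA, -6L))

idx <- seq_len(nrow(d1))

# Number of distinct types in each row's group (like n_distinct(type))
n_types <- ave(idx, d1$grp, FUN = function(i) length(unique(d1$type[i])))

# Position of each row within its group (like row_number())
pos <- ave(idx, d1$grp, FUN = seq_along)

# Keep the first row when the types match, otherwise keep the Y row
keep <- (n_types == 1 & pos == 1) | (n_types == 2 & d1$type == "Y")

result <- d1[keep, ]
rownames(result) <- NULL  # renumber the remaining rows 1..n
print(result)
```

Row indices are passed to `ave()` instead of `d1$type` because `ave()` returns a vector of the same type as its first argument; applying it to the character column would turn the counts into strings.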