Is it possible to keep only the rows whose value is duplicated within a column, and filter out the rest?
Here is some dummy data:
a <- data.frame(id = c('a1', 'a1', 'a1', 'a2', 'a3', 'a3'),
                number = c(1, 2, 3, 1, 2, 3),
                stringsAsFactors = FALSE)
a
# id number
# 1 a1 1
# 2 a1 2
# 3 a1 3
# 4 a2 1
# 5 a3 2
# 6 a3 3
# Expected result
# id number
# 1 a1 1
# 2 a1 2
# 3 a1 3
# 5 a3 2
# 6 a3 3
As you can see, rows whose "id" value is not duplicated (here, a2) are removed.
Can the filtering also be adjusted? For example: keep only the rows whose "id" value occurs 3 or more times.
Is this achievable? A dplyr approach would be helpful.
Thank you.
>Solution :
subset(a, duplicated(id)|duplicated(id, fromLast = TRUE))
# id number
# 1 a1 1
# 2 a1 2
# 3 a1 3
# 5 a3 2
# 6 a3 3
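To see why both duplicated() calls are needed, it may help to look at the logical vectors they produce. The sketch below uses a standalone `id` vector with the same values as the dummy data; the names `first_pass`, `last_pass` and `keep` are only illustrative.

```r
id <- c('a1', 'a1', 'a1', 'a2', 'a3', 'a3')

# Flags every occurrence except the first one in each group
first_pass <- duplicated(id)
# FALSE  TRUE  TRUE FALSE FALSE  TRUE

# Flags every occurrence except the last one in each group
last_pass <- duplicated(id, fromLast = TRUE)
# TRUE  TRUE FALSE FALSE  TRUE FALSE

# Combining both flags every row whose id appears more than once
keep <- first_pass | last_pass
print(keep)
# TRUE  TRUE  TRUE FALSE  TRUE  TRUE
```

Only a2 is FALSE in both vectors, so it is the only row dropped.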
If you are using dplyr's filter():
library(dplyr)
filter(a, duplicated(id)|duplicated(id, fromLast = TRUE))
or even:
a %>%
group_by(id) %>%
filter(n() > 1)
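For the follow-up about keeping only ids that occur 3 or more times, the grouped version can probably be adapted by changing the threshold. This is a minimal sketch under that assumption; `min_n`, `res_dplyr` and `res_base` are hypothetical names introduced here, and the base R line using ave() is just one possible equivalent.

```r
library(dplyr)

a <- data.frame(id = c('a1', 'a1', 'a1', 'a2', 'a3', 'a3'),
                number = c(1, 2, 3, 1, 2, 3),
                stringsAsFactors = FALSE)

# Hypothetical threshold: minimum number of rows an id must have to be kept
min_n <- 3

# dplyr: n() is the size of the current group
res_dplyr <- a %>%
  group_by(id) %>%
  filter(n() >= min_n) %>%
  ungroup()

# Base R: ave() returns the group size for every row
res_base <- a[ave(seq_along(a$id), a$id, FUN = length) >= min_n, ]

print(res_dplyr)
print(res_base)
```

With the dummy data, only the three a1 rows should remain, since a3 occurs twice and a2 once. Setting `min_n` to 2 gives the same rows as filter(n() > 1).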