I have a dataframe with groups of Sequences:
df <- data.frame(
ID = letters[1:14],
Sequ = c(NA,1,1,1,2,3,3,3,NA,NA,4,4,4,4)
)
I want to keep only the groups that have more than a critical number n of members; let's suppose n is 3. This attempt selects only the 4th row of the qualifying group, not the Sequence as a whole:
df %>%
group_by(Sequ) %>%
filter(row_number() > 3)
# A tibble: 1 × 2
# Groups: Sequ [1]
ID Sequ
<chr> <dbl>
1 n 4
So how can I get the desired output below? Ideally with `dplyr`, but other solutions are welcome as well:
df
ID Sequ
1 k 4
2 l 4
3 m 4
4 n 4
>Solution :
You can use the following code, which first removes the rows containing NA, then groups by Sequ and keeps only the groups with more than 3 members (n() returns the size of the current group):
df <- data.frame(
ID = letters[1:14],
Sequ = c(NA,1,1,1,2,3,3,3,NA,NA,4,4,4, 4)
)
library(dplyr)
df %>%
na.omit() %>%
group_by(Sequ) %>%
filter(n() > 3)
#> # A tibble: 4 × 2
#> # Groups: Sequ [1]
#> ID Sequ
#> <chr> <dbl>
#> 1 k 4
#> 2 l 4
#> 3 m 4
#> 4 n 4
Created on 2022-07-31 by the reprex package (v2.0.1)
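Since the question also welcomes non-dplyr solutions, here is a minimal base R sketch of the same idea: count the members of each Sequ value, then keep the rows whose value occurs more than the critical number of times. The names n_crit, counts, big_groups and result are only illustrative and not part of the original answer.

```r
df <- data.frame(
  ID = letters[1:14],
  Sequ = c(NA, 1, 1, 1, 2, 3, 3, 3, NA, NA, 4, 4, 4, 4)
)

# Critical group size (hypothetical name); groups must have MORE members than this
n_crit <- 3

# table() counts the rows per Sequ value and ignores NA by default
counts <- table(df$Sequ)

# Sequ values whose group size exceeds the threshold
big_groups <- as.numeric(names(counts)[as.vector(counts) > n_crit])

# Keep only the rows in those groups; NA never matches, so NA rows drop out
result <- df[df$Sequ %in% big_groups, ]
result
```

Unlike the dplyr pipeline, this keeps the original row names and returns a plain data.frame rather than a grouped tibble.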
Old answer
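You can use the following code. Note that it filters on the value of Sequ rather than on the group size, so it returns the desired rows here only because the one group with more than 3 members happens to have Sequ equal to 4: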
df <- data.frame(
ID = letters[1:14],
Sequ = c(NA,1,1,1,2,3,3,3,NA,NA,4,4,4, 4)
)
library(dplyr)
df %>%
group_by(Sequ) %>%
filter(Sequ > 3)
#> # A tibble: 4 × 2
#> # Groups: Sequ [1]
#> ID Sequ
#> <chr> <dbl>
#> 1 k 4
#> 2 l 4
#> 3 m 4
#> 4 n 4