I have a data frame with 5 columns, x1 … x5. All of them are dummy variables. I want to keep only the rows in which 2 or more columns are equal to 1. For example, from df, only rows 1, 4 and 10 would be selected because:
- row 1: x2 = 1, x3 = 1, x4 = 1
- row 4: x1 = 1, x3 = 1
- row 10: x2 = 1, x5 = 1
- the remaining rows are filtered out because either none or only one column is equal to 1.
Is there a way to achieve this using dplyr::filter?
Data:
set.seed(123)
df <- data.frame(
x1 = sample(c(1,0), size = 10, replace = TRUE, prob = c(0.2, 0.8)),
x2 = sample(c(1,0), size = 10, replace = TRUE, prob = c(0.2, 0.8)),
x3 = sample(c(1,0), size = 10, replace = TRUE, prob = c(0.2, 0.8)),
x4 = sample(c(1,0), size = 10, replace = TRUE, prob = c(0.2, 0.8)),
x5 = sample(c(1,0), size = 10, replace = TRUE, prob = c(0.2, 0.8))
)
> df
x1 x2 x3 x4 x5
1 0 1 1 1 0
2 0 0 0 1 0
3 0 0 0 0 0
4 1 0 1 0 0
5 1 0 0 0 0
6 0 1 0 0 0
7 0 0 0 0 0
8 1 0 0 0 0
9 0 0 0 0 0
10 0 1 0 0 1
What I want:
x1 x2 x3 x4 x5
1 0 1 1 1 0
2 1 0 1 0 0
3 0 1 0 0 1
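Solution:
Because every column is a 0/1 dummy, rowSums() gives the number of 1s in each row, so keeping rows where it is at least 2 does the job: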
> df[rowSums(df)>=2, ]
x1 x2 x3 x4 x5
1 0 1 1 1 0
4 1 0 1 0 0
10 0 1 0 0 1
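Since the question asks specifically about dplyr::filter, here is a minimal sketch of the same idea inside filter(). It assumes dplyr 1.1.0 or later, where pick() is available for selecting columns inside data-masking verbs. The data frame is typed in from the printed df above so the result does not depend on the random draw.

```r
library(dplyr)

# The example data, copied from the printed df so the result is deterministic
df <- data.frame(
  x1 = c(0, 0, 0, 1, 1, 0, 0, 1, 0, 0),
  x2 = c(1, 0, 0, 0, 0, 1, 0, 0, 0, 1),
  x3 = c(1, 0, 0, 1, 0, 0, 0, 0, 0, 0),
  x4 = c(1, 1, 0, 0, 0, 0, 0, 0, 0, 0),
  x5 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
)

# pick(x1:x5) returns the selected columns as a data frame;
# comparing it with 1 gives a logical matrix, and rowSums() counts the TRUEs per row.
res <- df %>%
  filter(rowSums(pick(x1:x5) == 1) >= 2)

res
```

Counting `== 1` explicitly gives the same answer as summing the values when every column is a 0/1 dummy, but it keeps the condition correct if a column ever holds other values. Like the desired output in the question, filter() renumbers the kept rows 1, 2, 3.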