How to filter rows that have two or more columns equal to 1?

I have a data frame with 5 columns, x1 to x5. All of them are dummy variables. I want to keep only the rows in which two or more columns are equal to 1. For example, from df, only rows 1, 4, and 10 would be selected because:

  • row 1, x2 = 1, x3 = 1, x4 = 1
  • row 4, x1 = 1, x3 = 1
  • row 10, x2 = 1, x5 = 1
  • the remaining rows are filtered out because either none or only one column is equal to 1.
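Because every column is a 0/1 dummy, the rule in the list above amounts to counting how many columns equal 1 in each row and keeping the rows where that count is at least 2. A minimal sketch of that check in R follows. As an assumption, df is typed in from the printed output below rather than regenerated with set.seed(), so the example does not depend on the random number generator; `n_ones` is just an illustrative name.

```r
# df typed in from the printed output in the question (not regenerated with set.seed)
df <- data.frame(
  x1 = c(0, 0, 0, 1, 1, 0, 0, 1, 0, 0),
  x2 = c(1, 0, 0, 0, 0, 1, 0, 0, 0, 1),
  x3 = c(1, 0, 0, 1, 0, 0, 0, 0, 0, 0),
  x4 = c(1, 1, 0, 0, 0, 0, 0, 0, 0, 0),
  x5 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
)

# df == 1 gives a logical matrix; rowSums counts the TRUEs in each row
n_ones <- rowSums(df == 1)
print(n_ones)
# per-row counts: 3, 1, 0, 2, 1, 1, 0, 1, 0, 2

# Row positions with two or more columns equal to 1
print(which(n_ones >= 2))
# positions 1, 4 and 10
```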

Is there a way to achieve this using dplyr::filter?

data:

set.seed(123)

df <- data.frame(
  x1 = sample(c(1,0), size = 10, replace = TRUE, prob = c(0.2, 0.8)),
  x2 = sample(c(1,0), size = 10, replace = TRUE, prob = c(0.2, 0.8)),
  x3 = sample(c(1,0), size = 10, replace = TRUE, prob = c(0.2, 0.8)),
  x4 = sample(c(1,0), size = 10, replace = TRUE, prob = c(0.2, 0.8)),
  x5 = sample(c(1,0), size = 10, replace = TRUE, prob = c(0.2, 0.8))
)

> df
   x1 x2 x3 x4 x5
1   0  1  1  1  0
2   0  0  0  1  0
3   0  0  0  0  0
4   1  0  1  0  0
5   1  0  0  0  0
6   0  1  0  0  0
7   0  0  0  0  0
8   1  0  0  0  0
9   0  0  0  0  0
10  0  1  0  0  1

What I want:

  x1 x2 x3 x4 x5
1  0  1  1  1  0
2  1  0  1  0  0
3  0  1  0  0  1

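Solution:

Since every column is a 0/1 dummy, rowSums(df) gives the number of columns equal to 1 in each row, so keep the rows where that sum is at least 2: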

> df[rowSums(df)>=2, ]
   x1 x2 x3 x4 x5
1   0  1  1  1  0
4   1  0  1  0  0
10  0  1  0  0  1
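Since the question asks about dplyr::filter, here is a sketch of the same condition written inside filter(). It assumes dplyr 1.0 or later is installed (for across() without a function), and it defines df from the printed output so it runs on its own. The explicit sum is the most version-independent form; the across() form avoids listing the columns by hand, and comparing with `== 1` counts only cells exactly equal to 1, which would matter only if the columns ever held values other than 0 and 1. The names `out_sum` and `out_across` are illustrative.

```r
library(dplyr)

# df typed in from the printed output in the question
df <- data.frame(
  x1 = c(0, 0, 0, 1, 1, 0, 0, 1, 0, 0),
  x2 = c(1, 0, 0, 0, 0, 1, 0, 0, 0, 1),
  x3 = c(1, 0, 0, 1, 0, 0, 0, 0, 0, 0),
  x4 = c(1, 1, 0, 0, 0, 0, 0, 0, 0, 0),
  x5 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
)

# Option 1: add the dummies explicitly (each term is 0 or 1)
out_sum <- filter(df, x1 + x2 + x3 + x4 + x5 >= 2)

# Option 2: count the columns equal to 1 across x1:x5 without naming each one
out_across <- filter(df, rowSums(across(x1:x5) == 1) >= 2)

print(out_sum)
```

Unlike df[rowSums(df) >= 2, ], which keeps the original row names 1, 4 and 10, the filter() result should print with its rows numbered 1 to 3, as in the desired output above.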
