R data frame: select rows that meet logical conditions over multiple columns (variables) indexed by name

November 24, 2021

This example should clarify what I am looking for:

set.seed(123456789)

df <- data.frame(
  x1 = sample(c(0,1), size = 10, replace = TRUE),
  x2 = sample(c(0,1), size = 10, replace = TRUE),
  z1 = sample(c(0,1), size = 10, replace = TRUE)
  )

I want to select all rows where both x1 and x2 equal 1. That is,

df[df$x1==1 & df$x2==1,]

which returns

   x1 x2 z1
1   1  1  1
4   1  1  1
6   1  1  1
10  1  1  0

But I want to do this in a way that scales to many x variables (e.g. x1, x2, …, x40), so I would like to select the columns by their "x" prefix rather than writing df$x1==1 & df$x2==1 & … & df$x40==1. Note that I want to keep the z1 variable in the resulting data set: the rows are selected based on the x variables, but I am not looking to keep only the x columns. Is this possible?

Solution:

A possible solution, based on dplyr:

library(dplyr)

set.seed(123456789)

df <- data.frame(
  x1 = sample(c(0,1), size = 10, replace = TRUE),
  x2 = sample(c(0,1), size = 10, replace = TRUE),
  z1 = sample(c(0,1), size = 10, replace = TRUE)
)

# Keep rows where every column starting with "x" equals 1 (if_all() needs dplyr >= 1.0.4)
df %>%
  filter(if_all(starts_with("x"), ~ .x == 1))

#>   x1 x2 z1
#> 1  1  1  1
#> 2  1  1  1
#> 3  1  1  1
#> 4  1  1  0
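
For comparison, here is a minimal base R sketch of the same idea, with no packages required. It assumes the x columns contain no NA values (an NA would propagate into the row filter) and that every column whose name begins with "x" should take part in the condition. The names x_cols, keep and result are illustrative only.

```r
set.seed(123456789)

df <- data.frame(
  x1 = sample(c(0, 1), size = 10, replace = TRUE),
  x2 = sample(c(0, 1), size = 10, replace = TRUE),
  z1 = sample(c(0, 1), size = 10, replace = TRUE)
)

# Names of all columns that start with "x" (x1, x2, ..., however many exist)
x_cols <- grep("^x", names(df), value = TRUE)

# Logical vector: TRUE for rows where every selected column equals 1
# (assumes no NA values in the x columns)
keep <- rowSums(df[, x_cols, drop = FALSE] == 1) == length(x_cols)

# Subset the rows; all columns, including z1, are retained
result <- df[keep, ]
result
```

Unlike the dplyr version, this keeps the original row names, so the result should match df[df$x1==1 & df$x2==1,] exactly. The drop = FALSE argument keeps the selection a data frame even if only one column matches the prefix.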