Home Subset in R with specific values for specific columns identified by their index number

Questions

Subset in R with specific values for specific columns identified by their index number

January 23, 2022

If I have a data frame like this:

df = data.frame(A = sample(1:5, 10, replace=T), B = sample(1:5, 10, replace=T), C = sample(1:5, 10, replace=T), D = sample(1:5, 10, replace=T), E = sample(1:5, 10, replace=T))

Giving me this:

   A B C D E
1  1 5 1 4 3
2  2 3 5 4 3
3  4 2 2 4 4
4  2 1 2 5 2
5  3 3 4 4 5
6  3 2 3 1 5
7  1 5 4 2 3
8  1 3 5 5 1
9  3 1 1 3 5
10 5 3 1 2 4

How do I get a subset that includes all the rows where the values for certain columns (B and D, say) are equal to 1, with the columns identified by their index numbers (2 and 4) rather than their names? In this case:

   A B C D E
4  2 1 2 5 2
6  3 2 3 1 5
9  3 1 1 3 5

>Solution :

df[rowSums(df[c(2,4)] == 1) > 0,]
#   A B C D E
# 4 2 1 2 5 2
# 6 3 2 3 1 5
# 9 3 1 1 3 5

You said to compare values by column index, so df[c(2,4)] or (or df[,c(2,4)]).
df[c(2,4)] == 1 returns a matrix of logicals, whether the cell’s value is equal to 1.
rowSums(.) > 0 finds those rows with at least one 1.
df[rowSums(.)>0,] selects just those rows.

Data

df <- structure(list(A = c(1L, 2L, 4L, 2L, 3L, 3L, 1L, 1L, 3L, 5L), B = c(5L, 3L, 2L, 1L, 3L, 2L, 5L, 3L, 1L, 3L), C = c(1L, 5L, 2L, 2L, 4L, 3L, 4L, 5L, 1L, 1L), D = c(4L, 4L, 4L, 5L, 4L, 1L, 2L, 5L, 3L, 2L), E = c(3L, 3L, 4L, 2L, 5L, 5L, 3L, 1L, 5L, 4L)), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"))