Conditionally selecting observations to sum across rows

Say I have a dataset that looks like:

set.seed(123)
data <- data.frame(var1_zscore = rnorm(100),
                   var2_zscore = rnorm(100),
                   var3_zscore = rnorm(100),
                   var4_zscore = rnorm(100),
                   var5_zscore = rnorm(100))

I want to sum across each row, using a condition that is specific to each variable. A value adds its absolute value to the row sum only if it meets its variable's condition:

var1_zscore: <= -1
var2_zscore: <= -1 | >= 1
var3_zscore: >= 1
var4_zscore: >= 1
var5_zscore: <= -1 | >= 1

My desired output is a column called row_sum that has, for each row, the sum of the absolute values of those columns from var1_zscore through var5_zscore that meet their conditions. For instance, the first row of data looks like this:

var1_zscore var2_zscore var3_zscore var4_zscore var5_zscore
-0.560475647 -0.71040656 2.19881035 -0.71524219 -0.07355602

Only var3_zscore meets its condition here, so the first row of data$row_sum would be 2.19881035.

I tried doing something like this:

data$row_sum<- rowSums(abs(data[,c((which(data$var1_zscore <= -1)),
                                    (which(data$var2_zscore <= -1 | data$var2_zscore >= 1)),
                                    (which(data$var3_zscore >= 1)),
                                    (which(data$var4_zscore >= 1)),
                                    (which(data$var5_zscore <= -1 | data$var5_zscore >= 1))
                                    )], na.rm = TRUE))

But I get ERROR: Can’t subset columns past the end. (And then it tells me which locations in my data don’t exist).

I think the problem is that I shouldn’t be using the which function, but I’m not sure what else to use here? Any help is greatly appreciated. Thank you so much!

Solution:

Here is a solution that uses rowSums on across:

library(dplyr)

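# One function per column, in column order. Each returns abs(x) where the
# column's condition holds and 0 elsewhere.
lf <- list(
  \(x) ifelse(x <= -1, abs(x), 0),
  \(x) ifelse(x <= -1 | x >= 1, abs(x), 0),
  \(x) ifelse(x >= 1, abs(x), 0),
  \(x) ifelse(x >= 1, abs(x), 0),
  \(x) ifelse(x <= -1 | x >= 1, abs(x), 0)
) %>%
  setNames(names(data))  # name each function after the column it applies to

# Look up each column's function by name, apply it, then sum across each row.
data %>%
  mutate(row_sums = rowSums(across(all_of(names(lf)), ~ lf[[cur_column()]](.))))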

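across applies a different function to each column. The functions are stored in lf under the column names, and cur_column() supplies the name of the column currently being processed, so the matching function can be looked up. The output of across is a data frame of the transformed columns (absolute value where the condition holds, 0 otherwise). rowSums then sums each row of that data frame.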

Output (first 10 rows)

     var1_zscore var2_zscore var3_zscore var4_zscore var5_zscore row_sums
1   -0.560475647 -0.71040656  2.19881035 -0.71524219 -0.07355602 2.198810
2   -0.230177489  0.25688371  1.31241298 -0.75268897 -1.16865142 2.481064
3    1.558708314 -0.24669188 -0.26514506 -0.93853870 -0.63474826 0.000000
4    0.070508391 -0.34754260  0.54319406 -1.05251328 -0.02884155 0.000000
5    0.129287735 -0.95161857 -0.41433995 -0.43715953  0.67069597 0.000000
6    1.715064987 -0.04502772 -0.47624689  0.33117917 -1.65054654 1.650547
7    0.460916206 -0.78490447 -0.78860284 -2.01421050 -0.34975424 0.000000
8   -1.265061235 -1.66794194 -0.59461727  0.21198043  0.75640644 2.933003
9   -0.686852852 -0.38022652  1.65090747  1.23667505 -0.53880916 2.887583
10  -0.445661970  0.91899661 -0.05402813  2.03757402  0.22729192 2.037574
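
As for why the original attempt failed: which() applied to a column returns the positions of the matching rows, and those row positions were then used as column indices on a data frame with only five columns. The sketch below illustrates this, along with one possible base R alternative that builds a logical mask instead. It is only a sketch: the small data frame z is hypothetical (three rows copied from the output above, with the fifth column assumed to be named var5_zscore), and keep is an illustrative name.

```r
# Rows 1, 2 and 8 copied from the printed output above (hypothetical subset).
z <- data.frame(
  var1_zscore = c(-0.560475647, -0.230177489, -1.265061235),
  var2_zscore = c(-0.71040656,   0.25688371,  -1.66794194),
  var3_zscore = c( 2.19881035,   1.31241298,  -0.59461727),
  var4_zscore = c(-0.71524219,  -0.75268897,   0.21198043),
  var5_zscore = c(-0.07355602,  -1.16865142,   0.75640644)
)

# which() on a column gives ROW positions, not column positions.
which(z$var1_zscore <= -1)   # 3

# One logical column per variable: TRUE where that variable's condition holds.
# abs(x) >= 1 is the same test as x <= -1 | x >= 1.
keep <- cbind(
  z$var1_zscore <= -1,
  abs(z$var2_zscore) >= 1,
  z$var3_zscore >= 1,
  z$var4_zscore >= 1,
  abs(z$var5_zscore) >= 1
)

# Multiplying by the mask zeroes out values that fail their condition
# (FALSE counts as 0, TRUE as 1), then rowSums adds up what is left.
z$row_sum <- rowSums(abs(as.matrix(z[, 1:5])) * keep)
z$row_sum
```

The resulting row sums should agree with rows 1, 2 and 8 of the output above. The mask approach avoids the dplyr dependency, but the conditions are tied to column order rather than looked up by name.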
