Conditionally selecting observations to sum across rows

Say I have a dataset that looks like:

set.seed(123)
data <- data.frame(var1_zscore = rnorm(100),
                   var2_zscore = rnorm(100),
                   var3_zscore = rnorm(100),
                   var4_zscore = rnorm(100),
                   var5_zscore = rnorm(100))

I want to sum across each row, but only include a value when it meets a condition specific to its variable. Whenever the condition holds, the absolute value of that variable is added to the row sum:

var1_zscore: <= -1
var2_zscore: <= -1 or >= 1
var3_zscore: >= 1
var4_zscore: >= 1
var5_zscore: <= -1 or >= 1

My desired output is a column called row_sum that holds, for each row, the sum of the absolute values of those columns from var1_zscore to var5_zscore that meet their condition. For instance, the first row of data looks like this:

var1_zscore var2_zscore var3_zscore var4_zscore var5_zscore
-0.560475647 -0.71040656 2.19881035 -0.71524219 -0.07355602

So the first row of data$row_sum would be 2.19881035.

I tried doing something like this:

data$row_sum<- rowSums(abs(data[,c((which(data$var1_zscore <= -1)),
                                    (which(data$var2_zscore <= -1 | data$var2_zscore >= 1)),
                                    (which(data$var3_zscore >= 1)),
                                    (which(data$var4_zscore >= 1)),
                                    (which(data$var5_zscore <= -1 | data$var5_zscore >= 1))
                                    )], na.rm = TRUE))

But I get ERROR: Can’t subset columns past the end. (And then it tells me which locations in my data don’t exist).

I think the problem is that I shouldn’t be using the which function, but I’m not sure what else to use here? Any help is greatly appreciated. Thank you so much!

Solution:

Here is a solution that uses rowSums on across:

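library(dplyr)

# One function per column, in column order. Each returns abs(x) where the
# column's condition holds and 0 otherwise.
# Note: the \(x) lambda shorthand needs R >= 4.1.
lf <- list(
  \(x) ifelse(x <= -1, abs(x), 0),
  \(x) ifelse(x <= -1 | x >= 1, abs(x), 0),
  \(x) ifelse(x >= 1, abs(x), 0),
  \(x) ifelse(x >= 1, abs(x), 0),
  \(x) ifelse(x <= -1 | x >= 1, abs(x), 0)
) %>%
  setNames(names(data))  # name each function after the column it handles

# Apply each column's own function, then sum the results row by row.
data %>%
  mutate(row_sums = rowSums(across(all_of(names(lf)), ~ lf[[cur_column()]](.x))))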

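across applies a different function to each column: the functions are stored in lf under the column names, and cur_column() looks up the one that belongs to the column currently being processed. The output of across is a data frame of these columns with the functions applied, so values that fail their condition are already 0. Then we simply take the row sums.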
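If you would rather not depend on dplyr, the same per-column idea can be sketched in base R. This is only an illustration, not part of the answer above: it rebuilds the example data so it runs on its own, and the names conds, mask and row_sum_base are made up for the sketch. It also assumes the conditions are listed in the same order as the columns.

```r
set.seed(123)
data <- data.frame(var1_zscore = rnorm(100),
                   var2_zscore = rnorm(100),
                   var3_zscore = rnorm(100),
                   var4_zscore = rnorm(100),
                   var5_zscore = rnorm(100))

# One logical test per column, in column order (thresholds from the question).
conds <- list(
  function(x) x <= -1,
  function(x) x <= -1 | x >= 1,
  function(x) x >= 1,
  function(x) x >= 1,
  function(x) x <= -1 | x >= 1
)

# mask[i, j] is TRUE where column j's condition holds in row i.
mask <- mapply(function(f, col) f(col), conds, data)

# Multiplying by the logical mask zeroes out values that fail their condition,
# so rowSums() only adds the absolute values that qualify.
row_sum_base <- rowSums(abs(as.matrix(data)) * mask)

head(row_sum_base)
```

Pairing functions with columns by position is more fragile than the named lookup via cur_column(), so the named version is probably the safer choice if the column order might change.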

Output (first 10 rows)

     var1_zscore var2_zscore var3_zscore var4_zscore var5_zscore row_sums
1   -0.560475647 -0.71040656  2.19881035 -0.71524219 -0.07355602 2.198810
2   -0.230177489  0.25688371  1.31241298 -0.75268897 -1.16865142 2.481064
3    1.558708314 -0.24669188 -0.26514506 -0.93853870 -0.63474826 0.000000
4    0.070508391 -0.34754260  0.54319406 -1.05251328 -0.02884155 0.000000
5    0.129287735 -0.95161857 -0.41433995 -0.43715953  0.67069597 0.000000
6    1.715064987 -0.04502772 -0.47624689  0.33117917 -1.65054654 1.650547
7    0.460916206 -0.78490447 -0.78860284 -2.01421050 -0.34975424 0.000000
8   -1.265061235 -1.66794194 -0.59461727  0.21198043  0.75640644 2.933003
9   -0.686852852 -0.38022652  1.65090747  1.23667505 -0.53880916 2.887583
10  -0.445661970  0.91899661 -0.05402813  2.03757402  0.22729192 2.037574
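As for why the original attempt errors: which() returns the positions of the TRUE elements, which here are row numbers, and the attempt then uses those numbers as column indices. The short sketch below illustrates this on the example data (rebuilt so it runs on its own); rows_hit, keep and masked are names invented for the illustration. The wording of the error message suggests the real data is a tibble, but that is an assumption.

```r
set.seed(123)
data <- data.frame(var1_zscore = rnorm(100),
                   var2_zscore = rnorm(100),
                   var3_zscore = rnorm(100),
                   var4_zscore = rnorm(100),
                   var5_zscore = rnorm(100))

# which() gives the row numbers where the condition is TRUE.
rows_hit <- which(data$var1_zscore <= -1)
head(rows_hit)

# Used as column indices, these point past the five columns that exist.
max(rows_hit) > ncol(data)

# A plain logical comparison keeps one TRUE/FALSE per row, which can be used
# to keep or zero out each value instead of selecting columns.
keep <- data$var1_zscore <= -1
masked <- ifelse(keep, abs(data$var1_zscore), 0)
head(masked)
```

This is the same masking step that each function in lf performs for its column.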