Say I have a dataset that looks like:
set.seed(123)
data <- data.frame(var1_zscore = rnorm(100),
var2_zscore = rnorm(100),
var3_zscore = rnorm(100),
var4_zscore = rnorm(100),
var5_zscore = rnorm(100))
I want to sum across each row based on conditions that are specific to each variable. For instance, if var1_zscore is <= -1, then add the absolute value of var1_zscore to the row sum. If var2_zscore is <= -1 | >= 1, then add the absolute value of var2_zscore to the row sum. If var3_zscore is >= 1, then add the absolute value of var3_zscore to the row sum. If var4_zscore is >= 1, then add the absolute value of var4_zscore to the row sum. If var5_zscore is <= -1 | >= 1, then add the absolute value of var5_zscore to the row sum.
My desired output is a column called row_sum that has, for each row, the sum of the absolute values of only those columns among var1_zscore : var5_zscore that meet their condition. For instance, the first row of data looks like this:
var1_zscore | var2_zscore | var3_zscore | var4_zscore | var5_zscore |
---|---|---|---|---|
-0.560475647 | -0.71040656 | 2.19881035 | -0.71524219 | -0.07355602 |
So the first row of data$row_sum would be 2.19881035, because only var3_zscore meets its condition in that row.
I tried doing something like this:
data$row_sum<- rowSums(abs(data[,c((which(data$var1_zscore <= -1)),
(which(data$var2_zscore <= -1 | data$var2_zscore >= 1)),
(which(data$var3_zscore >= 1)),
(which(data$var4_zscore >= 1)),
(which(data$var5_zscore <= -1 | data$var5_zscore >= 1))
)], na.rm = TRUE))
But I get ERROR: Can’t subset columns past the end. (And then it tells me which locations in my data don’t exist).
I think the problem is that I shouldn’t be using the which function, but I’m not sure what else to use here? Any help is greatly appreciated. Thank you so much!
>Solution :
Here is a solution that uses rowSums on across:
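library(dplyr)

# One function per column, in column order: each keeps |x| where the
# column's condition holds and returns 0 otherwise
lf <- list(
  \(x) ifelse(x <= -1, abs(x), 0),           # var1: low tail only
  \(x) ifelse(x <= -1 | x >= 1, abs(x), 0),  # var2: either tail
  \(x) ifelse(x >= 1, abs(x), 0),            # var3: high tail only
  \(x) ifelse(x >= 1, abs(x), 0),            # var4: high tail only
  \(x) ifelse(x <= -1 | x >= 1, abs(x), 0)   # var5: either tail
) %>%
  setNames(names(data))  # name each function after its column

# Apply each column's own function, then sum across each row
data %>%
  mutate(row_sums = rowSums(across(all_of(names(lf)), ~ lf[[cur_column()]](.))))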
across applies a different function to each column. The functions are stored in lf under the column names, and cur_column() looks up the one that belongs to the column currently being processed. The output of across is a data frame of these columns with the functions applied. Then we simply take the row sums.
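To make the intermediate step visible, here is a minimal sketch on a made-up two-column data frame. The names toy, toy_fns, kept and toy_sums are hypothetical and exist only for this illustration. It shows the data frame that across produces before rowSums collapses it.

```r
library(dplyr)

# Hypothetical toy data, not the data from the question
toy <- data.frame(a = c(-2, 0.5, -1), b = c(1.5, -3, 0.2))

# One function per column, keyed by column name
toy_fns <- list(
  a = \(x) ifelse(x <= -1, abs(x), 0),  # keep |x| only at or below -1
  b = \(x) ifelse(x >= 1, abs(x), 0)    # keep |x| only at or above 1
)

# Intermediate result: same shape as toy, with non-qualifying cells set to 0
kept <- toy %>%
  mutate(across(all_of(names(toy_fns)), ~ toy_fns[[cur_column()]](.x)))
print(kept)

# Summing each row of the intermediate result gives the conditional row sum
toy_sums <- rowSums(kept)
print(toy_sums)
```

Because every cell that fails its condition becomes 0, the final rowSums call needs no further filtering.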
Output
var1_zscore var2_zscore var3_zscore var4_zscore var5_zscore row_sums
1 -0.560475647 -0.71040656 2.19881035 -0.71524219 -0.07355602 2.198810
2 -0.230177489 0.25688371 1.31241298 -0.75268897 -1.16865142 2.481064
3 1.558708314 -0.24669188 -0.26514506 -0.93853870 -0.63474826 0.000000
4 0.070508391 -0.34754260 0.54319406 -1.05251328 -0.02884155 0.000000
5 0.129287735 -0.95161857 -0.41433995 -0.43715953 0.67069597 0.000000
6 1.715064987 -0.04502772 -0.47624689 0.33117917 -1.65054654 1.650547
7 0.460916206 -0.78490447 -0.78860284 -2.01421050 -0.34975424 0.000000
8 -1.265061235 -1.66794194 -0.59461727 0.21198043 0.75640644 2.933003
9 -0.686852852 -0.38022652 1.65090747 1.23667505 -0.53880916 2.887583
10 -0.445661970 0.91899661 -0.05402813 2.03757402 0.22729192 2.037574
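For comparison, the same sums can probably be obtained in base R with a logical mask. This is a sketch and not part of the answer above. It assumes the fifth column is spelled var5_zscore as in the question text, and it uses a hypothetical data frame name z so that it does not overwrite data. It also hints at why the original attempt failed: which() applied to a column returns the row positions that meet the condition, and putting those positions in the column slot of data[, ...] asks for column numbers well beyond the five that exist.

```r
set.seed(123)
# Rebuild the example data under a separate (hypothetical) name
z <- data.frame(var1_zscore = rnorm(100),
                var2_zscore = rnorm(100),
                var3_zscore = rnorm(100),
                var4_zscore = rnorm(100),
                var5_zscore = rnorm(100))

# Logical mask: TRUE where a cell meets its own column's condition
keep <- cbind(
  z$var1_zscore <= -1,       # low tail only
  abs(z$var2_zscore) >= 1,   # either tail, same as x <= -1 | x >= 1
  z$var3_zscore >= 1,        # high tail only
  z$var4_zscore >= 1,        # high tail only
  abs(z$var5_zscore) >= 1    # either tail
)

# TRUE/FALSE multiply as 1/0, so cells that fail their condition add nothing
z$row_sum <- rowSums(abs(as.matrix(z)) * keep)
head(z$row_sum, 10)
```

The first ten values should agree with the row_sums column shown above, since the mask encodes the same five conditions as lf.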