Example data to copy
df <- data.frame(
AA = c(100, 200, 300, 400),
X1 = c(2, 1, 3, 1),
X2 = c(1, 3, 4, 1)
)
Based on the index of AA and its values, I would like to calculate, for every row, the sum of indicators given by the condition df$AA[i] > df[df$X1[i], c('AA')] (shown here for X1), across a varying number of index variables.
My probably naive approach is a for-loop, which works perfectly for a fixed number of variables (columns), X1 and X2 in the given example. My problem is that I do not know the number of variables beforehand. In theory, any number 1, 2, 3, … is possible.
for (i in 1:nrow(df)) {
  df$index[i] <- sum(df$AA[i] > df[df$X1[i], c('AA')],
                     df$AA[i] > df[df$X2[i], c('AA')])
}
Which gives the desired output for a fixed number of variables X1, X2:
df
#>    AA X1 X2 index
#> 1 100  2  1     0
#> 2 200  1  3     1
#> 3 300  3  4     0
#> 4 400  1  1     2
Is there a smooth base R approach which translates my approach to a flexible number of variables X1, …, Xn?
Note, the reason why I am interested in a base R approach is my aim to extend an existing package, which is fully written in base R. So I would like to keep it like that.
Loops or *apply-family approaches are both very welcome.
I am aware of the fact that operations on dataframes are often considered to be slower. Since all variables AA, X1, ... are of the same length, a solution which does not rely on a dataframe structure would also be great!
Created on 2022-04-06 by the reprex package (v2.0.1)
Solution:
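You don’t need to loop through rows. For each index column x, df$AA > df$AA[x] compares every row’s AA with the AA value at the row that x points to, all in one vectorised step. Reduce with `+` then adds these logical vectors across however many index columns there are.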
Reduce(`+`, lapply(df[-1], function(x) df$AA > df$AA[x]))
#> [1] 0 1 0 2
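If this is going into a package, the one-liner above could be wrapped roughly as follows. This is only a sketch: count_greater, value_col and index_cols are hypothetical names, and it assumes every column other than the value column is an index column unless told otherwise. Starting Reduce from a vector of integer zeros is a small addition so that a single index column still gives counts rather than a logical vector. The last part shows one possible variant without a data frame, assuming the index vectors are bound into a matrix.

```r
df <- data.frame(
  AA = c(100, 200, 300, 400),
  X1 = c(2, 1, 3, 1),
  X2 = c(1, 3, 4, 1)
)

# Hypothetical helper (name and arguments are illustrative, not from any package).
# Counts, per row, how many index columns point to a row with a strictly smaller value.
count_greater <- function(df, value_col = "AA",
                          index_cols = setdiff(names(df), value_col)) {
  aa <- df[[value_col]]
  # One logical vector per index column: is this row's value greater
  # than the value at the row the index refers to?
  hits <- lapply(df[index_cols], function(idx) aa > aa[idx])
  # Start from integer zeros so that one (or zero) index columns
  # still returns an integer vector of counts.
  Reduce(`+`, hits, integer(length(aa)))
}

count_greater(df)
#> [1] 0 1 0 2

count_greater(df, index_cols = "X1")
#> [1] 0 1 0 1

# Variant without a data frame: a plain vector plus a matrix of indices.
aa  <- c(100, 200, 300, 400)
idx <- cbind(X1 = c(2, 1, 3, 1), X2 = c(1, 3, 4, 1))
# Look up all referenced values at once, reshape to the shape of idx,
# compare column-wise against aa, and count TRUEs per row.
rowSums(aa > matrix(aa[as.vector(idx)], nrow = nrow(idx)))
#> [1] 0 1 0 2
```

If the data frame contains other columns, index_cols could be picked by name instead, for example with grep("^X[0-9]+$", names(df), value = TRUE).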