Smooth way to calculate index based on several variable comparisons in base R

Example data to copy

df <- data.frame(
  AA = c(100, 200, 300, 400), 
  X1 = c(2, 1, 3, 1),
  X2 = c(1, 3, 4, 1)
)

Based on the index of AA and its values, I would like to calculate, for every row, the sum of indicators defined by the condition df$AA[i] > df[df$X1[i], c('AA')] (shown here for X1), over a varying number of variables.

My probably naive approach is to use a for-loop, which works perfectly for a fixed number of variables (columns), in the given example X1, X2. My problem is that I do not know the number of variables beforehand. Theoretically, any number 1, 2, 3, … is possible.

for (i in 1:nrow(df)) {
  df$index[i] <- sum(df$AA[i] > df[df$X1[i], c('AA')],
                     df$AA[i] > df[df$X2[i], c('AA')])
}

This gives the desired output for a fixed number of variables X1, X2:

df
#>    AA X1 X2 index
#> 1 100  2  1     0
#> 2 200  1  3     1
#> 3 300  3  4     0
#> 4 400  1  1     2

Is there a smooth base R approach which translates my approach to a flexible number of variables X1, …, Xn?

Note, the reason why I am interested in a base R approach is my aim to extend an existing package, which is fully written in base R. So I would like to keep it like that.
Loops or *apply-family approaches are both very welcome.
I am aware of the fact that operations on dataframes are often considered to be slower. Since all variables AA, X1, ... are of the same length, a solution which does not rely on a dataframe structure would also be great!

Created on 2022-04-06 by the reprex package (v2.0.1)

Solution:

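You don’t need to loop through rows. Because `>` and `[` are vectorized, each index column can be compared against AA in one step with lapply, and Reduce with `+` then adds the resulting logical vectors element-wise. Note that df[-1] treats every column except the first as an index column, so run this before adding any other columns to df.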

Reduce(`+`, lapply(df[-1], function(x) df$AA > df$AA[x]))
#> [1] 0 1 0 2
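
Since the question also asks for something that does not depend on the data frame layout, here is a minimal sketch of the same idea wrapped in a function that takes a plain vector plus a list of index vectors. The name count_greater and the "^X[0-9]+$" column pattern are assumptions made for illustration; they are not part of the original answer. It uses vapply and rowSums instead of Reduce, which is a swapped-in but equivalent way to add up the comparisons.

```r
# Hypothetical helper (not from the original answer):
# aa       - numeric vector of values
# idx_cols - list (or data frame) of index vectors, each the same length as aa
count_greater <- function(aa, idx_cols) {
  # One logical column per index variable: is aa[i] greater than aa[idx[i]]?
  cmp <- vapply(idx_cols, function(idx) aa > aa[idx], logical(length(aa)))
  # Force a matrix shape so rowSums also works when aa has length 1
  cmp <- matrix(cmp, nrow = length(aa))
  # Count the TRUE values per row
  rowSums(cmp)
}

df <- data.frame(
  AA = c(100, 200, 300, 400),
  X1 = c(2, 1, 3, 1),
  X2 = c(1, 3, 4, 1)
)

# Pick the index columns by name (assumed naming scheme X1, X2, ...),
# so extra columns such as a previously computed index are ignored
x_cols <- grep("^X[0-9]+$", names(df), value = TRUE)
df$index <- count_greater(df$AA, df[x_cols])
df$index
#> [1] 0 1 0 2
```

The same function works without a data frame, for example count_greater(AA, list(X1, X2)) with three plain vectors of equal length. Selecting columns by name rather than with df[-1] is a design choice that keeps the call safe to repeat after index has been added to df.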