I am using the data set mtcars as my example. The goal is to:
- Step 1: Use a loop to run regressions with a changing outcome, while the independent variables stay the same for each model.
- Step 2: Transform the residuals from each model in step 1.
My code:
```r
library(tidyverse)
data("mtcars")

# Plan:
# Step 1: Outcome = cyl + disp + hp + drat
# Step 2: Transform residual from step 1

# The outcomes are all the other columns
outcome = colnames(mtcars[, -c(2:5)])

for (i in outcome){
  # Step 1: Run the model using a loop with changing outcome
  formula = as.formula(paste0(i, "~ cyl + disp + hp + drat"))
  model = lm(formula, data = mtcars, na.action = na.exclude)
  # Save the residuals from each model as new columns with the suffix '.res'
  mtcars[, paste0(i, ".res")] = residuals(model)
  # Step 2: Transform the residuals and save them as new columns with the suffix '.invn'
  mtcars[, paste0(i, ".invn")] =
    qnorm((rank(mtcars[,get(paste0(i,".res"))],na.last="keep")-0.5)/sum(!is.na(mtcars[,get(paste0(i,".res"))])))
}
```
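The step-2 expression is a rank-based inverse normal transformation: each value is replaced by `qnorm((rank - 0.5) / n)`, where `n` counts only the non-missing values. As a minimal sketch, the same formula can be factored into a small function; the name `rank_invn` and the toy vector `x` are hypothetical and not part of the original code.

```r
# Hypothetical helper (not from any package): rank-based inverse normal
# transformation using the same formula as step 2 of the loop.
rank_invn <- function(x) {
  r <- rank(x, na.last = "keep")  # NA values get an NA rank instead of being ranked last
  n <- sum(!is.na(x))             # number of observed (non-missing) values
  qnorm((r - 0.5) / n)            # map the rank fractions onto standard normal quantiles
}

# Toy input with one missing value
x <- c(3.2, NA, -1.0, 0.5)
out <- rank_invn(x)
print(out)
```

With this input the middle-ranked value maps to 0, the smallest and largest values map to quantiles of equal size and opposite sign, and the `NA` stays `NA`. Ties are averaged, because `rank()` uses `ties.method = "average"` by default.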
However, I get the error `Error in get(paste0(i, ".res")) : object 'mpg.res' not found`, which comes from step 2.
- I think the reason is that when indexing a column of a data set with `[]`, the column name has to be put in quotes. So if I had written `mtcars[, 'mpg.res']`, I would not have received this error.
- The problem, however, is that the column names change with `i`, so I can't put `paste0(i, ".res")` in quotes.
- In summary, my question is: how do I index a newly created column when the column name is built inside the loop? I tried `eval(parse())`, but it didn't work.
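The quoting intuition can be checked on a toy data frame. This sketch assumes a small hypothetical data frame `df` and variable `col_name` (neither is in the original code). It shows that a string produced by `paste0()` indexes a column exactly like a quoted literal, while `get()` looks for a free-standing R object of that name and fails.

```r
# Hypothetical toy data frame with a column created at run time, like "mpg.res"
df <- data.frame(mpg = c(21, 22.8, 18.7))
i <- "mpg"
df[, paste0(i, ".res")] <- df$mpg - mean(df$mpg)

# paste0() already returns the character string "mpg.res"
col_name <- paste0(i, ".res")
by_literal  <- df[, "mpg.res"]   # quoted literal
by_computed <- df[, col_name]    # string computed at run time
by_double   <- df[[col_name]]    # [[ ]] always returns the column as a vector

# get() searches the environment for an object named mpg.res, not a column,
# so it raises an error; capture its message instead of stopping.
get_result <- tryCatch(get(col_name), error = function(e) conditionMessage(e))
print(get_result)
```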
PS: I know I can use purrr::map or apply to make my life easier, but I would really like to learn how to solve this problem when using a loop.
Solution:
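Simply remove the `get()`. `paste0(i, ".res")` already evaluates to a character string such as `"mpg.res"`, and a character string is exactly what `[` needs to select a column by name, so no extra quoting is required. `get()`, on the other hand, looks up an R object with that name in the current environment. `mpg.res` exists only as a column of `mtcars`, not as a variable, hence the "object not found" error.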
```r
  # Step 2: Transform the residuals and save them as new columns with the suffix '.invn'
  mtcars[, paste0(i, ".invn")] =
    qnorm((rank(mtcars[, paste0(i, ".res")], na.last = "keep") - 0.5) / sum(!is.na(mtcars[, paste0(i, ".res")])))
```
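For reference, here is a sketch of the whole loop with the fix applied, still written as a plain `for` loop. It assumes you work on a copy named `df` (a hypothetical name) so the built-in `mtcars` stays untouched, and it uses `[[ ]]` to read and write the columns; `df[, name]` would behave the same here. The outcome list is built with `setdiff()`, which selects the same seven columns as `colnames(mtcars[, -c(2:5)])`.

```r
data("mtcars")
df <- mtcars  # work on a copy (hypothetical name) so mtcars itself is unchanged

predictors <- c("cyl", "disp", "hp", "drat")
outcome <- setdiff(colnames(df), predictors)  # mpg, wt, qsec, vs, am, gear, carb

for (i in outcome) {
  # Step 1: regress the current outcome on the fixed predictors
  f <- as.formula(paste0(i, " ~ cyl + disp + hp + drat"))
  model <- lm(f, data = df, na.action = na.exclude)
  res_name <- paste0(i, ".res")
  df[[res_name]] <- residuals(model)

  # Step 2: index the new column by its name string (no get() needed),
  # then apply the rank-based inverse normal transformation
  res <- df[[res_name]]
  df[[paste0(i, ".invn")]] <-
    qnorm((rank(res, na.last = "keep") - 0.5) / sum(!is.na(res)))
}

head(df[, c("mpg", "mpg.res", "mpg.invn")])
```

Because `outcome` is computed before the loop starts, the `.res` and `.invn` columns added inside the loop are never picked up as outcomes themselves.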