I have 10 datasets that are the result of multiple imputation, which i named: data1, data2, …, data10. For each of them, I want to do:
- Create a logistic regression model
- Do multiple steps which include creating a LASSO model, resampling 200 times from my imputed dataset, recreate LASSO model in each resampling, evaluate measures of performance.
I’m able to do it separately for each dataset but I was wondering if there was a way to automatically do all of the steps for each imputed dataset. Below, I included an example of all the steps I do to get results separately for each imputation.
To do it automatically, i first thought about using lapply to create regressions for every imputation:
log01.1 <- lapply(paste0("data",1:10), function(x){lrm(y ~ x1 + x2 + x3, data=eval(parse(text = x)), x=T, y=T)})
Then I wanted to use lapply again on the whole block of code below with something like :
lapply(log01.1,fun(x){*All the steps following the regression*}
But I realized it doesn’t work since lapply can only be applied to one function at a time as I understand it + at model.L1 <- glmnet(x=log01.1$x, y=log01.1$y, alpha=1, lambda=cv.glmmod$lambda.1se, family="binomial")
it wouldn’t work since my lambda would come from a list. And I can’t use lapply both on log01.1 and on cv.glmmod at the same time. Add to that the resampling with the 200 repetitons and I’m sure I would run into other problems I can’t even think of right now.
And that’s about the extent of my knowledge on lapply and other functions that could do similar things. Is there a way to take the chunk of code I wrote below and tell R to repeat it for every one of my 10 imputations and then store into separate lists the objects that would have been created? Or maybe not in lists but I would get for example App1, App2, App3, etc.?
Or am I better off just repeating it 10 times and storing the results?
log01.1 <- lrm(y ~ x1 + x2 + x3 , data=data1, x=T, y=T)})
reps <- 200;App=numeric(reps);Test=numeric(reps)
for(i in 1:reps){
#1.Construct LASSO model in sample i
cv.glmmod <- cv.glmnet(x=log01.1$x, y=log01.1$y, alpha=1, family="binomial")
model.L1 <- glmnet(x=log01.1$x, y=log01.1$y, alpha=1,
lambda=cv.glmmod$lambda.1se, family="binomial") #use optimum penalty
lp1 <- log01.1$x %*% model.L1$beta #for apparent performance
#2. Draw bootstrap sample with replacement from sample i
j <- sample(nrow(data1), replace=T) #for sample Bi
#3. Construct a model in sample Bi replaying every step that was done in the imputed sample
#I, especially model specification steps such as selection of predictors.
#Determine the bootstrap performance as the apparent performance in sample Bi.
#3 Construct LASSO model in sample i replaying every step done in imputed sample i
cv.j <- cv.glmnet (x=log01.1$x[j,], y=log01.1$y[j,], alpha = 1, family="binomial")
model.L1j <- glmnet (x=log01.1$x[j,], y=log01.1$y[j,], alpha=1,
lambda=cv.j$lambda.1se, family="binomial") #use optimum penalty for Bi
lp1j <- log01.1$x[j,] %*% model.L1j$beta #apparent performance in Bi
App[i] <- lrm.fit(y=log01.1$y[j,], x=lp1j)$stats[6] #apparent c for Bi
#4. Apply model from Bi to the original sample i without any modification to determine the test performance
lp1 <- log01.1$x %*% model.L1j$beta #Validated performance in I
Test[i] <- lrm.fit(y=log01.1$y, x=lp1)$stats[6]} #Test c in I
That is the code I would like to repeat automatically for every imputed set.
>Solution :
To apply a set of functions to multiple datasets in R, you can use the lapply function, which applies a function to each element of a list and returns the results in a list. In your case, you can use lapply to apply your regression and resampling steps to each of your 10 imputed datasets.
Here is an example of how you could use lapply to do this:
# Create a list of the imputed datasets
imputed_datasets <- list(data1, data2, ..., data10)
# Apply the regression and resampling steps to each dataset in the list
results <- lapply(imputed_datasets, function(dataset) {
# Create a logistic regression model
log01.1 <- lrm(y ~ x1 + x2 + x3, data=dataset, x=T, y=T)
# Do multiple steps which include creating a LASSO model, resampling 200 times
# from the imputed dataset, recreate LASSO model in each resampling, and evaluate
# measures of performance
reps <- 200;App=numeric(reps);Test=numeric(reps)
for(i in 1:reps){
# 1. Construct LASSO model in sample i
cv.glmmod <- cv.glmnet(x=log01.1$x, y=log01.1$y, alpha=1, family="binomial")
model.L1 <- glmnet(x=log01.1$x, y=log01.1$y, alpha=1,
lambda=cv.glmmod$lambda.1se, family="binomial") #use optimum penalty
lp1 <- log01.1$x %*% model.L1$beta #for apparent performance
# 2. Draw bootstrap sample with replacement from sample i
j <- sample(nrow(dataset), replace=T) #for sample Bi
# 3. Construct a model in sample Bi replaying every step that was done in the
# imputed sample I, especially model specification steps such as selection of
# predictors. Determine the bootstrap performance as the apparent performance in sample Bi.
# 3. Construct LASSO model in sample i replaying every step done in imputed sample i
cv.j <- cv.glmnet (x=log01.1$x[j,], y=log01.1$y[j,], alpha = 1, family="binomial")
model.L1j <- glmnet (x=log01.1$x[j,], y=log01.1$y[j,], alpha=1,
lambda=cv.j$lambda.1se, family="binomial") #use optimum penalty for Bi
lp1j <- log01.1$x[j,] %*% model.L1j$beta #apparent performance in Bi
App[i] <- lrm.fit(y=log01.1$y[j,], x=lp1j)$stats[6] #apparent performance in Bi
Test[i] <- lrm.fit(y=log01.1$y[-j,], x=lp1[-j,])$stats[6] #test performance in