Is there a way to run this script?

I have a large dataset from which I intend to generate a 10% sample and run my machine learning model 20 times. To test the approach, I decided to try it on the iris dataset. First, I split the data into training and testing sets, and then used a while loop for a simple run, but it doesn't seem to work: I got an error message. Is there something I missed?

      ### partitioning dataset

      part <- sample(1:150, size = 100, replace = F)
      training <- iris[part,]
      testing <- iris[-part,]

      ## using a loop 
      n <-1
      while (n<6) {
            Train(n)<-training[sample(1:100,0.3*nrow(training), replace = F),]
            fit <- randomForest(Species~., data = Train(n))
            pred <- predict(fit, testing)
            confusionMatrix(pred, testing$Species))
            n <-n+1
      }

The error message I got is

      Error: unexpected '}' in "}"


Solution:

Here is the loop, corrected and tested. The parse error is caused by the extra closing parenthesis on the `confusionMatrix(pred, testing$Species))` line, which leaves the final `}` unmatched. In addition, `Train(n) <- ...` would fail at run time because there is no `Train<-()` replacement function, so the corrected version stores each run's confusion matrix in a pre-allocated list instead.

suppressPackageStartupMessages({
  library(randomForest)
  library(caret)
})

set.seed(2022)
part <- sample(1:150, size = 100, replace = FALSE)
training <- iris[part,]
testing <- iris[-part,]

## using a loop 
result <- vector("list", 6L)
n <- 1L
while(n < 6L) {
  Train <- training[sample(1:100, 0.3*nrow(training), replace = FALSE), ]
  fit <- randomForest(Species ~ ., data = Train)
  pred <- predict(fit, testing)
  result[[n]] <- confusionMatrix(pred, testing$Species)
  n <- n + 1L
}

## see the first result
result[[1]]
#> Confusion Matrix and Statistics
#> 
#>             Reference
#> Prediction   setosa versicolor virginica
#>   setosa         16          0         0
#>   versicolor      0         11         1
#>   virginica       0          3        19
#> 
#> Overall Statistics
#>                                           
#>                Accuracy : 0.92            
#>                  95% CI : (0.8077, 0.9778)
#>     No Information Rate : 0.4             
#>     P-Value [Acc > NIR] : 1.565e-14       
#>                                           
#>                   Kappa : 0.8778          
#>                                           
#>  Mcnemar's Test P-Value : NA              
#> 
#> Statistics by Class:
#> 
#>                      Class: setosa Class: versicolor Class: virginica
#> Sensitivity                   1.00            0.7857           0.9500
#> Specificity                   1.00            0.9722           0.9000
#> Pos Pred Value                1.00            0.9167           0.8636
#> Neg Pred Value                1.00            0.9211           0.9643
#> Prevalence                    0.32            0.2800           0.4000
#> Detection Rate                0.32            0.2200           0.3800
#> Detection Prevalence          0.32            0.2400           0.4400
#> Balanced Accuracy             1.00            0.8790           0.9250

Created on 2022-05-11 by the reprex package (v2.0.1)
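Because each run's confusionMatrix object is kept in result, summary statistics can be pulled out afterwards. For example, one possible way to collect the overall accuracy of every run (assuming the loop above has filled all elements of result):

## pull the overall accuracy out of each stored confusion matrix
accuracies <- sapply(result, function(cm) cm$overall["Accuracy"])
accuracies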

There's nothing to gain from a while loop over a for loop here: you are incrementing n by hand, which is exactly what a for loop does for you.

The equivalent for loop is the following.

result <- vector("list", 6L)
for(n in 1:6) {
  Train <- training[sample(1:100, 0.3*nrow(training), replace = FALSE), ]
  fit <- randomForest(Species ~ ., data = Train)
  pred <- predict(fit, testing)
  result[[n]] <- confusionMatrix(pred, testing$Species)
}
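Scaling the same pattern up to the goal stated in the question (a 10% sample drawn 20 times from the large dataset) could look roughly like the sketch below; mydata, its outcome column y, and the 70/30 train/test split are placeholders for the real data, not part of the original code.

## hypothetical sketch: mydata and its factor outcome y stand in for the real dataset
set.seed(2022)
part <- sample(nrow(mydata), size = floor(0.7 * nrow(mydata)))  # assumed 70/30 split
training <- mydata[part, ]
testing  <- mydata[-part, ]

n_runs <- 20                                  # the question asks for 20 runs
result <- vector("list", n_runs)
for (n in seq_len(n_runs)) {
  ## draw a fresh 10% sample of the training rows for this run
  samp <- training[sample(nrow(training), size = floor(0.1 * nrow(training))), ]
  fit  <- randomForest(y ~ ., data = samp)    # y must be a factor for classification
  result[[n]] <- confusionMatrix(predict(fit, testing), testing$y)
}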