Can someone explain to me why the value of split is FALSE in the test set?

library(caTools)  # provides sample.split()
# TRUE marks a training row, FALSE marks a test row
split = sample.split(dataset$Salary, SplitRatio = 2/3)
training_set = subset(dataset, split == TRUE)
test_set = subset(dataset, split == FALSE)

Solution:

I assume you got this code from some caTools documentation. Run the first line on its own and look at the result, and it should start to make sense.

Basically, caTools::sample.split creates a random logical vector (TRUEs and FALSEs) with one entry per element of the vector you pass in, so one entry per row of the dataset, with TRUE and FALSE in approximately the given ratio. Let’s take the iris dataset as an example (it has 150 rows):

split = sample.split(iris$Sepal.Length, SplitRatio = 2/3)

The result is a 150-element logical vector in which about 2/3 of the values are TRUE and about 1/3 are FALSE.
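To see this for yourself, here is a minimal sketch that inspects the vector. It assumes the caTools package is installed; the seed value is arbitrary and only there to make the run repeatable.

```r
library(caTools)  # provides sample.split()

set.seed(123)  # arbitrary seed (an assumption), only for reproducibility
split <- sample.split(iris$Sepal.Length, SplitRatio = 2/3)

length(split)  # one entry per row of iris
class(split)   # "logical"
head(split)    # the first few TRUE/FALSE flags
table(split)   # counts of FALSE and TRUE, roughly 1/3 vs 2/3
```

The counts from table(split) may not be exactly 50 and 100, because sample.split tries to keep the ratio within each distinct value of the vector it is given.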

Next, you use the subset function to extract every row i of iris where split[i] == TRUE to create the training set, and every row i where split[i] == FALSE to create the test set.

That is why you use split == TRUE for the training set and split == FALSE for the test set: the FALSE entries mark the remaining rows that were not chosen for training.
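The same idea can be shown on a tiny example in base R. The data frame df and the hand-written mask below are hypothetical stand-ins for your dataset and for the vector that sample.split would return.

```r
# Toy data frame and a hand-written mask (both hypothetical)
df <- data.frame(id = 1:6, salary = c(30, 45, 50, 62, 71, 80))
mask <- c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE)

training_set <- subset(df, mask == TRUE)   # keeps rows 1, 2, 4, 6
test_set     <- subset(df, mask == FALSE)  # keeps rows 3, 5

# Every row lands in exactly one of the two sets
nrow(training_set) + nrow(test_set) == nrow(df)
intersect(training_set$id, test_set$id)
```

Since mask is already logical, subset(df, mask) and subset(df, !mask) would select the same rows; the == TRUE and == FALSE comparisons just spell the intent out.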
