I wanted to change the column type to category with the following code: df["Geography"] = df["Geography"].astype("category") Then I used the random forest algorithm as follows: X = df.drop('target', axis=1) y = df['target'] X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=123, stratify=y) forest = RandomForestClassifier(n_estimators=500, random_state=1) And when… Read More Problem with changing column category – could not convert string to float
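This error usually comes from the fact that astype("category") changes only the pandas dtype. The values are still strings, and scikit-learn's RandomForestClassifier expects numeric input. The sketch below one-hot encodes Geography with pd.get_dummies before splitting and fitting. It is a minimal sketch that assumes a small toy DataFrame. The Age column and the country values are hypothetical stand-ins for the question's data.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the question's DataFrame
df = pd.DataFrame({
    "Geography": ["France", "Spain", "Germany", "France", "Spain", "Germany"] * 10,
    "Age": list(range(20, 80)),
    "target": [0, 1] * 30,
})

# Casting to "category" alone does not make the column numeric
df["Geography"] = df["Geography"].astype("category")

X = df.drop("target", axis=1)
y = df["target"]

# One-hot encode the categorical column so every feature is numeric
X = pd.get_dummies(X, columns=["Geography"])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, random_state=123, stratify=y
)
forest = RandomForestClassifier(n_estimators=500, random_state=1)
forest.fit(X_train, y_train)  # no "could not convert string to float" now
```

df["Geography"].cat.codes would also give numeric input, but it imposes an arbitrary order on the countries. One-hot encoding avoids that.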
I would like to set the n_jobs parameter as close to -1 as possible without using all processors. On a 16-vCPU machine, would this be equivalent to setting n_jobs to 15? And if we want to use all CPUs but two, would that be n_jobs = 14? Solution: From https://scikit-learn.org/stable/glossary.html#term-n-jobs: For… Read More n_jobs parameter for random forest to select all but one/two of the processors
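The scikit-learn glossary states that for negative n_jobs, (n_cpus + 1 + n_jobs) CPUs are used. So n_jobs=-2 means "all but one" and n_jobs=-3 means "all but two". The helper workers_used below is hypothetical and only spells out that arithmetic. The lower bound of 1 mirrors joblib's behaviour as I understand it, and should be treated as an assumption.

```python
from sklearn.ensemble import RandomForestClassifier


def workers_used(n_jobs: int, n_cpus: int) -> int:
    """Hypothetical helper: how many CPUs a given n_jobs value maps to,
    following the glossary rule n_cpus + 1 + n_jobs for negative values."""
    if n_jobs == 0:
        raise ValueError("n_jobs=0 is not a valid setting")
    if n_jobs < 0:
        # Assumed floor of 1 worker for very negative values
        return max(n_cpus + 1 + n_jobs, 1)
    return n_jobs


# On a 16-vCPU machine: -1 -> 16, -2 -> 15, -3 -> 14
for n in (-1, -2, -3):
    print(n, workers_used(n, n_cpus=16))

# "All CPUs but one", without hard-coding the machine's core count
forest = RandomForestClassifier(n_estimators=500, n_jobs=-2, random_state=1)
```

On a 16-vCPU machine, n_jobs=-2 and n_jobs=15 request the same number of workers. The negative form has one advantage: it keeps meaning "all but one" when the code runs on a machine with a different core count.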
I was doing something with the randomForest package in R and I came across the following and was wondering why it happened. If I create a random forest using the Boston housing data like so: library(MASS) library(randomForest) data("Boston") set.seed(101) rf <- randomForest(medv ~ ., data = Boston, importance = TRUE) Then if I want to… Read More randomForest importance measure percent MSE has different results depending on how it is called?
I am using a random forest for binary classification. It gives me 85% accuracy when I train with all features (10 features). After training, I visualized the feature importances, which show that 2 features are really important. So I chose only those two features and trained the RF (with the same setup), but the accuracy decreased (to 0.70). Does it… Read More Got lower accuracy while training Random Forest with important features
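One way to check such a drop is to compare both feature sets under the same cross-validation. A high importance for two features does not mean the other eight carry no signal: together they can still add accuracy. The sketch below assumes synthetic data from make_classification (10 features, 5 informative) in place of the question's dataset. It ranks features by the forest's impurity-based feature_importances_.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the question's data: 10 features, 5 informative
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=2, random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0)

# Mean 5-fold accuracy using all 10 features
acc_all = cross_val_score(rf, X, y, cv=5, scoring="accuracy").mean()

# Rank features by impurity-based importance from a forest fitted on all data.
# Selecting features outside the CV folds leaks a little information; a
# stricter setup would redo the selection inside each fold.
rf.fit(X, y)
top2 = np.argsort(rf.feature_importances_)[::-1][:2]

# Mean 5-fold accuracy using only the two top-ranked features
acc_top2 = cross_val_score(rf, X[:, top2], y, cv=5, scoring="accuracy").mean()
print(f"all features: {acc_all:.3f}  top-2 features: {acc_top2:.3f}")
```

Impurity-based importances can be biased. sklearn.inspection.permutation_importance, computed on held-out data, is a common cross-check before you drop features.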