UserWarning: The least populated class in y has only 1 members, which is less than n_splits=5

I’m trying to use grid search (GridSearchCV) to tune a Random Forest on a data frame. The code is below:

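import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler

# df is the data frame from the question (not shown):
# features are all columns but the last, the target is the last column
x = df.iloc[:, :-1]
y = df.iloc[:, -1]
x_cols = x.columns

# Splitting the dataset into the Training set and Test set
# (note: the grid search below is fitted on the full x and y, not on this split)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

# Standardization
x = StandardScaler().fit_transform(x)
print(pd.DataFrame(x).head())

# Random Forest
rfc = RandomForestClassifier(random_state=42)
param_grid = {'n_estimators': [100, 200, 300], 'min_samples_split': [2, 3, 4, 5], 'max_depth': [4, 5, 6],
              'criterion': ['gini', 'entropy']}
# An integer cv with a classifier means stratified 5-fold cross-validation
CV_rfc = GridSearchCV(estimator=rfc, param_grid=param_grid, cv=5)
CV_rfc.fit(x, y)

print(CV_rfc.best_params_)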

It gives me the following warning:

UserWarning: The least populated class in y has only 1 members, which is less than n_splits=5.
% (min_groups, self.n_splits)), UserWarning)

Can anyone please help me resolve this so that I can get the right parameters for the Random Forest?

Solution:

According to the GridSearchCV documentation:

For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used.
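To see which classes trigger the warning, it can help to count the members of each class before running the search. The sketch below uses a hypothetical toy target (not the asker's data) in which class 2 has a single member, and it calls StratifiedKFold directly to reproduce the same message:

```python
import warnings

import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical toy data: classes 0 and 1 have 5 members each, class 2 has 1
X = np.arange(22, dtype=float).reshape(11, 2)
y = np.array([0] * 5 + [1] * 5 + [2])

n_splits = 5

# Count the members of each class and keep those with fewer than n_splits
classes, counts = np.unique(y, return_counts=True)
rare = {int(c): int(n) for c, n in zip(classes, counts) if n < n_splits}
print("Classes with fewer than", n_splits, "members:", rare)

# Stratified splitting still works here, but it emits the UserWarning
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    folds = list(StratifiedKFold(n_splits=n_splits).split(X, y))

messages = [str(w.message) for w in caught]
for message in messages:
    print(message)
```

On a pandas target such as the `y` in the question, `y.value_counts()` gives the same per-class counts.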

Since you asked for 5 splits, every class in y needs at least 5 samples in order to appear in all 5 folds. Your least populated class has only 1 sample, which is what triggers the warning. If you do not want to use stratified cross-validation, you can pass cv=KFold(5) instead, which creates 5 folds without stratification.

The scikit-learn documentation includes an example of KFold splitting in GridSearchCV.
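As a rough illustration of the KFold suggestion, here is a minimal sketch on synthetic data. The data set, the reduced parameter grid, and the `shuffle=True` / `random_state=0` arguments are assumptions made for this example; the answer itself only proposes `KFold(5)`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic stand-in for the asker's data frame (assumption for this sketch)
X, y = make_classification(n_samples=60, n_features=5, random_state=0)
y[0] = 2  # give one class a single member, as in the question

# Reduced grid so the example runs quickly (assumption, not the asker's grid)
param_grid = {"n_estimators": [10, 20], "max_depth": [4, 5]}

# Plain K-fold splitting: folds are built without looking at the class labels
cv = KFold(n_splits=5, shuffle=True, random_state=0)

search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_grid=param_grid,
    cv=cv,
)
search.fit(X, y)

print(search.best_params_)
```

Without stratification, the single-member class ends up in exactly one validation fold, so the model trained for that fold never sees it. Whether that is acceptable depends on the data; merging or dropping very rare classes is another option.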
