NotFittedError: This DecisionTreeClassifier instance is not fitted yet

January 26, 2022

I am new to ML and am trying to run a decision-tree-based model.

I tried the following:

from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X = df[['Quantity']]
y = df[['label']]
params = {'max_depth':[2,3,4], 'min_samples_split':[2,3,5,10]}
clf_dt = DecisionTreeClassifier()
clf = GridSearchCV(clf_dt, param_grid=params, scoring='f1')
clf.fit(X, y)
clf_dt = DecisionTreeClassifier(clf.best_params_)

and got this warning:

FutureWarning: Pass criterion={'max_depth': 2, 'min_samples_split': 2} as keyword args. From version 1.0 (renaming of 0.25) passing these as positional arguments will result in an error
  warnings.warn(f"Pass {args_msg} as keyword args. From version "

Later, I ran the code below and got an error, even though I had already fitted the model with .fit():

from sklearn import tree
tree.plot_tree(clf_dt, filled=True, feature_names = list(X.columns), class_names=['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'])

NotFittedError: This DecisionTreeClassifier instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.

Can someone help me fix this error?

Solution:

So there are two problems you are facing.

Firstly

Referring to

FutureWarning: Pass criterion={'max_depth': 2, 'min_samples_split': 2} as keyword args. From version 1.0 (renaming of 0.25) passing these as positional arguments will result in an error

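The ** dictionary unpacking belongs where you build the estimator from clf.best_params_, not in GridSearchCV: param_grid expects the dict itself, and param_grid=**params is a syntax error. Passing the dict positionally makes DecisionTreeClassifier treat it as its first parameter, criterion, which is exactly what the warning says. Unpack it into keyword arguments instead:

clf_dt = DecisionTreeClassifier(**clf.best_params_)

Your params dict is fine as written. Building it with the dict constructor, params = dict(max_depth=[2,3,4], min_samples_split=[2,3,5,10]), produces the same dictionary and does not affect the warning.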
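A minimal sketch of the difference, assuming best_params is a hand-written stand-in for clf.best_params_ (the values are taken from the warning above), so no data or grid search is needed:

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for clf.best_params_ (values from the warning text)
best_params = {'max_depth': 2, 'min_samples_split': 2}

# Wrong: the dict is passed positionally and is treated as `criterion`
# (a FutureWarning on older releases, an error on newer ones)
# DecisionTreeClassifier(best_params)

# Right: ** unpacks the dict into keyword arguments
clf_dt = DecisionTreeClassifier(**best_params)
print(clf_dt.max_depth, clf_dt.min_samples_split)  # 2 2
print(clf_dt.criterion)  # gini (the default is left untouched)
```

Note that clf_dt here is still a fresh, unfitted estimator; unpacking only sets its hyperparameters.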

Secondly

Referring to

NotFittedError: This DecisionTreeClassifier instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.

Fitting is a mandatory step before any sklearn estimator can be used, and, as you said, you already did that in your first code example. Your problem is this line:

clf_dt = DecisionTreeClassifier(clf.best_params_)

It instantiates a brand-new DecisionTreeClassifier object that has never been fitted, so it is still unfitted when you call

tree.plot_tree(clf_dt ...)
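A short sketch to check this, assuming synthetic data from make_classification in place of your df and a hypothetical is_fitted helper built on sklearn's check_is_fitted. It also shows that GridSearchCV fits clones of the estimator you pass in, so even the original clf_dt handed to GridSearchCV is never fitted itself:

```python
from sklearn.datasets import make_classification
from sklearn.exceptions import NotFittedError
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.validation import check_is_fitted


def is_fitted(estimator):
    """Hypothetical helper: True if sklearn considers the estimator fitted."""
    try:
        check_is_fitted(estimator)
        return True
    except NotFittedError:
        return False


# Synthetic binary data standing in for the asker's df
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

clf_dt = DecisionTreeClassifier()
clf = GridSearchCV(clf_dt, param_grid={'max_depth': [2, 3, 4]}, scoring='f1')
clf.fit(X, y)

print(is_fitted(clf_dt))               # False: GridSearchCV fitted clones, not clf_dt
print(is_fitted(clf.best_estimator_))  # True: the refitted best model
print(is_fitted(DecisionTreeClassifier(**clf.best_params_)))  # False: brand-new object
```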

When you call

clf.fit(X, y)

GridSearchCV (with its default refit=True) refits a DecisionTreeClassifier with the best parameters on the whole dataset and stores it as clf.best_estimator_. So just use that estimator 🙂

tree.plot_tree(clf.best_estimator_, filled=True, feature_names=list(X.columns))

The extra step clf_dt = DecisionTreeClassifier(clf.best_params_) isn't necessary.
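An end-to-end sketch, assuming a synthetic one-feature binary dataset whose feature is named after your Quantity column. It prints the tree with export_text so it runs without matplotlib; the same best_estimator_ object is what you would pass to tree.plot_tree:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic single-feature binary data (hypothetical stand-in for df)
X, y = make_classification(n_samples=200, n_features=1, n_informative=1,
                           n_redundant=0, n_clusters_per_class=1, random_state=0)

params = {'max_depth': [2, 3, 4], 'min_samples_split': [2, 3, 5, 10]}
clf = GridSearchCV(DecisionTreeClassifier(random_state=0),
                   param_grid=params, scoring='f1')
clf.fit(X, y)

# With refit=True (the default), this is already fitted with the best params
best_tree = clf.best_estimator_
print(clf.best_params_)
print(export_text(best_tree, feature_names=['Quantity']))

# With matplotlib installed, the same object can be plotted:
# from sklearn import tree
# tree.plot_tree(best_tree, filled=True, feature_names=['Quantity'])
```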