I am new to ML and am trying to run a decision-tree-based model.
I tried the following:
X = df[['Quantity']]
y = df[['label']]
params = {'max_depth':[2,3,4], 'min_samples_split':[2,3,5,10]}
clf_dt = DecisionTreeClassifier()
clf = GridSearchCV(clf_dt, param_grid=params, scoring='f1')
clf.fit(X, y)
clf_dt = DecisionTreeClassifier(clf.best_params_)
This produced the following warning:
FutureWarning: Pass criterion={'max_depth': 2, 'min_samples_split': 2} as keyword args. From version 1.0 (renaming of 0.25) passing these as positional arguments will result in an error
warnings.warn(f"Pass {args_msg} as keyword args. From version "
Later, I ran the code below and got an error, even though I had already fit the model using .fit():
from sklearn import tree
tree.plot_tree(clf_dt, filled=True, feature_names = list(X.columns), class_names=['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'])
NotFittedError: This DecisionTreeClassifier instance is not fitted yet. Call
'fit' with appropriate arguments before using this estimator.
Can you help me fix this error?
>Solution :
So there are two problems you are facing.
Firstly
Referring to
FutureWarning: Pass criterion={'max_depth': 2, 'min_samples_split': 2} as keyword args. From version 1.0 (renaming of 0.25) passing these as positional arguments will result in an error
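This warning is raised by DecisionTreeClassifier(clf.best_params_), not by GridSearchCV: the dictionary is passed as the first positional argument, so sklearn treats it as criterion. Pass the best parameters as keyword arguments by unpacking the dictionary with the ** operator:
clf_dt = DecisionTreeClassifier(**clf.best_params_)
The param_grid argument of GridSearchCV must stay a plain dictionary. Writing it with the dict constructor is equivalent to the literal and is only a matter of style:
params = dict(max_depth=[2,3,4], min_samples_split=[2,3,5,10])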
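As a minimal sketch of keyword unpacking applied to the classifier constructor (the call that triggers the warning), the snippet below uses a hand-written best_params dictionary as a stand-in for clf.best_params_. Its values are copied from the warning message and are an assumption, not real search results.

```python
from sklearn.tree import DecisionTreeClassifier

# Stand-in for clf.best_params_ (assumed values, taken from the warning text).
best_params = {"max_depth": 2, "min_samples_split": 2}

# ** unpacks the dict into keyword arguments, so this call is equivalent to
# DecisionTreeClassifier(max_depth=2, min_samples_split=2).
clf_dt = DecisionTreeClassifier(**best_params)

# The hyperparameters land on the right attributes; criterion keeps its default.
print(clf_dt.max_depth, clf_dt.min_samples_split, clf_dt.criterion)
```

Note that this only configures the estimator. It still has to be fitted before it can be plotted or used for prediction.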
Secondly
Referring to
NotFittedError: This DecisionTreeClassifier instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.
In sklearn, every estimator must be fitted before it can be used. As you said, you did fit a model in your first code example. The problem is that with
clf_dt = DecisionTreeClassifier(clf.best_params_)
you instantiate a new DecisionTreeClassifier object, which is therefore not fitted when you call
tree.plot_tree(clf_dt ...)
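To illustrate why a freshly constructed estimator raises this error, here is a small sketch using sklearn's check_is_fitted helper. The variable name fresh and the four-row toy dataset are made up for the example.

```python
from sklearn.exceptions import NotFittedError
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.validation import check_is_fitted

# A brand-new estimator only holds hyperparameters, no learned tree.
fresh = DecisionTreeClassifier(max_depth=2, min_samples_split=2)

try:
    check_is_fitted(fresh)
    fitted_before = True
except NotFittedError:
    fitted_before = False

print(fitted_before)

# After fit() on some (toy, hypothetical) data the same check passes silently.
fresh.fit([[1], [2], [3], [4]], [0, 0, 1, 1])
check_is_fitted(fresh)
```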
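When you call
clf.fit(X, y)
on the GridSearchCV object, sklearn refits the best parameter combination on the whole dataset (refit=True is the default) and stores that fitted tree in clf.best_estimator_. So just pass that attribute to plot_tree 🙂
tree.plot_tree(clf.best_estimator_, filled=True, feature_names=list(X.columns))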
The following step clf_dt = DecisionTreeClassifier(clf.best_params_) isn’t necessary.
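Putting it together, one possible end-to-end version could look like the sketch below. The DataFrame is synthetic (40 made-up rows with a binary 0/1 label, since scoring='f1' expects a binary target by default), and the class_names, random_state, and Agg backend are choices made for the example rather than anything from the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import pandas as pd
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Hypothetical stand-in for the asker's df: one feature, binary 0/1 label.
quantities = list(range(1, 41))
df = pd.DataFrame({
    "Quantity": quantities,
    "label": [1 if q > 20 else 0 for q in quantities],
})

X = df[["Quantity"]]
y = df["label"]  # 1-D target

params = {"max_depth": [2, 3, 4], "min_samples_split": [2, 3, 5, 10]}
clf = GridSearchCV(DecisionTreeClassifier(random_state=0),
                   param_grid=params, scoring="f1")
clf.fit(X, y)

# The refitted winner of the search; this is the object plot_tree needs.
best_tree = clf.best_estimator_
annotations = plot_tree(best_tree, filled=True,
                        feature_names=list(X.columns),
                        class_names=["0", "1"])
print(clf.best_params_)
```

Passing clf itself to plot_tree would not work, because the GridSearchCV wrapper is not a decision tree; the fitted tree lives in its best_estimator_ attribute.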