Keep model made with TFIDF for predicting new content using Scikit for Python

May 26, 2022

This is a sentiment analysis model that uses TF-IDF for feature extraction.
I want to know how I can save this model and reuse it.
I tried saving it as shown below, but when I load it, apply the same pre-processing to the test text, and call fit_transform on it, I get an error saying the model expected X features but got Y.

This is how I saved it:

import joblib

filename = "model.joblib"
joblib.dump(model, filename)

And this is the code for my TF-IDF model:

import pandas as pd
import re
import nltk
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import BernoulliNB
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
nltk.download('stopwords')
from nltk.corpus import stopwords

processed_text = ['List of pre-processed text'] 
y = ['List of labels']
tfidfconverter = TfidfVectorizer(max_features=10000, min_df=5, max_df=0.7, stop_words=stopwords.words('english'))
X = tfidfconverter.fit_transform(processed_text).toarray()

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

text_classifier = BernoulliNB()
text_classifier.fit(X_train, y_train)

predictions = text_classifier.predict(X_test)
print(confusion_matrix(y_test, predictions))
print(classification_report(y_test, predictions))
print(accuracy_score(y_test, predictions))
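
The feature-count error described above can be reproduced on a small scale. The following is a minimal sketch, assuming toy documents (train_docs and new_docs are made up for illustration) and a TfidfVectorizer with default settings rather than the parameters used above. It shows that calling fit_transform on new text learns a new, smaller vocabulary, while transform on the already fitted vectorizer keeps the training feature count.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy data, standing in for the real pre-processed text
train_docs = ["good movie", "bad movie", "great acting", "terrible plot"]
new_docs = ["good plot"]

# Learn the vocabulary from the training text
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_docs)

# Fitting a vectorizer on the new text learns a different vocabulary,
# so the number of columns no longer matches what the classifier saw
refit = TfidfVectorizer().fit_transform(new_docs)

# transform reuses the training vocabulary, so the column count matches
reused = vectorizer.transform(new_docs)

print("training features:", X_train.shape[1])
print("features after refitting on new text:", refit.shape[1])
print("features after transform only:", reused.shape[1])
```

A classifier trained on X_train would accept reused but reject refit, because only reused has the same number of columns as the training matrix.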

Solution:

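The error occurs because fit_transform learns a new vocabulary from the text it is given, so the new text yields a different number of features than the classifier was trained on. Instead, fit the TF-IDF vectorizer once on your training text:

tfidfconverter = TfidfVectorizer(max_features=10000, min_df=5, max_df=0.7, stop_words=stopwords.words('english'))
tfidf_obj = tfidfconverter.fit(processed_text)

Then save the fitted tfidf_obj using pickle or joblib, in addition to the classifier. Use a file name different from the one used for the model so that one does not overwrite the other, e.g.:

joblib.dump(tfidf_obj, "tfidf.joblib")

Then load the saved vectorizer and call transform only (not fit or fit_transform) on the new pre-processed text. Note that the input must be raw text, not the X_test array from the question, which has already been vectorized:

loaded_tfidf = joblib.load("tfidf.joblib")
# new_processed_text is a placeholder name for a list of new pre-processed strings
test_new = loaded_tfidf.transform(new_processed_text)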
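
One way to avoid keeping two files in sync is to bundle the vectorizer and the classifier into a single scikit-learn pipeline and save that. This is a minimal sketch, not the original poster's code: the toy texts and labels, the file name sentiment_pipeline.joblib, and the use of a temporary directory are all assumptions for illustration, and the vectorizer uses default settings because min_df=5 would reject such a tiny corpus.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data, standing in for processed_text and y
texts = ["good movie", "great acting", "bad movie", "terrible plot"]
labels = ["pos", "pos", "neg", "neg"]

# The pipeline fits the vectorizer and the classifier together on raw text
pipeline = make_pipeline(TfidfVectorizer(), BernoulliNB())
pipeline.fit(texts, labels)

# Save the whole fitted pipeline (vocabulary and classifier) in one file
path = os.path.join(tempfile.mkdtemp(), "sentiment_pipeline.joblib")
joblib.dump(pipeline, path)

# Later: load it and predict directly on new pre-processed text.
# predict calls transform (not fit_transform) on the vectorizer internally.
loaded = joblib.load(path)
new_predictions = loaded.predict(["great movie", "terrible acting"])
print(new_predictions)
```

With this layout, train_test_split would be applied to the raw text and labels rather than to the TF-IDF matrix, so the vectorizer is fitted on the training portion only.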

by MR

Published May 26, 2022
