Save classifier to disk in scikit-learn in Python
Last Updated: 25 Jul, 2022
In this article, we will cover how to save a classifier to disk in scikit-learn using Python.
Training a model with scikit-learn, whether it is a classifier, a regressor, or another estimator, can take considerable time. Instead of retraining it whenever we need it, we can save the trained model to disk and load it back when required, which saves a lot of time. Converting an object into a storable format is called serialization, and restoring it is called deserialization. We will cover two ways to save classifier models:
Method 1: Using Pickle
pickle is a module in Python's standard library and the standard way to serialize Python objects and retrieve them from storage. We first serialize the trained model and save it to disk; later we load it back by deserializing it. Pickling is the process of converting a Python object hierarchy into a byte stream. Unpickling is the inverse process, converting a byte stream back into an object hierarchy.
- dump() – serializes an object hierarchy and writes it to an open binary file (dumps() returns the bytes instead of writing them).
- load() – reads a byte stream from an open binary file and deserializes it (loads() does the same for a bytes object).
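The file-free variants are handy for seeing what pickling actually produces. The sketch below, which assumes only scikit-learn's bundled iris dataset, pickles a fitted classifier into a bytes object with pickle.dumps() and rebuilds it with pickle.loads(); no file is involved.

```python
import pickle

from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

# Fit a small classifier so there is something worth serializing
X, y = load_iris(return_X_y=True)
clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)

# dumps() returns the pickled classifier as a bytes object
payload = pickle.dumps(clf)
print(type(payload), len(payload), "bytes")

# loads() rebuilds an equivalent classifier from those bytes
restored = pickle.loads(payload)
print("Same predictions:", (restored.predict(X) == clf.predict(X)).all())
```

The same bytes could be written to a file, a database column, or a network socket; dump() and load() simply do the file I/O for you.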
Syntax:
import pickle
# Save the model (the with block closes the file afterwards)
with open("model_clf_pickle", "wb") as f:
    pickle.dump(model, f)
# Load (retrieve) the model
with open("model_clf_pickle", "rb") as f:
    my_model_clf = pickle.load(f)
Example:
We train a K-Nearest Neighbors classifier on the iris dataset, save the model with pickle, load it back with pickle, and then compute the classifier's score on the test set.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
import pickle
# load the iris dataset as an example
iris = load_iris()
# store the feature matrix (X) and response vector (y)
X = iris.data
y = iris.target
# splitting X and y into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=1)
# training the model on training set
model_clf = KNeighborsClassifier(n_neighbors=3)
model_clf.fit(X_train, y_train)
# Saving classifier using pickle
with open("model_clf_pickle", "wb") as f:
    pickle.dump(model_clf, f)
# Loading classifier using pickle
with open("model_clf_pickle", "rb") as f:
    my_model_clf = pickle.load(f)
result_score = my_model_clf.score(X_test, y_test)
print("Score:", result_score)
Output:
Score: 0.9833333333333333
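Matching scores are a good sign, but a stricter sanity check compares the predictions and hyperparameters of the original and restored models directly. This sketch repeats the pickle example above and writes the file into a temporary directory so no file is left behind; the file name is an arbitrary placeholder.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=1)
model_clf = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

# Save and reload inside a temporary directory that is deleted afterwards
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model_clf_pickle"  # placeholder file name
    with path.open("wb") as f:
        pickle.dump(model_clf, f)
    with path.open("rb") as f:
        my_model_clf = pickle.load(f)

# The restored model should predict exactly what the original predicts
same_predictions = np.array_equal(model_clf.predict(X_test),
                                  my_model_clf.predict(X_test))
same_params = my_model_clf.get_params() == model_clf.get_params()
print("Identical predictions:", same_predictions)
print("Identical hyperparameters:", same_params)
```

Keep in mind that pickle.load() can execute arbitrary code while rebuilding objects, so only load model files from sources you trust.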
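Method 2: Using Joblib
joblib's dump() and load() functions are a replacement for pickle that is more efficient on objects carrying large NumPy arrays, which many fitted scikit-learn models do. joblib is a general-purpose library (it also offers caching and parallel-execution helpers), but these two functions are what we need for saving and retrieving models. They accept file-like objects as well as filenames.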
- joblib.dump() – serializes an object hierarchy and writes it to a file.
- joblib.load() – reads a file and deserializes the object hierarchy stored in it.
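Because both functions accept file-like objects, a model can be serialized into memory rather than onto disk. In this sketch an io.BytesIO buffer stands in for any binary file-like object (for example, an upload stream); the buffer itself is an illustrative choice, not part of the original example.

```python
import io

import joblib
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
model_clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)

# Serialize into an in-memory binary buffer instead of a named file
buffer = io.BytesIO()
joblib.dump(model_clf, buffer)

# Rewind to the start before reading, as with a regular file
buffer.seek(0)
my_model_clf = joblib.load(buffer)
print("Same score:", my_model_clf.score(X, y) == model_clf.score(X, y))
```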
Syntax:
import joblib
# Save model
joblib.dump(model, "model_name.pkl")
# Retrieve model
model = joblib.load("model_name.pkl")
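joblib.dump() also takes a compress argument (0 to 9, or True); higher values compress more at the cost of slower reads and writes, and joblib's documentation suggests 3 as a reasonable compromise. The sketch below saves the same classifier with and without compression into a temporary directory and reports both file sizes. The sizes you see will depend on the model and library versions, so treat them as illustrative; load() detects the compression on its own.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
model_clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    plain_path = os.path.join(tmp, "model_clf.pkl")
    packed_path = os.path.join(tmp, "model_clf_compressed.pkl")

    joblib.dump(model_clf, plain_path)               # no compression
    joblib.dump(model_clf, packed_path, compress=3)  # zlib, level 3

    plain_size = os.path.getsize(plain_path)
    packed_size = os.path.getsize(packed_path)

    # No extra argument is needed to read the compressed file back
    restored = joblib.load(packed_path)

print(f"uncompressed: {plain_size} bytes, compressed: {packed_size} bytes")
print("Same predictions:", np.array_equal(restored.predict(X),
                                          model_clf.predict(X)))
```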
Example:
We train a K-Nearest Neighbors classifier on the iris dataset, save the model with joblib, and load it back with joblib. Finally, we compute the classifier's score on the test set.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
import joblib
# load the iris dataset as an example
iris = load_iris()
# store the feature matrix (X) and response vector (y)
X = iris.data
y = iris.target
# splitting X and y into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=1)
# training the model on training set
model_clf = KNeighborsClassifier(n_neighbors=3)
model_clf.fit(X_train, y_train)
# Saving classifier using joblib
joblib.dump(model_clf, 'model_clf.pkl')
# load classifier using joblib
my_model_clf = joblib.load("model_clf.pkl")
result_score = my_model_clf.score(X_test, y_test)
print("Score: ", result_score)
Output:
Score: 0.9833333333333333
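With either method, a saved model is only guaranteed to load correctly with the scikit-learn version that produced it, and scikit-learn's model persistence guide recommends recording that version alongside the model. One way to do this, sketched below, is to save a small dictionary holding both. The helpers save_with_version() and load_with_version() and the bundle layout are hypothetical names chosen for this illustration, not part of joblib or scikit-learn.

```python
import tempfile
import warnings
from pathlib import Path

import joblib
import sklearn
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier


def save_with_version(model, path):
    """Hypothetical helper: store the model plus the scikit-learn version used."""
    joblib.dump({"model": model, "sklearn_version": sklearn.__version__}, path)


def load_with_version(path):
    """Hypothetical helper: load the bundle, warning on a version mismatch."""
    bundle = joblib.load(path)
    saved = bundle["sklearn_version"]
    if saved != sklearn.__version__:
        warnings.warn(f"model saved with scikit-learn {saved}, "
                      f"loading with {sklearn.__version__}")
    return bundle["model"]


X, y = load_iris(return_X_y=True)
model_clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model_clf_bundle.pkl"  # placeholder file name
    save_with_version(model_clf, path)
    my_model_clf = load_with_version(path)

print("Score:", my_model_clf.score(X, y))
```

The dictionary could also carry other metadata the persistence guide mentions, such as a cross-validation score or a reference to the training data.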