Grid Searching From Scratch using Python
Last Updated: 21 Mar, 2024
Grid searching is a method of finding the combination of hyperparameters at which a model achieves its highest accuracy. Before applying grid searching to any algorithm, the data is divided into a training set and a validation set; the validation set is used to evaluate the candidate models. A model is trained with every possible combination of hyperparameters and tested on the validation set, and the best-performing combination is chosen.
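The core loop is simple. Below is a minimal sketch of the idea, assuming a hypothetical helper train_and_score(lr, iters) that fits a model with the given hyperparameters on the training set and returns its validation accuracy:
Python3
from itertools import product

# candidate values for two hypothetical hyperparameters
param_grid = {'learning_rate': [0.01, 0.1], 'iterations': [100, 500]}

best_score, best_params = 0.0, None
for lr, iters in product(param_grid['learning_rate'],
                         param_grid['iterations']):
    # train_and_score is a placeholder: fit on the training set,
    # return accuracy on the validation set
    score = train_and_score(lr, iters)
    if score > best_score:
        best_score, best_params = score, (lr, iters)
print(best_params, best_score)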
Implementation:
Grid searching can be applied to any algorithm whose performance can be improved by tuning its hyperparameters. For example, we can apply grid searching to K-Nearest Neighbors by validating its performance over a set of values of K, as shown in the sketch below. We can do the same with Logistic Regression by trying a set of learning rates to find the one at which it achieves the best accuracy.
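For instance, the K-Nearest Neighbors search mentioned above can be written in a few lines with scikit-learn's GridSearchCV (a brief sketch; it assumes a feature matrix X and label vector y are already loaded):
Python3
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV

# candidate values of K to try
param_grid = {'n_neighbors': [1, 3, 5, 7, 9]}

# GridSearchCV fits one model per value of K and cross-validates each
knn_grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
knn_grid.fit(X, y)  # X, y assumed to be an already-loaded dataset
print(knn_grid.best_params_)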
We will use the Diabetes dataset (diabetes.csv). It has 8 feature columns, such as "Age" and "Glucose", and the target variable "Outcome" for 108 patients. Using this data, we will train a Logistic Regression classifier to predict whether or not a patient has diabetes.
Code: Implementation of Grid Searching on Logistic Regression from Scratch
Python3
# Importing libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Grid Searching in Logistic Regression
class LogitRegression():
    def __init__(self, learning_rate, iterations):
        self.learning_rate = learning_rate
        self.iterations = iterations

    # Function for model training
    def fit(self, X, Y):
        # no_of_training_examples, no_of_features
        self.m, self.n = X.shape
        # weight initialization
        self.W = np.zeros(self.n)
        self.b = 0
        self.X = X
        self.Y = Y
        # gradient descent learning
        for i in range(self.iterations):
            self.update_weights()
        return self

    # Helper function to update weights in gradient descent
    def update_weights(self):
        # sigmoid activation of the current linear model
        A = 1 / (1 + np.exp(-(self.X.dot(self.W) + self.b)))
        # calculate gradients
        tmp = np.reshape(A - self.Y.T, self.m)
        dW = np.dot(self.X.T, tmp) / self.m
        db = np.sum(tmp) / self.m
        # update weights
        self.W = self.W - self.learning_rate * dW
        self.b = self.b - self.learning_rate * db
        return self

    # Hypothetical function h(x)
    def predict(self, X):
        Z = 1 / (1 + np.exp(-(X.dot(self.W) + self.b)))
        return np.where(Z > 0.5, 1, 0)

# Driver code
def main():
    # Importing dataset
    df = pd.read_csv("diabetes.csv")
    X = df.iloc[:, :-1].values
    Y = df.iloc[:, -1:].values

    # Splitting dataset into train and validation set
    X_train, X_valid, Y_train, Y_valid = train_test_split(
        X, Y, test_size=1/3, random_state=0)

    # Model training
    max_accuracy = 0

    # learning_rate choices
    learning_rates = [0.1, 0.2, 0.3, 0.4, 0.5,
                      0.01, 0.02, 0.03, 0.04, 0.05]
    # iterations choices
    iterations = [100, 200, 300, 400, 500]

    # all available combinations of learning_rate and iterations
    parameters = []
    for i in learning_rates:
        for j in iterations:
            parameters.append((i, j))
    print("Available combinations : ", parameters)

    # Applying linear search over the list of available combinations
    # to find the maximum accuracy on the validation set
    for k in range(len(parameters)):
        model = LogitRegression(learning_rate=parameters[k][0],
                                iterations=parameters[k][1])
        model.fit(X_train, Y_train)

        # Prediction on validation set
        Y_pred = model.predict(X_valid)

        # measure performance on validation set
        correctly_classified = 0
        for count in range(np.size(Y_pred)):
            if Y_valid[count] == Y_pred[count]:
                correctly_classified += 1
        # divide by the total number of predictions, not the loop counter
        curr_accuracy = (correctly_classified / np.size(Y_pred)) * 100

        if max_accuracy < curr_accuracy:
            max_accuracy = curr_accuracy

    print("Maximum accuracy achieved by our model through grid searching : ",
          max_accuracy)

if __name__ == "__main__":
    main()
Output:
Available combinations : [(0.1, 100), (0.1, 200), (0.1, 300), (0.1, 400),
(0.1, 500), (0.2, 100), (0.2, 200), (0.2, 300), (0.2, 400), (0.2, 500),
(0.3, 100), (0.3, 200), (0.3, 300), (0.3, 400), (0.3, 500), (0.4, 100),
(0.4, 200), (0.4, 300), (0.4, 400), (0.4, 500), (0.5, 100), (0.5, 200),
(0.5, 300), (0.5, 400), (0.5, 500), (0.01, 100), (0.01, 200), (0.01, 300),
(0.01, 400), (0.01, 500), (0.02, 100), (0.02, 200), (0.02, 300), (0.02, 400),
(0.02, 500), (0.03, 100), (0.03, 200), (0.03, 300), (0.03, 400), (0.03, 500),
(0.04, 100), (0.04, 200), (0.04, 300), (0.04, 400), (0.04, 500), (0.05, 100),
(0.05, 200), (0.05, 300), (0.05, 400), (0.05, 500)]
Maximum accuracy achieved by our model through grid searching : 60.0
Above, we applied grid searching over all possible combinations of learning rate and number of iterations to find the combination at which the model achieves its highest validation accuracy.
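The script above reports only the best accuracy, not which combination produced it. A small optional extension, sketched below reusing the LogitRegression class and the parameters list from the code above, records the winning pair as well:
Python3
best_params, max_accuracy = None, 0
for lr, iters in parameters:
    model = LogitRegression(learning_rate=lr,
                            iterations=iters).fit(X_train, Y_train)
    # accuracy as the fraction of matching predictions on the validation set
    accuracy = np.mean(model.predict(X_valid) == Y_valid.ravel()) * 100
    if accuracy > max_accuracy:
        max_accuracy, best_params = accuracy, (lr, iters)
print("Best (learning_rate, iterations):", best_params,
      "accuracy:", max_accuracy)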
Code: Implementation of Grid Searching on Logistic Regression using sklearn
Python3
# Importing Libraries
import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Driver Code
def main():
    # Importing dataset
    df = pd.read_csv("diabetes.csv")
    X = df.iloc[:, :-1].values
    Y = df.iloc[:, -1].values

    # Splitting dataset into train and test set
    X_train, X_test, Y_train, Y_test = train_test_split(
        X, Y, test_size=1/3, random_state=0)

    # grid of candidate values for the inverse regularization strength C
    # (sklearn's LogisticRegression has no learning-rate hyperparameter)
    parameters = {'C': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}
    model = LogisticRegression()
    grid = GridSearchCV(model, parameters)
    grid.fit(X_train, Y_train)

    # Prediction on test set
    Y_pred = grid.predict(X_test)

    # measure performance
    correctly_classified = 0
    for count in range(np.size(Y_pred)):
        if Y_test[count] == Y_pred[count]:
            correctly_classified += 1
    # divide by the total number of predictions, not the loop counter
    accuracy = (correctly_classified / np.size(Y_pred)) * 100

    print("Maximum accuracy achieved by sklearn model through grid searching : ",
          np.round(accuracy, 2))

if __name__ == "__main__":
    main()
Output:
Maximum accuracy achieved by sklearn model through grid searching : 62.86
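GridSearchCV also records the search results itself, so the best combination does not need to be recomputed by hand. Continuing from the fitted grid object above (a brief sketch):
Python3
# best hyperparameter combination found during the search
print(grid.best_params_)      # e.g. {'C': 1}
# mean cross-validated accuracy of that combination
print(grid.best_score_)
# the refitted LogisticRegression using the best C
print(grid.best_estimator_)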
Note: Grid searching plays a vital role in tuning hyperparameters for mathematically complex models.