Hyperparameters
Introduction
• Hyperparameters are essential configuration
settings that are not learned from the data but are
set prior to training a machine learning model.
• They play a critical role in the model's performance,
and selecting appropriate hyperparameters is often
a key part of the machine learning workflow.
• Various hyperparameters used in machine learning
and deep learning models are described next.
Hyperparameters
• Learning Rate (α): This hyperparameter
controls the step size during the optimization
process (e.g., gradient descent).
• A high learning rate can lead to faster
convergence but may result in overshooting
the optimal solution, while a low learning rate
might converge too slowly or get stuck in local
minima.
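As a small illustration of the learning rate acting as a step size, the sketch below runs plain gradient descent on the one-dimensional quadratic f(w) = (w - 3)^2; the value of alpha is an arbitrary choice for demonstration.

# Learning rate alpha as the step size of gradient descent on f(w) = (w - 3)^2,
# whose gradient is 2 * (w - 3).
alpha = 0.1   # learning rate (hyperparameter)
w = 0.0       # initial parameter value

for step in range(50):
    grad = 2 * (w - 3)      # gradient of the loss at the current w
    w = w - alpha * grad    # gradient-descent update scaled by alpha

print(w)  # approaches the minimizer w = 3; try alpha = 1.1 to see overshooting/divergence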
Hyperparameters
• Number of Epochs: An epoch is one complete
pass through the training dataset.
– The number of epochs determines how many
times the model will see the entire dataset during
training.
– It's a trade-off between ensuring the model learns
well and avoiding overfitting.
Hyperparameters
• Batch Size: During training, data is divided into
batches, and each batch is used to update the
model's parameters.
• Batch size affects training speed, memory
usage, and model convergence.
• Smaller batches provide noisier updates but
might help the model escape local minima.
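A minimal sketch of how epochs and batch size are passed to a training call, assuming TensorFlow/Keras is installed; the toy data, layer sizes, and values are arbitrary illustrations.

# epochs and batch_size are passed to model.fit and control how many times
# and in what chunks the training data is seen.
import numpy as np
from tensorflow import keras

X = np.random.rand(1000, 20)                # toy inputs
y = np.random.randint(0, 2, size=(1000,))   # toy binary labels

model = keras.Sequential([
    keras.Input(shape=(20,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# 10 full passes over the data, with a parameter update every 32 samples.
model.fit(X, y, epochs=10, batch_size=32, verbose=0)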
Hyperparameters
• Number of Layers: In deep learning models,
the number of layers in the neural network
architecture is a hyperparameter.
• Deeper networks can capture complex
features but may suffer from vanishing
gradients or overfitting.
Hyperparameters
• Number of Neurons (Units): The number of
neurons or units in each layer is another
crucial hyperparameter.
• It influences the model's capacity to capture
information and can be determined using
techniques like cross-validation.
Hyperparameters
• Activation Functions: The choice of activation function for each layer, such as ReLU, sigmoid, or tanh, is a hyperparameter.
• Different activation functions may be more
suitable for specific tasks or architectures.
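The sketch below ties the last three hyperparameters together (number of layers, units per layer, and activation functions), assuming TensorFlow/Keras; the sizes shown are arbitrary.

# The depth of the network, the units in each layer, and the per-layer
# activation functions are all fixed when the architecture is defined.
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(20,)),
    keras.layers.Dense(64, activation="relu"),    # hidden layer 1: 64 units, ReLU
    keras.layers.Dense(32, activation="tanh"),    # hidden layer 2: 32 units, tanh
    keras.layers.Dense(1, activation="sigmoid"),  # output layer: 1 unit, sigmoid
])
model.summary()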
Hyperparameters
• Regularization:
– Hyperparameters related to regularization
techniques, such as L1 and L2 regularization
strength or dropout probability, can help control
overfitting.
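A brief sketch, assuming TensorFlow/Keras, in which the L2 penalty strength and the dropout probability are set explicitly; the values 0.01 and 0.5 are illustrative.

# Regularization hyperparameters: L2 strength per layer and dropout probability.
from tensorflow import keras
from tensorflow.keras import layers, regularizers

model = keras.Sequential([
    keras.Input(shape=(20,)),
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(0.01)),  # L2 penalty strength
    layers.Dropout(0.5),                                     # dropout probability
    layers.Dense(1, activation="sigmoid"),
])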
Hyperparameters
• Weight Initialization:
– The scheme used to set the initial values of the model's weights is a hyperparameter.
– Proper initialization can affect the training process
and model convergence.
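For example, in TensorFlow/Keras the initialization scheme can be chosen per layer; this is a sketch, not the only way to do it.

# Weight initialization as a per-layer hyperparameter.
from tensorflow import keras

layer = keras.layers.Dense(
    64,
    activation="relu",
    kernel_initializer="he_normal",  # He initialization, often paired with ReLU
    bias_initializer="zeros",
)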
Hyperparameters
• Optimizer: The choice of optimization
algorithm, like stochastic gradient descent
(SGD), Adam, or RMSprop, is a hyperparameter.
– Each optimizer has its strengths and weaknesses.
• Loss Function: The loss function, used to
measure the model's performance during
training, is a hyperparameter.
– It depends on the task (e.g., mean squared error for
regression, cross-entropy for classification).
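A short sketch, assuming TensorFlow/Keras, of selecting the optimizer and the loss function at compile time; the particular choices shown are illustrative.

# Optimizer and loss function are chosen when the model is compiled.
from tensorflow import keras

model = keras.Sequential([keras.Input(shape=(20,)),
                          keras.layers.Dense(1)])

# Regression setup: Adam optimizer with mean squared error.
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
              loss="mean_squared_error")

# A binary classifier would instead use, e.g., RMSprop with cross-entropy:
# model.compile(optimizer="rmsprop", loss="binary_crossentropy")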
Hyperparameters
• Learning Rate Schedule: Some models benefit
from a learning rate schedule that changes the
learning rate during training.
– Learning rate annealing or decay gradually reduces the learning rate as training progresses.
• Dropout Rate: The dropout rate in dropout
layers helps regularize neural networks.
– It's the probability of dropping out a neuron
during training.
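A sketch of a learning rate schedule, assuming TensorFlow/Keras (dropout was already shown in the regularization sketch above); the decay values are illustrative.

# An exponential-decay schedule lowers the learning rate during training.
from tensorflow import keras

schedule = keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.01,  # starting learning rate
    decay_steps=1000,            # apply the decay every 1000 update steps
    decay_rate=0.9,              # multiply the learning rate by 0.9 each time
)
optimizer = keras.optimizers.SGD(learning_rate=schedule)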
Hyperparameters
• Early Stopping: The number of epochs to wait
without improvement in the validation loss
before stopping training is another
hyperparameter used for preventing
overfitting.
• Batch Normalization: Hyperparameters
related to batch normalization, such as
momentum and epsilon, can affect the
model's training.
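In TensorFlow/Keras these appear as the early-stopping patience and the batch-normalization momentum and epsilon; a minimal sketch follows, with illustrative values.

# patience is the early-stopping hyperparameter; BatchNormalization exposes
# momentum and epsilon.
from tensorflow import keras

early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss",          # watch the validation loss
    patience=5,                  # stop after 5 epochs without improvement
    restore_best_weights=True,
)

bn_layer = keras.layers.BatchNormalization(momentum=0.99, epsilon=1e-3)

# The callback is then passed to fit, e.g.:
# model.fit(X, y, validation_split=0.2, epochs=100, callbacks=[early_stop])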
Hyperparameters
• Weight Decay: This is a regularization
hyperparameter that controls the L2 penalty
applied to the model's weights during
optimization.
• Input Features: For feature engineering, you
may need to decide which features to include
or exclude from your dataset.
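A small sketch, assuming scikit-learn, where the L2 (weight-decay) strength is Ridge's alpha and the number of input features kept is SelectKBest's k; the synthetic data is purely illustrative.

# Weight decay (L2 penalty strength) and input-feature selection as hyperparameters.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))            # 10 candidate input features
y = X[:, 0] * 2.0 + rng.normal(size=100)  # only the first feature matters

# k (features kept) and alpha (L2 / weight-decay strength) are hyperparameters.
model = make_pipeline(SelectKBest(f_regression, k=3), Ridge(alpha=1.0))
model.fit(X, y)
print(model.score(X, y))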
Hyperparameters
• Architecture Hyperparameters: In convolutional
neural networks (CNNs) and recurrent neural
networks (RNNs), architecture-specific
hyperparameters like kernel size, stride, and the
number of LSTM or GRU units need to be set.
• Ensemble Techniques: Hyperparameters for
ensemble methods, like the number of base models
and their types, must be selected.
• Random Seed: Setting a random seed ensures
reproducibility in machine learning experiments.
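A sketch of architecture-specific hyperparameters for a small CNN and an LSTM, with a fixed random seed for reproducibility, assuming TensorFlow/Keras; the shapes and sizes are arbitrary.

# Kernel size, stride, and LSTM units are architecture hyperparameters;
# the seed makes runs reproducible.
from tensorflow import keras

keras.utils.set_random_seed(42)  # fixes Python, NumPy, and framework seeds

cnn = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    keras.layers.Conv2D(filters=32, kernel_size=(3, 3), strides=(1, 1),
                        activation="relu"),          # kernel size and stride
    keras.layers.MaxPooling2D(pool_size=(2, 2)),
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation="softmax"),
])

rnn = keras.Sequential([
    keras.Input(shape=(100, 8)),   # 100 time steps, 8 features per step
    keras.layers.LSTM(units=64),   # number of LSTM units
    keras.layers.Dense(1),
])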
Hyperparameters
• Selecting the right hyperparameters can
significantly impact the model's performance and
training stability.
• Techniques like grid search, random search, and
Bayesian optimization are used to explore
hyperparameter spaces and find the best settings
for a given task.
• It often involves a trade-off between underfitting
and overfitting, making hyperparameter tuning an
iterative process in machine learning.
GridSearchCV
• In the GridSearchCV approach, the machine learning model is evaluated over a range of hyperparameter values.
• The approach is called GridSearchCV because it searches for the best set of hyperparameters from a grid of hyperparameter values.
• For example, suppose we want to set two hyperparameters of a Logistic Regression classifier, C and alpha, each with a different set of candidate values.
• The grid search technique will construct many versions of
the model with all possible combinations of
hyperparameters and will return the best one.
GridSearchCV
• Drawback: GridSearchCV goes through every combination of hyperparameter values in the grid, which makes grid search computationally very expensive.
RandomizedSearchCV
• RandomizedSearchCV addresses this drawback of GridSearchCV, as it evaluates only a fixed number of hyperparameter settings.
• It moves through the grid in a random fashion to find the best set of hyperparameters.
• This approach reduces unnecessary
computation.
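A short sketch of RandomizedSearchCV with a capped number of sampled settings, assuming scikit-learn; the parameter ranges are illustrative.

# RandomizedSearchCV samples a fixed number (n_iter) of hyperparameter
# settings instead of trying every combination.
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

param_distributions = {
    "max_depth": [2, 3, 4, 5, 6, None],
    "min_samples_split": [2, 5, 10, 20],
    "criterion": ["gini", "entropy"],
}

search = RandomizedSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=10,      # only 10 randomly sampled settings are evaluated
    cv=5,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)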
Implementation of Common Hyperparameters
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier
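The slide lists only the imports; one possible continuation consistent with them is sketched below, with an illustrative parameter grid for the decision tree.

# Tune a DecisionTreeClassifier on the iris data with GridSearchCV
# (grid values are illustrative choices).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

param_grid = {
    "max_depth": [2, 3, 4, 5, None],
    "min_samples_split": [2, 5, 10],
    "criterion": ["gini", "entropy"],
}

grid = GridSearchCV(DecisionTreeClassifier(random_state=42),
                    param_grid=param_grid, cv=5)
grid.fit(X_train, y_train)

print("Best hyperparameters:", grid.best_params_)
print("Test accuracy:", grid.score(X_test, y_test))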