Hyper Parameters

The document discusses hyperparameters in machine learning, which are crucial settings set before training that significantly impact model performance. It covers various types of hyperparameters, including learning rate, number of epochs, batch size, and regularization techniques, as well as methods for hyperparameter tuning like GridSearchCV and RandomizedSearchCV. An implementation example using the Iris dataset and a DecisionTreeClassifier is provided to illustrate the process of hyperparameter tuning.

Uploaded by

bca2m2

Hyperparameters

Introduction
• Hyperparameters are essential configuration
settings that are not learned from the data but are
set prior to training a machine learning model.
• They play a critical role in the model's performance,
and selecting appropriate hyperparameters is often
a key part of the machine learning workflow.
• Various hyperparameters used in machine learning
and deep learning models are described next.
Hyperparameters
• Learning Rate (α): This hyperparameter
controls the step size during the optimization
process (e.g., gradient descent).
• A high learning rate can lead to faster
convergence but may result in overshooting
the optimal solution, while a low learning rate
might converge too slowly or get stuck in local
minima.
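To make the effect of the step size concrete, here is a minimal sketch of plain gradient descent on the toy function f(x) = x**2. The function, the starting point, and the three learning rates are illustrative assumptions, not values from the slides.

```python
def gradient_descent(learning_rate, steps=50, start=5.0):
    """Minimize f(x) = x**2 with plain gradient descent and return the final x."""
    x = start
    for _ in range(steps):
        grad = 2 * x                  # derivative of x**2
        x = x - learning_rate * grad  # step against the gradient
    return x

small = gradient_descent(0.01)    # low rate: still far from the minimum at 0
good = gradient_descent(0.1)      # moderate rate: ends very close to 0
too_big = gradient_descent(1.1)   # high rate: every step overshoots and |x| grows

print(small, good, too_big)
```

On this toy problem each update multiplies x by (1 - 2 * learning_rate), so any rate above 1.0 makes the iterates diverge rather than converge.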
Hyperparameters
• Number of Epochs: An epoch is one complete
pass through the training dataset.
– The number of epochs determines how many
times the model will see the entire dataset during
training.
– It's a trade-off: too few epochs can leave the
model underfit, while too many can lead to
overfitting.
Hyperparameters
• Batch Size: During training, data is divided into
batches, and each batch is used to update the
model's parameters.
• Batch size affects training speed, memory
usage, and model convergence.
• Smaller batches provide noisier updates but
might help the model escape local minima.
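The relationship between batch size and the number of parameter updates per epoch can be sketched as follows. The helper name iterate_minibatches and the dataset size of 150 are assumptions made for illustration.

```python
import math
import random

def iterate_minibatches(data, batch_size, seed=0):
    """Yield shuffled mini-batches that together cover the data exactly once."""
    rng = random.Random(seed)
    indices = list(range(len(data)))
    rng.shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [data[i] for i in indices[start:start + batch_size]]

data = list(range(150))  # stand-in for 150 training examples

for batch_size in (16, 32, 150):
    n_updates = sum(1 for _ in iterate_minibatches(data, batch_size))
    # One update per batch, so an epoch has ceil(n / batch_size) updates.
    assert n_updates == math.ceil(len(data) / batch_size)
    print(batch_size, n_updates)
```

A batch size equal to the dataset size gives a single update per epoch (full-batch training); smaller batches give more, noisier updates per epoch.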
Hyperparameters
• Number of Layers: In deep learning models,
the number of layers in the neural network
architecture is a hyperparameter.
• Deeper networks can capture complex
features but may suffer from vanishing
gradients or overfitting.
Hyperparameters
• Number of Neurons (Units): The number of
neurons or units in each layer is another
crucial hyperparameter.
• It influences the model's capacity to capture
information and can be determined using
techniques like cross-validation.
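One rough way to see how the number of layers and units affects model capacity is to count trainable parameters. The sketch below assumes a plain fully connected network with one bias per unit; the layer sizes are hypothetical.

```python
def count_dense_parameters(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the input size, each hidden layer size, and the output size.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out  # weight matrix between consecutive layers
        total += n_out         # one bias per unit in the receiving layer
    return total

print(count_dense_parameters([4, 16, 3]))      # one hidden layer of 16 units
print(count_dense_parameters([4, 16, 16, 3]))  # two hidden layers of 16 units
```

Parameter count is only a proxy for capacity, but it shows how quickly a model grows as layers or units are added.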
Hyperparameters
• Activation Functions: Choosing the
appropriate activation functions for each layer,
such as ReLU, sigmoid, or tanh, is a
hyperparameter.
• Different activation functions may be more
suitable for specific tasks or architectures.
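For reference, the three activation functions named above can be written as scalar functions. This is a minimal sketch using only the standard library; deep learning frameworks provide vectorized versions.

```python
import math

def relu(x):
    """Rectified linear unit: passes positive values, zeroes out negatives."""
    return max(0.0, x)

def sigmoid(x):
    """Logistic sigmoid: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Hyperbolic tangent: squashes any real input into (-1, 1)."""
    return math.tanh(x)

for x in (-2.0, 0.0, 2.0):
    print(x, relu(x), sigmoid(x), tanh(x))
```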
Hyperparameters
• Regularization:
– Hyperparameters related to regularization
techniques, such as L1 and L2 regularization
strength or dropout probability, can help control
overfitting.
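The role of the regularization strength can be sketched by adding an L1 or L2 penalty to a training loss. The weights, the data loss of 0.30, and the strength of 0.01 below are hypothetical values chosen for illustration.

```python
def l1_penalty(weights, strength):
    """L1 penalty: strength times the sum of absolute weight values."""
    return strength * sum(abs(w) for w in weights)

def l2_penalty(weights, strength):
    """L2 penalty: strength times the sum of squared weight values."""
    return strength * sum(w * w for w in weights)

weights = [0.5, -2.0, 0.0, 1.5]  # hypothetical model weights
data_loss = 0.30                 # hypothetical loss on the training data
strength = 0.01                  # the regularization hyperparameter

total_with_l1 = data_loss + l1_penalty(weights, strength)
total_with_l2 = data_loss + l2_penalty(weights, strength)
print(total_with_l1, total_with_l2)
```

A larger strength makes large weights more costly, which pushes the model toward simpler solutions.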
Hyperparameters
• Weight Initialization:
– The scheme used to set the initial values of the
model's weights is a hyperparameter.
– A well-chosen initialization can speed up training
and improve model convergence.
Hyperparameters
• Optimizer: The choice of optimization
algorithm, like stochastic gradient descent
(SGD), Adam, or RMSprop, is a hyperparameter.
– Each optimizer has its strengths and weaknesses.
• Loss Function: The loss function, used to
measure the model's performance during
training, is a hyperparameter.
– It depends on the task (e.g., mean squared error for
regression, cross-entropy for classification).
Hyperparameters
• Learning Rate Schedule: Some models benefit
from a learning rate schedule that changes the
learning rate during training.
– Learning rate annealing or decay gradually
lowers the learning rate so that later updates
fine-tune the model's parameters.
• Dropout Rate: The dropout rate in dropout
layers helps regularize neural networks.
– It's the probability of dropping out a neuron
during training.
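Two common forms of learning rate decay can be sketched as simple functions of the epoch number. The function names, the drop factor, and the decay rate are illustrative assumptions rather than settings from the slides.

```python
import math

def step_decay(initial_lr, epoch, drop=0.5, epochs_per_drop=10):
    """Multiply the learning rate by `drop` once every `epochs_per_drop` epochs."""
    return initial_lr * drop ** (epoch // epochs_per_drop)

def exponential_decay(initial_lr, epoch, decay_rate=0.05):
    """Shrink the learning rate smoothly by a constant factor per epoch."""
    return initial_lr * math.exp(-decay_rate * epoch)

for epoch in (0, 10, 25):
    print(epoch, step_decay(0.1, epoch), exponential_decay(0.1, epoch))
```

Note that the schedule introduces hyperparameters of its own (here drop, epochs_per_drop, and decay_rate) that may also need tuning.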
Hyperparameters
• Early Stopping: The number of epochs to wait
without improvement in the validation loss
before stopping training is another
hyperparameter used for preventing
overfitting.
• Batch Normalization: Hyperparameters
related to batch normalization, such as
momentum and epsilon, can affect the
model's training.
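The early stopping rule above, often called patience, can be sketched on a recorded list of validation losses. The helper name and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the index of the epoch at which training would stop.

    Training stops once the validation loss has failed to improve on its
    best value for `patience` consecutive epochs.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses) - 1  # never triggered: ran through every epoch

val_losses = [0.90, 0.70, 0.60, 0.62, 0.61, 0.63, 0.59]  # hypothetical values
for patience in (2, 3, 4):
    print(patience, early_stopping_epoch(val_losses, patience))
```

With these values, a small patience stops before the late improvement at the last epoch is seen, which illustrates why patience itself has to be chosen with care.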
Hyperparameters
• Weight Decay: This is a regularization
hyperparameter that controls the L2 penalty
applied to the model's weights during
optimization.
• Input Features: For feature engineering, you
may need to decide which features to include
or exclude from your dataset.
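For the weight decay item above, a minimal sketch of one gradient descent step with weight decay is shown below. It assumes plain SGD, where adding weight_decay * w to the gradient corresponds to an L2 penalty of (weight_decay / 2) times the sum of squared weights; the numbers are hypothetical.

```python
def sgd_step(weights, grads, learning_rate, weight_decay):
    """One SGD update where each weight is also pulled toward zero."""
    return [
        w - learning_rate * (g + weight_decay * w)
        for w, g in zip(weights, grads)
    ]

weights = [1.0, -2.0]
zero_grads = [0.0, 0.0]  # with no data gradient, only the decay term acts

decayed = sgd_step(weights, zero_grads, learning_rate=0.1, weight_decay=0.5)
print(decayed)  # each weight shrinks by the factor 1 - 0.1 * 0.5
```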
Hyperparameters
• Architecture Hyperparameters: In convolutional
neural networks (CNNs) and recurrent neural
networks (RNNs), architecture-specific
hyperparameters like kernel size, stride, and the
number of LSTM or GRU units need to be set.
• Ensemble Techniques: Hyperparameters for
ensemble methods, like the number of base models
and their types, must be selected.
• Random Seed: Setting a random seed makes
machine learning experiments repeatable; it is
fixed for reproducibility rather than tuned.
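To show how kernel size, stride, and padding interact, the sketch below uses the standard output-size formula for a convolution along one spatial dimension, floor((n + 2p - k) / s) + 1. The function name and the example sizes are assumptions for illustration.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output length of a convolution along one spatial dimension."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

print(conv_output_size(32, kernel_size=3))              # no padding shrinks the output
print(conv_output_size(32, kernel_size=3, padding=1))   # padding of 1 keeps the size
print(conv_output_size(32, kernel_size=5, stride=2))    # stride 2 roughly halves it
```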
Hyperparameters
• Selecting the right hyperparameters can
significantly impact the model's performance and
training stability.
• Techniques like grid search, random search, and
Bayesian optimization are used to explore
hyperparameter spaces and find the best settings
for a given task.
• Tuning often involves a trade-off between
underfitting and overfitting, which makes it an
iterative process in machine learning.
GridSearchCV
• In the GridSearchCV approach, the machine learning model
is evaluated for a range of hyperparameter values.
• The approach is called GridSearchCV because it searches
for the best set of hyperparameters in a grid of
hyperparameter values, scoring each candidate with
cross-validation (CV).
• For example, suppose we want to tune two hyperparameters
of a Logistic Regression classifier, C and Alpha, each with
its own set of candidate values.
• The grid search technique will construct one version of
the model for every possible combination of these values
and will return the best one.
GridSearchCV
• Drawback: GridSearchCV evaluates every
combination in the grid, and the number of
combinations multiplies with each added
hyperparameter, which makes grid search
computationally very expensive.
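The cost of an exhaustive search can be estimated by enumerating the grid. The sketch below uses only the standard library and the same decision tree grid and 5-fold setting as the implementation later in this document; it illustrates what a grid search has to cover and is not GridSearchCV's internal code.

```python
from itertools import product

param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [None, 10, 20, 30],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}

# Every combination is one candidate model configuration.
names = list(param_grid)
combinations = [
    dict(zip(names, values)) for values in product(*param_grid.values())
]

n_folds = 5
n_fits = len(combinations) * n_folds  # one fit per candidate per fold

print(len(combinations))  # 2 * 4 * 3 * 3 candidates
print(n_fits)
print(combinations[0])
```

Adding a fifth hyperparameter with three candidate values would triple both numbers, which is why grids are usually kept small.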
RandomizedSearchCV
• RandomizedSearchCV addresses this drawback of
GridSearchCV, as it evaluates only a fixed
number of hyperparameter settings.
• It samples settings at random from the grid (or
from specified distributions) to find a good set
of hyperparameters.
• This approach reduces computation, although it
is not guaranteed to find the best combination.
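A minimal sketch of RandomizedSearchCV on the Iris dataset follows. The choice of 10 iterations, the candidate values, and the fixed random_state values are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Candidate values to sample from; lists are sampled uniformly.
param_distributions = {
    "criterion": ["gini", "entropy"],
    "max_depth": [None, 10, 20, 30],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}

random_search = RandomizedSearchCV(
    estimator=DecisionTreeClassifier(random_state=42),
    param_distributions=param_distributions,
    n_iter=10,        # evaluate only 10 of the 72 possible settings
    cv=5,
    random_state=42,  # makes the sampled settings repeatable
)
random_search.fit(X_train, y_train)

print("Best Hyperparameters:", random_search.best_params_)
print("Test Set Accuracy:", random_search.score(X_test, y_test))
```

Here n_iter sets the computational budget directly: 10 settings with 5 folds means 50 fits during the search, regardless of how large the grid is.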
Implementation of Common
Hyperparameters
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the dataset into a training set and a testing set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Create a decision tree classifier
clf = DecisionTreeClassifier()

# Define a parameter grid for hyperparameter tuning
param_grid = {
    'criterion': ['gini', 'entropy'],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
}

# Use GridSearchCV (5-fold cross-validation) for hyperparameter tuning
grid_search = GridSearchCV(estimator=clf, param_grid=param_grid, cv=5)
grid_search.fit(X_train, y_train)

# Print the best hyperparameters found by GridSearchCV
print("Best Hyperparameters:")
print(grid_search.best_params_)

# Train a decision tree classifier with the best hyperparameters
best_clf = DecisionTreeClassifier(**grid_search.best_params_)
best_clf.fit(X_train, y_train)

# Evaluate the model on the test set
accuracy = best_clf.score(X_test, y_test)
print("Test Set Accuracy:", accuracy)
Implementation of Common
Hyperparameters
• In this program:
– We load the Iris dataset and split it into a training set and a testing set.
– We create a DecisionTreeClassifier.
– We define a parameter grid param_grid with different values for
hyperparameters such as criterion, max_depth, min_samples_split,
and min_samples_leaf.
– We use GridSearchCV to search for the best hyperparameters within
the specified grid. It performs cross-validation and finds the
combination of hyperparameters that results in the best model
performance.
– We print the best hyperparameters found by GridSearchCV.
– We train a decision tree classifier with the best hyperparameters.
– Finally, we evaluate the model's accuracy on the test set.
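As a possible variation on the last two steps: with its default setting refit=True, GridSearchCV refits the best configuration on the whole training set and exposes it as best_estimator_, so a separate retraining step is optional. The sketch below repeats the setup so that it runs on its own; fixing random_state on the classifier is an added assumption to make the result repeatable.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [None, 10, 20, 30],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}

grid_search = GridSearchCV(
    estimator=DecisionTreeClassifier(random_state=42),
    param_grid=param_grid,
    cv=5,
)
grid_search.fit(X_train, y_train)

# The refitted model with the best hyperparameters, ready to use.
best_clf = grid_search.best_estimator_

print("Mean CV accuracy of best setting:", grid_search.best_score_)
print("Test Set Accuracy:", best_clf.score(X_test, y_test))
```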
