Hyperparameters
Introduction
• Hyperparameters are essential configuration
settings that are not learned from the data but are
set prior to training a machine learning model.
• They play a critical role in the model's performance,
and selecting appropriate hyperparameters is often
a key part of the machine learning workflow.
• Various hyperparameters used in machine learning
and deep learning models are described next.
Hyperparameters
• Learning Rate (α): This hyperparameter
controls the step size during the optimization
process (e.g., gradient descent).
• A high learning rate can lead to faster
convergence but may result in overshooting
the optimal solution, while a low learning rate
might converge too slowly or get stuck in local
minima.
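As a small illustration of the learning rate acting as a step size, the sketch below runs plain gradient descent on the one-dimensional quadratic f(w) = (w - 3)^2; the value of alpha is an arbitrary choice for demonstration.

# Learning rate alpha as the step size of gradient descent on f(w) = (w - 3)^2,
# whose gradient is 2 * (w - 3).
alpha = 0.1   # learning rate (hyperparameter)
w = 0.0       # initial parameter value

for step in range(50):
    grad = 2 * (w - 3)      # gradient of the loss at the current w
    w = w - alpha * grad    # gradient-descent update scaled by alpha

print(w)  # approaches the minimizer w = 3; try alpha = 1.1 to see overshooting/divergence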
Hyperparameters
• Number of Epochs: An epoch is one complete
pass through the training dataset.
– The number of epochs determines how many
times the model will see the entire dataset during
training.
– It's a trade-off between ensuring the model learns
well and avoiding overfitting.
Hyperparameters
• Batch Size: During training, data is divided into
batches, and each batch is used to update the
model's parameters.
• Batch size affects training speed, memory
usage, and model convergence.
• Smaller batches provide noisier updates but
might help the model escape local minima.
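A minimal sketch of how epochs and batch size are passed to a training call, assuming TensorFlow/Keras is installed; the toy data, layer sizes, and values are arbitrary illustrations.

# epochs and batch_size are passed to model.fit and control how many times
# and in what chunks the training data is seen.
import numpy as np
from tensorflow import keras

X = np.random.rand(1000, 20)                # toy inputs
y = np.random.randint(0, 2, size=(1000,))   # toy binary labels

model = keras.Sequential([
    keras.Input(shape=(20,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# 10 full passes over the data, with a parameter update every 32 samples.
model.fit(X, y, epochs=10, batch_size=32, verbose=0)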
Hyperparameters
• Number of Layers: In deep learning models,
the number of layers in the neural network
architecture is a hyperparameter.
• Deeper networks can capture complex
features but may suffer from vanishing
gradients or overfitting.
Hyperparameters
• Number of Neurons (Units): The number of
neurons or units in each layer is another
crucial hyperparameter.
• It influences the model's capacity to capture
information and can be determined using
techniques like cross-validation.
Hyperparameters
• Activation Functions: The choice of activation function for each layer, such as ReLU, sigmoid, or tanh, is a hyperparameter.
• Different activation functions may be more
suitable for specific tasks or architectures.
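The sketch below ties the last three hyperparameters together (number of layers, units per layer, and activation functions), assuming TensorFlow/Keras; the sizes shown are arbitrary.

# The depth of the network, the units in each layer, and the per-layer
# activation functions are all fixed when the architecture is defined.
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(20,)),
    keras.layers.Dense(64, activation="relu"),    # hidden layer 1: 64 units, ReLU
    keras.layers.Dense(32, activation="tanh"),    # hidden layer 2: 32 units, tanh
    keras.layers.Dense(1, activation="sigmoid"),  # output layer: 1 unit, sigmoid
])
model.summary()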
Hyperparameters
• Regularization:
– Hyperparameters related to regularization
techniques, such as L1 and L2 regularization
strength or dropout probability, can help control
overfitting.
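A brief sketch, assuming TensorFlow/Keras, in which the L2 penalty strength and the dropout probability are set explicitly; the values 0.01 and 0.5 are illustrative.

# Regularization hyperparameters: L2 strength per layer and dropout probability.
from tensorflow import keras
from tensorflow.keras import layers, regularizers

model = keras.Sequential([
    keras.Input(shape=(20,)),
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(0.01)),  # L2 penalty strength
    layers.Dropout(0.5),                                     # dropout probability
    layers.Dense(1, activation="sigmoid"),
])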
Hyperparameters
• Weight Initialization:
– The scheme used to set the initial values of the model's weights is a hyperparameter.
– Proper initialization can affect the training process
and model convergence.
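For example, in TensorFlow/Keras the initialization scheme can be chosen per layer; this is a sketch, not the only way to do it.

# Weight initialization as a per-layer hyperparameter.
from tensorflow import keras

layer = keras.layers.Dense(
    64,
    activation="relu",
    kernel_initializer="he_normal",  # He initialization, often paired with ReLU
    bias_initializer="zeros",
)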
Hyperparameters
• Optimizer: The choice of optimization
algorithm, like stochastic gradient descent
(SGD), Adam, or RMSprop, is a hyperparameter.
– Each optimizer has its strengths and weaknesses.
• Loss Function: The loss function, used to
measure the model's performance during
training, is a hyperparameter.
– It depends on the task (e.g., mean squared error for
regression, cross-entropy for classification).
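A short sketch, assuming TensorFlow/Keras, of selecting the optimizer and the loss function at compile time; the particular choices shown are illustrative.

# Optimizer and loss function are chosen when the model is compiled.
from tensorflow import keras

model = keras.Sequential([keras.Input(shape=(20,)),
                          keras.layers.Dense(1)])

# Regression setup: Adam optimizer with mean squared error.
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
              loss="mean_squared_error")

# A binary classifier would instead use, e.g., RMSprop with cross-entropy:
# model.compile(optimizer="rmsprop", loss="binary_crossentropy")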
Hyperparameters
• Learning Rate Schedule: Some models benefit
from a learning rate schedule that changes the
learning rate during training.
– Learning rate annealing or decay gradually reduces the learning rate as training progresses.
• Dropout Rate: The dropout rate in dropout
layers helps regularize neural networks.
– It's the probability of dropping out a neuron
during training.
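A sketch of a learning rate schedule, assuming TensorFlow/Keras (dropout was already shown in the regularization sketch above); the decay values are illustrative.

# An exponential-decay schedule lowers the learning rate during training.
from tensorflow import keras

schedule = keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.01,  # starting learning rate
    decay_steps=1000,            # apply the decay every 1000 update steps
    decay_rate=0.9,              # multiply the learning rate by 0.9 each time
)
optimizer = keras.optimizers.SGD(learning_rate=schedule)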
Hyperparameters
• Early Stopping: The number of epochs to wait
without improvement in the validation loss
before stopping training is another
hyperparameter used for preventing
overfitting.
• Batch Normalization: Hyperparameters
related to batch normalization, such as
momentum and epsilon, can affect the
model's training.
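In TensorFlow/Keras these appear as the early-stopping patience and the batch-normalization momentum and epsilon; a minimal sketch follows, with illustrative values.

# patience is the early-stopping hyperparameter; BatchNormalization exposes
# momentum and epsilon.
from tensorflow import keras

early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss",          # watch the validation loss
    patience=5,                  # stop after 5 epochs without improvement
    restore_best_weights=True,
)

bn_layer = keras.layers.BatchNormalization(momentum=0.99, epsilon=1e-3)

# The callback is then passed to fit, e.g.:
# model.fit(X, y, validation_split=0.2, epochs=100, callbacks=[early_stop])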
Hyperparameters
• Weight Decay: This is a regularization
hyperparameter that controls the L2 penalty
applied to the model's weights during
optimization.
• Input Features: For feature engineering, you
may need to decide which features to include
or exclude from your dataset.
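A small sketch, assuming scikit-learn, where the L2 (weight-decay) strength is Ridge's alpha and the number of input features kept is SelectKBest's k; the synthetic data is purely illustrative.

# Weight decay (L2 penalty strength) and input-feature selection as hyperparameters.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))            # 10 candidate input features
y = X[:, 0] * 2.0 + rng.normal(size=100)  # only the first feature matters

# k (features kept) and alpha (L2 / weight-decay strength) are hyperparameters.
model = make_pipeline(SelectKBest(f_regression, k=3), Ridge(alpha=1.0))
model.fit(X, y)
print(model.score(X, y))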
Hyperparameters
• Architecture Hyperparameters: In convolutional
neural networks (CNNs) and recurrent neural
networks (RNNs), architecture-specific
hyperparameters like kernel size, stride, and the
number of LSTM or GRU units need to be set.
• Ensemble Techniques: Hyperparameters for
ensemble methods, like the number of base models
and their types, must be selected.
• Random Seed: Setting a random seed ensures
reproducibility in machine learning experiments.
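A sketch of architecture-specific hyperparameters for a small CNN and an LSTM, with a fixed random seed for reproducibility, assuming TensorFlow/Keras; the shapes and sizes are arbitrary.

# Kernel size, stride, and LSTM units are architecture hyperparameters;
# the seed makes runs reproducible.
from tensorflow import keras

keras.utils.set_random_seed(42)  # fixes Python, NumPy, and framework seeds

cnn = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    keras.layers.Conv2D(filters=32, kernel_size=(3, 3), strides=(1, 1),
                        activation="relu"),          # kernel size and stride
    keras.layers.MaxPooling2D(pool_size=(2, 2)),
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation="softmax"),
])

rnn = keras.Sequential([
    keras.Input(shape=(100, 8)),   # 100 time steps, 8 features per step
    keras.layers.LSTM(units=64),   # number of LSTM units
    keras.layers.Dense(1),
])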
Hyperparameters
• Selecting the right hyperparameters can
significantly impact the model's performance and
training stability.
• Techniques like grid search, random search, and
Bayesian optimization are used to explore
hyperparameter spaces and find the best settings
for a given task.
• It often involves a trade-off between underfitting
and overfitting, making hyperparameter tuning an
iterative process in machine learning.
GridSearchCV
• In the GridSearchCV approach, the machine learning model is evaluated over a range of hyperparameter values.
• The approach is called GridSearchCV because it searches for the best set of hyperparameters from a grid of hyperparameter values.
• For example, suppose we want to set two hyperparameters of a Logistic Regression classifier, C and alpha, each with a different set of candidate values.
• The grid search technique will construct many versions of
the model with all possible combinations of
hyperparameters and will return the best one.
GridSearchCV
• Drawback: GridSearchCV goes through every combination of hyperparameter values in the grid, which makes grid search computationally very expensive.
RandomizedSearchCV
• RandomizedSearchCV addresses this drawback of GridSearchCV, as it evaluates only a fixed number of hyperparameter settings.
• It moves through the grid in a random fashion to find the best set of hyperparameters.
• This approach reduces unnecessary
computation.
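A short sketch of RandomizedSearchCV with a capped number of sampled settings, assuming scikit-learn; the parameter ranges are illustrative.

# RandomizedSearchCV samples a fixed number (n_iter) of hyperparameter
# settings instead of trying every combination.
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

param_distributions = {
    "max_depth": [2, 3, 4, 5, 6, None],
    "min_samples_split": [2, 5, 10, 20],
    "criterion": ["gini", "entropy"],
}

search = RandomizedSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=10,      # only 10 randomly sampled settings are evaluated
    cv=5,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)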
Implementation of Common Hyperparameters
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier
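The slide lists only the imports; one possible continuation consistent with them is sketched below, with an illustrative parameter grid for the decision tree.

# Tune a DecisionTreeClassifier on the iris data with GridSearchCV
# (grid values are illustrative choices).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

param_grid = {
    "max_depth": [2, 3, 4, 5, None],
    "min_samples_split": [2, 5, 10],
    "criterion": ["gini", "entropy"],
}

grid = GridSearchCV(DecisionTreeClassifier(random_state=42),
                    param_grid=param_grid, cv=5)
grid.fit(X_train, y_train)

print("Best hyperparameters:", grid.best_params_)
print("Test accuracy:", grid.score(X_test, y_test))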