
DEPARTMENT OF APPLIED MATHEMATICS, COMPUTER SCIENCE AND STATISTICS

HYPERPARAMETER
OPTIMIZATION
Big Data Science (Master in Statistical Data Analysis)
PARAMETER OPTIMIZATION
̶ So far, we have talked about parameter optimization:
̶ Our model contains trainable parameters
̶ We define a loss function
̶ An optimization algorithm searches for the parameters that
minimize the loss:
‒ Analytic solutions
‒ Newton-Raphson
‒ (Stochastic) gradient descent
‒ ...

2
HYPERPARAMETER OPTIMIZATION
̶ Most models also have hyperparameters:
̶ Fixed before training the model
̶ Encode assumptions about the model
̶ Not taken into account in the gradient of the objective
function

3
EXAMPLES OF HYPERPARAMETERS
Linear models:
• Regularization constant

Random Forest:
• Number of trees
• Maximum depth
• Minimum leaf size
• Criterion for split
• Number of features per split
• ...

SVM:
• Kernel
• Margin
• Kernel parameters: polynomial degree, Gaussian kernel width, ...

Neural networks:
• Architecture: number of layers, size of each layer
• Activation function
• Dropout
• Regularization
• ...

KNN:
• K
• Distance metric
• Parameters of approximate structures

4
CHOOSING HYPERPARAMETERS
̶ Manual search
̶ Grid search
̶ Random search
̶ Automated methods:
̶ Bayesian optimization
̶ Evolutionary optimization

5
MANUAL TUNING
̶ Using assumptions or knowledge to select the hyperparameters

̶ Pros:
̶ Computationally efficient

̶ Cons:
̶ Requires manual labor
̶ Prone to bias
̶ Limited combinations are tested

6
GRID SEARCH
̶ For each hyperparameter, define a subset of values that will be
tested
̶ Iteratively test all combinations

̶ Pros:
̶ The individual effect of parameters can be studied
̶ Cons:
̶ The number of combinations can become very high
̶ Few values are tested for every parameter
̶ The combined effect of parameters is not completely modeled

7
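A minimal sketch of the grid search described above, assuming scikit-learn is available; the SVM, the digits dataset and the listed hyperparameter values are only illustrative:

```python
# Grid search sketch with scikit-learn: every combination of the listed
# values is trained and evaluated with cross-validation on the training data.
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

param_grid = {
    "C": [0.1, 1, 10, 100],          # margin (regularization) constant
    "gamma": [1e-4, 1e-3, 1e-2],     # Gaussian kernel width
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X_train, y_train)         # 4 x 3 = 12 combinations, each with 5-fold CV

print(search.best_params_)           # hyperparameters with the best CV score
print(search.score(X_test, y_test))  # final evaluation on the held-out test set
```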
RANDOM SEARCH
̶ A probability distribution is specified for each hyperparameter
̶ Samples are drawn and tested

̶ Pros:
̶ The combined effect of parameters is somewhat modeled
̶ More values per parameter can be considered
̶ Cons:
̶ The search is not guided
̶ The individual effect of parameters is not clear

8
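A minimal sketch of random search, assuming scikit-learn and SciPy are available; the distributions and the number of sampled configurations are only illustrative:

```python
# Random search sketch: a distribution is specified per hyperparameter,
# and n_iter samples are drawn and tested with cross-validation.
from scipy.stats import loguniform
from sklearn.datasets import load_digits
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

param_distributions = {
    "C": loguniform(1e-2, 1e3),       # margin constant, sampled on a log scale
    "gamma": loguniform(1e-5, 1e-1),  # Gaussian kernel width
}
search = RandomizedSearchCV(SVC(kernel="rbf"), param_distributions,
                            n_iter=30, cv=5, random_state=0)
search.fit(X, y)

print(search.best_params_, search.best_score_)
```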
GRID VS RANDOM

[Figure comparing grid and random search, from J. Bergstra, Y. Bengio, "Random Search for Hyper-Parameter Optimization", Journal of Machine Learning Research 13 (2012) 281-305]

9
HYPERPARAMETER
OPTIMIZATION AS AN
OPTIMIZATION PROBLEM
10
AUTOMATED HYPERPARAMETER OPTIMIZATION
̶ Why not solve hyperparameter optimization in the
same way as parameter optimization?

̶ Main approaches:
̶ Bayesian optimization
̶ Evolutionary algorithms

11
SEQUENTIAL MODEL-BASED BAYESIAN OPTIMIZATION (SMBO)
1. Query the function f at t values and record the resulting pairs
   S = {(θ_i, f(θ_i)) : i = 1, ..., t}

2. For a fixed number of iterations:
   1. Fit a probabilistic model ℳ to the pairs in S
   2. Apply an acquisition function a(θ, ℳ) to select a
      promising input θ to evaluate next
   3. Evaluate f(θ) and add (θ, f(θ)) to S

12
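An illustrative sketch of this loop for a single hyperparameter, assuming scikit-learn and SciPy; the toy objective f, the Gaussian-process surrogate and the expected-improvement acquisition function are example choices, not prescribed by the slides:

```python
# SMBO sketch: fit a Gaussian process to the observed (theta, f(theta)) pairs,
# maximize an expected-improvement acquisition over a dense grid, evaluate
# the chosen point and add it to the observations.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def f(theta):                                  # placeholder black-box objective
    return np.sin(3 * theta) + 0.1 * theta ** 2

rng = np.random.default_rng(0)
thetas = list(rng.uniform(-3, 3, size=3))      # step 1: t initial queries
scores = [f(t) for t in thetas]

grid = np.linspace(-3, 3, 500).reshape(-1, 1)
for _ in range(20):                            # step 2: fixed number of iterations
    gp = GaussianProcessRegressor().fit(np.array(thetas).reshape(-1, 1), scores)
    mu, sigma = gp.predict(grid, return_std=True)
    best = min(scores)
    # Expected improvement (for minimization)
    z = (best - mu) / np.maximum(sigma, 1e-12)
    ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
    theta_next = float(grid[np.argmax(ei), 0]) # most promising input
    thetas.append(theta_next)
    scores.append(f(theta_next))               # evaluate and add to S

print(min(scores), thetas[int(np.argmin(scores))])
```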
GENETIC ALGORITHMS
̶ Applying the principles of natural selection to optimization
̶ Solutions are encoded as "chromosomes"
̶ A crossover operator combines two chromosomes into new ones
̶ A mutation operator introduces random mutations

1. Generate an initial population of solutions


2. For a number of generations:
1. Crossover solutions to increase population size
2. Apply mutation operator
3. Evaluate new solutions
4. Discard some "bad" solutions to maintain a "good" population

13
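A toy sketch of this loop for two hyperparameters; the chromosome encoding, the simulated fitness function and the population sizes are illustrative placeholders, not part of the slides:

```python
# Genetic-algorithm sketch: a "chromosome" encodes two hyperparameters
# (log10 of a regularization constant, number of trees). In practice the
# fitness would be a cross-validated score; here it is simulated.
import random

random.seed(0)

def fitness(chrom):                        # placeholder for a cross-validated score
    log_c, n_trees = chrom
    return -(log_c - 1.0) ** 2 - ((n_trees - 200) / 100) ** 2

def crossover(a, b):                       # combine two chromosomes into a new one
    return tuple(random.choice(pair) for pair in zip(a, b))

def mutate(chrom):                         # random perturbation of each gene
    log_c, n_trees = chrom
    return (log_c + random.gauss(0, 0.3),
            max(10, int(n_trees + random.gauss(0, 30))))

# 1. Generate an initial population of solutions
population = [(random.uniform(-3, 3), random.randint(10, 500)) for _ in range(10)]

# 2. Evolve for a number of generations
for _ in range(20):
    offspring = [mutate(crossover(*random.sample(population, 2))) for _ in range(10)]
    population += offspring                          # crossover + mutation
    population.sort(key=fitness, reverse=True)       # evaluate new solutions
    population = population[:10]                     # discard the worst solutions

print(population[0], fitness(population[0]))
```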
PARTITIONING
14
PARTITIONING FOR HYPERPARAMETER OPTIMIZATION
̶ Remember: NEVER TRAIN ON THE TEST SET

̶ This also holds when optimizing hyperparameters

15
TEST SET + CROSS VALIDATION
[Figure: the data is split into a training set and a held-out test set; the training set is further divided into cross-validation folds, with each fold serving once as the validation set.]

16
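A sketch of this partitioning with scikit-learn; the random forest, the dataset and the candidate depths are only illustrative. The test set is split off first and touched exactly once at the end:

```python
# Hold out a test set, tune with cross-validation on the training set only,
# then evaluate the selected model a single time on the test set.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

best_score, best_depth = -1.0, None
for depth in [2, 5, 10, None]:                       # candidate hyperparameter values
    model = RandomForestClassifier(max_depth=depth, random_state=0)
    score = cross_val_score(model, X_train, y_train, cv=5).mean()  # validation folds
    if score > best_score:
        best_score, best_depth = score, depth

final = RandomForestClassifier(max_depth=best_depth, random_state=0)
final.fit(X_train, y_train)                          # retrain on the full training set
print(best_depth, final.score(X_test, y_test))       # test set used exactly once
```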


NESTED CROSS VALIDATION

[Figure: the outer folds rotate the held-out test set; within each outer training set, inner folds rotate the validation set used for hyperparameter selection.]

17
NESTED CROSS VALIDATION: EXAMPLE
̶ 5 folds
̶ 3 classifiers: Logistic Regression, Random Forest, SVM

̶ We want to know which classifier is better suited to our problem


̶ We also want to optimize the hyperparameters of each classifier
̶ 3 inner folds for hyperparameter optimization
̶ The ultimate goal is to have a system in production making real
predictions

18
NESTED CROSS VALIDATION: EXAMPLE
1. For each outer fold i in [1...5]:
   1. Validation set: fold i
   2. Training set: folds {1,2,3,4,5}\{i}
   3. Split the training set into 3 inner folds
   4. For each classifier C in {LR, RF, SVM}:
      1. For each combination of hyperparameters θ_C of C:
         1. For each inner fold j in [1...3]:
            1. (Inner) validation set: fold j
            2. (Inner) training set: folds {1,2,3}\{j}
            3. Train classifier C(θ_C) on the inner training set
            4. Evaluate C(θ_C) on the inner validation set
         2. Calculate the average performance of C(θ_C) across the 3 inner folds
      2. Select the best performing hyperparameters θ_C^{*(i)} for classifier C
      3. Evaluate C(θ_C^{*(i)}) on the (outer) validation set
2. Calculate the average performance of each C(θ_C^{*(i)}) across all outer validation folds
3. Select the best classifier C*
4. Select θ_{C*}^* as the optimal hyperparameters for C*
5. Train C*(θ_{C*}^*) on the entire dataset

̶ Note that the best parameters θ_C^{*(i)} for each classifier depend on the outer fold that was used for training

19
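A compact sketch of this procedure with scikit-learn, where GridSearchCV plays the role of the inner loop and cross_val_score the outer loop; the three classifiers match the example, but the hyperparameter grids are only illustrative:

```python
# Nested cross-validation: the inner 3-fold grid search tunes each classifier,
# the outer 5-fold loop estimates the performance of the whole tuning procedure.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

candidates = {
    "LR": (LogisticRegression(max_iter=5000), {"C": [0.01, 0.1, 1, 10]}),
    "RF": (RandomForestClassifier(random_state=0), {"n_estimators": [100, 300]}),
    "SVM": (SVC(), {"C": [0.1, 1, 10], "gamma": ["scale", 0.01]}),
}

outer_scores = {}
for name, (model, grid) in candidates.items():
    inner = GridSearchCV(model, grid, cv=3)                  # inner folds: tuning
    outer_scores[name] = cross_val_score(inner, X, y, cv=5)  # outer folds: evaluation
    print(name, outer_scores[name].mean())

# Select the best classifier, re-tune it, and train it on the entire dataset
best_name = max(outer_scores, key=lambda n: outer_scores[n].mean())
model, grid = candidates[best_name]
final = GridSearchCV(model, grid, cv=3).fit(X, y)
print(best_name, final.best_params_)
```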
