Practical Aspects of Deep Learning PI

The document discusses several key aspects of setting up deep learning models for success: 1) It is important to properly split data into training, validation, and test sets to detect overfitting and get an accurate evaluation of model performance. 2) Tuning hyperparameters such as the number of layers, units per layer, learning rate, and activation functions is an iterative process that requires evaluating models on a validation set. 3) Regularization techniques such as L1 and L2 regularization can be applied to reduce overfitting by adding a penalty term to the loss function.

Uploaded by Pedro Casariego Córdoba


Practical aspects of Deep Learning Part I

Arles Rodríguez
[email protected]

Facultad de Ciencias
Departamento de Matemáticas
Universidad Nacional de Colombia
Motivation
• It is impossible to find the best hyperparameters on the first try for a specific application:
– #layers
– #hidden units per layer
– learning rates
– activation functions
• Deep learning is an iterative process.
[Figure: iterative cycle 1. idea → 2. code → 3. experiment]
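Each pass through the idea → code → experiment cycle amounts to trying a hyperparameter setting and checking it on held-out data. The following is a minimal sketch of that loop as an exhaustive grid search. It assumes a hypothetical `toy_dev_error` function in place of actually training a network. That function's formula is made up so the example can run on its own.

```python
import itertools


# Hypothetical stand-in for "train the model, then measure its dev-set error".
# A real version would build and train a network; this toy formula is made up.
def toy_dev_error(n_layers: int, hidden_units: int, learning_rate: float) -> float:
    return (abs(n_layers - 3) * 0.05
            + abs(hidden_units - 64) / 640
            + abs(learning_rate - 0.01) * 2)


def search(grid: dict) -> tuple:
    """Try every combination in `grid` (one idea -> code -> experiment pass
    each) and keep the configuration with the lowest dev-set error."""
    best_config, best_error = None, float("inf")
    names = list(grid)
    for values in itertools.product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        error = toy_dev_error(**config)
        if error < best_error:
            best_config, best_error = config, error
    return best_config, best_error


grid = {
    "n_layers": [1, 2, 3, 4],
    "hidden_units": [16, 64, 256],
    "learning_rate": [0.1, 0.01, 0.001],
}
best, err = search(grid)
print(best, err)
```

Activation functions could be searched the same way. This would require another key in `grid` and a matching parameter in the evaluation function.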
Motivation
• Neural networks (NNs) have been successful in NLP, computer vision, speech recognition, and structured data.
• Intuitions from one application area often do not transfer to other application areas.
• Success also depends on the amount of data and on the hardware and software configuration.
[Figure: iterative cycle 1. idea → 2. code → 3. experiment]
Motivation
• Idea: how can we go around the iterative cycle efficiently?
– Setting up data well.
– Hyperparameter tuning.
– Optimizing execution and implementation aspects of the model.
[Figure: iterative cycle 1. idea → 2. code → 3. experiment]
Setting up data well
• In early machine learning models (pre-deep learning era), data was split as:
Data: training 70% | test 30%
• This is generally accepted as a good rule of thumb for 100, 1,000, or 10,000 samples.
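The following is a minimal sketch of the 70%/30% split using only the standard library. It assumes the samples can be shuffled freely, meaning there is no time ordering that must be preserved. The data is shuffled first so that any ordering in the original data (for example, all cats first) does not end up entirely in one set. The function name `train_test_split` is only illustrative.

```python
import random


def train_test_split(samples, test_fraction=0.3, seed=0):
    """Shuffle a copy of `samples` and split it into (train, test).

    The 70%/30% default mirrors the pre-deep-learning rule of thumb;
    `seed` is fixed only so the split is reproducible.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


train, test = train_test_split(range(1000))
print(len(train), len(test))
```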
Setting up data well
• A good idea is to split the data into three sets (for fewer than 10,000 samples):
Data: training 60% | dev / hold-out / cross-validation (cv) set 20% | test 20%
– Training set: used to train the model.
– Dev set: used to tune parameters, select features, and make decisions.
– Test set: used to evaluate the best model and get an unbiased estimate of how the model is doing (generalization).
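The following is a minimal sketch of the three-way 60% / 20% / 20% split. It assumes a single shuffle followed by consecutive cuts, so the three sets are disjoint and together cover all samples. The function name and fixed seed are illustrative choices.

```python
import random


def train_dev_test_split(samples, dev_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle once, then cut into train / dev (hold-out, cv) / test.

    Defaults follow the 60% / 20% / 20% split suggested for data sets
    with fewer than 10,000 samples.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_dev = round(len(shuffled) * dev_fraction)
    n_test = round(len(shuffled) * test_fraction)
    dev = shuffled[:n_dev]
    test = shuffled[n_dev:n_dev + n_test]
    train = shuffled[n_dev + n_test:]
    return train, dev, test


train, dev, test = train_dev_test_split(range(5000))
print(len(train), len(dev), len(test))
```

In use, models are compared and tuned on `dev`, and `test` is evaluated only for the final chosen model. Otherwise the test estimate stops being unbiased.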
Setting up data well
• In the big data era (data sets with very large numbers of samples), the dev and test sets have become a much smaller percentage of the total.
• The dev set only needs to be large enough to estimate which of two different algorithm choices is better.
Data (100000 samples): training 98% | dev/cv set 1% | test 1%
With even more data:
training 99.5% | dev 0.25% | test 0.25%
training 99.5% | dev 0.4% | test 0.1%
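Percentages this small are easier to judge as sample counts. The following is a minimal sketch that converts a (train, dev, test) percentage split into counts. It assumes the dev and test counts are rounded and the training set takes the remainder, so the counts always add up to the total. `split_sizes` is a hypothetical helper name.

```python
import math


def split_sizes(n_samples, percentages):
    """Turn (train %, dev %, test %) into sample counts.

    Dev and test counts are rounded; train takes whatever is left, so the
    three counts always add up to n_samples.
    """
    train_pct, dev_pct, test_pct = percentages
    if not math.isclose(train_pct + dev_pct + test_pct, 100.0):
        raise ValueError(f"percentages must sum to 100, got {percentages}")
    n_dev = round(n_samples * dev_pct / 100)
    n_test = round(n_samples * test_pct / 100)
    return n_samples - n_dev - n_test, n_dev, n_test


print(split_sizes(100_000, (98, 1, 1)))
print(split_sizes(1_000_000, (99.5, 0.25, 0.25)))
print(split_sizes(1_000_000, (99.5, 0.4, 0.1)))
```

With a million samples, even 0.25% leaves 2,500 dev examples. That is often enough to tell two algorithm choices apart. Percentages that do not add up to 100 raise `ValueError`.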
train/test set distribution
• The key difference between training and test datasets is that test sets are unseen.
• This is because the training procedure has not
used the test examples.
• Training and dev/test sets must come from the
same distribution.
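A test set is only "unseen" if none of its examples also appear in training. The following is a minimal sketch of an exact-duplicate check. It assumes each example can be represented by a hashable key, such as a file name or ID. The file names below are hypothetical. Near-duplicates, such as a resized copy of the same photo, would not be caught by this check.

```python
def leaked_examples(train, test):
    """Return the test examples that also appear in the training set.

    A non-empty result means the test set is not truly unseen.
    Examples must be hashable (e.g. file paths, IDs, tuples of features).
    """
    return set(train) & set(test)


# Hypothetical file names.
train = ["cat_001.jpg", "cat_002.jpg", "cat_003.jpg"]
test = ["cat_003.jpg", "cat_004.jpg"]
print(leaked_examples(train, test))
```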
Is this a good data setting?
• The cat app's data comes from 4 regions, based on its largest markets: US, China, India, and Latin America.
• Is it right to randomly assign two of these
segments to the training/dev sets and the other
two to the test set?
Is this a good data setting?
• Is it right to randomly assign two of these
segments to the training/dev sets and the other
two to the test set?
• Answer: no. The dev set should reflect the task you want to improve the most: doing well in all four regions, not only in two.
• The app will probably work well on the training set but not on the test set, because the test regions come from a different distribution than the training/dev regions.
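One way to make the dev and test sets reflect all four regions is to split each region separately. The following is a minimal sketch of such a stratified split, assuming a 60/20/20 split per region and hypothetical `(region, image_id)` examples. It is one reasonable reading of the advice above, not the only one.

```python
import random
from collections import defaultdict


def stratified_split(examples, region_of, dev_fraction=0.2, test_fraction=0.2, seed=0):
    """Split each region separately so train, dev and test all contain
    every region in the same proportions."""
    by_region = defaultdict(list)
    for example in examples:
        by_region[region_of(example)].append(example)

    rng = random.Random(seed)
    train, dev, test = [], [], []
    for region in sorted(by_region):  # sorted for a reproducible order
        group = by_region[region]
        rng.shuffle(group)
        n_dev = round(len(group) * dev_fraction)
        n_test = round(len(group) * test_fraction)
        dev.extend(group[:n_dev])
        test.extend(group[n_dev:n_dev + n_test])
        train.extend(group[n_dev + n_test:])
    return train, dev, test


regions = ["US", "China", "India", "Latin America"]
# Hypothetical data: 100 (region, image_id) pairs per region.
examples = [(region, i) for region in regions for i in range(100)]
train, dev, test = stratified_split(examples, region_of=lambda ex: ex[0])
print(sorted({region for region, _ in test}))
```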
Another example
• We trained a model to detect cats in pictures.
• The data was taken mainly from the internet and split 70%/30% into training and test sets, and the algorithm worked well.
• Users start uploading their own cat pictures and performance is poor. What happened?
• Data must come from the same
distribution!
Bias/variance

[Figure: three decision boundaries in the (x1, x2) plane. High bias: this model underfits the data. Just right: medium level of complexity. High variance: this model overfits the data.]

With two features (2D), it is possible to plot the data and visualize bias and variance.
With high-dimensional data, it is not possible to plot the data and visualize the decision boundary.
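Because high-dimensional decision boundaries cannot be plotted, one way to observe the same three regimes is to compare training error with dev-set error. The following is a minimal sketch that uses a k-nearest-neighbours classifier as a stand-in model (it is not a model from the slides) on hypothetical 2D data produced by `make_data`:
- With k = 1, every training point is its own nearest neighbour, so training error is 0 (high variance).
- With k equal to the training-set size, the model always predicts the majority class (high bias).
- An intermediate k such as 15 is meant to illustrate the "just right" case, though the exact errors depend on the random data.

```python
import math
import random


def make_data(n, seed):
    """Hypothetical 2D data: label 1 inside a circle of radius sqrt(0.5),
    with each label flipped with probability 0.1 to act as noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        point = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        label = int(point[0] ** 2 + point[1] ** 2 < 0.5)
        if rng.random() < 0.1:
            label = 1 - label
        data.append((point, label))
    return data


def knn_predict(train, point, k):
    """Majority vote among the k training points closest to `point` (ties -> 0)."""
    nearest = sorted(train, key=lambda example: math.dist(example[0], point))[:k]
    votes = sum(label for _, label in nearest)
    return int(2 * votes > k)


def error_rate(train, data, k):
    """Fraction of `data` misclassified by k-NN fitted on `train`."""
    wrong = sum(knn_predict(train, point, k) != label for point, label in data)
    return wrong / len(data)


train = make_data(200, seed=0)
dev = make_data(200, seed=1)
# k=1: memorizes the training set (high variance).
# k=len(train): always predicts the majority class (high bias).
for k in (1, 15, len(train)):
    print(f"k={k:3d}  train error={error_rate(train, train, k):.2f}  "
          f"dev error={error_rate(train, dev, k):.2f}")
```

A low training error paired with a much higher dev error points to high variance. A high error on both sets points to high bias.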
Example: cat classification