L10 Regularization Slides

Lecture 10 of STAT 453 focuses on regularization methods to reduce overfitting in deep learning models. Key techniques discussed include early stopping, L1/L2 regularization, and dropout, along with the importance of improving generalization performance through data collection and augmentation. The lecture also emphasizes the significance of adjusting model capacity and using norm penalties to enhance model robustness.

STAT 453: Introduction to Deep Learning and Generative Models

Sebastian Raschka
http://stat.wisc.edu/~sraschka/teaching

Lecture 10

Regularization Methods for Neural Networks
with Applications in Python
Goal: Reduce Overfitting

Usually achieved by reducing model capacity and/or reducing the variance of the predictions (as explained in the last lecture).

Regularization

In the context of deep learning, regularization can be understood as the process of adding information / changing the objective function to prevent overfitting.

Regularization / Regularizing Effects

Goal: reduce overfitting

Usually achieved by reducing model capacity and/or reducing the variance of the predictions (as explained in the last lecture).

Common Regularization Techniques for DNNs:

• Early stopping
• L1/L2 regularization (norm penalties)
• Dropout

Lecture Overview

1. Improving generalization performance

2. Avoiding overfitting with (1) more data and (2) data augmentation

3. Reducing network capacity & early stopping

4. Adding norm penalties to the loss: L1 & L2 regularization

5. Dropout

An Overview of Techniques for ...

1. Improving generalization performance
2. Avoiding overfitting with (1) more data and (2) data augmentation
3. Reducing network capacity & early stopping
4. Adding norm penalties to the loss: L1 & L2 regularization
5. Dropout

Improving generalization (overview of the mind map on this slide):

• Dataset: collecting more data, data augmentation, label smoothing; leveraging unlabeled data (semi-supervised, self-supervised); leveraging related data (meta-learning, transfer learning)
• Architecture setup: weight initialization strategies, activation functions, residual layers, knowledge distillation
• Normalization: input standardization, BatchNorm and variants, weight standardization, gradient centralization
• Training loop: adaptive learning rates, auxiliary losses, gradient clipping
• Regularization: L2 (/L1) regularization, early stopping, dropout


First step to improve performance:
Focusing on the dataset itself

1. Improving generalization performance
2. Avoiding overfitting with (1) more data and (2) data augmentation
3. Reducing network capacity & early stopping
4. Adding norm penalties to the loss: L1 & L2 regularization
5. Dropout

Often, the Best Way to Reduce Overfitting is Collecting More Data

(Figures from the cited material: Figure 3, an illustration of bias and variance; Figure 4, learning curves of softmax classifiers fit to MNIST subsets of increasing size, with the test set size kept constant. When the training set is small, the algorithm is more likely to pick up noise in the training set.)

Data Augmentation in PyTorch via TorchVision

(Figure: original images vs. randomly augmented images.)

https://github.com/rasbt/stat453-deep-learning-ss21/blob/master/L10/code/data-augmentation.ipynb




Use (0.5, 0.5, 0.5) for RGB images (i.e., one normalization value per color channel in transforms.Normalize).

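A minimal sketch of a TorchVision augmentation pipeline of the kind referenced above, assuming MNIST; the specific transforms and parameter values are illustrative and not the linked notebook's exact code:

# Sketch of a TorchVision augmentation pipeline; transform choices and
# parameters are illustrative.
import torch
from torchvision import datasets, transforms

train_transform = transforms.Compose([
    transforms.RandomCrop(28, padding=4),     # random shifts
    transforms.RandomRotation(15),            # random rotations
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,)),     # one value for grayscale;
                                              # use (0.5, 0.5, 0.5) for RGB
])

train_dataset = datasets.MNIST(root='data', train=True, download=True,
                               transform=train_transform)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=128,
                                           shuffle=True)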

Other Ways for Dealing with Overfitting
if Collecting More Data is not Feasible

=> Reducing the Network's Capacity by Other Means

1. Improving generalization performance
2. Avoiding overfitting with (1) more data and (2) data augmentation
3. Reducing network capacity & early stopping
4. Adding norm penalties to the loss: L1 & L2 regularization
5. Dropout

Early Stopping

Step 1: Split your dataset into 3 parts (always recommended)

• use the test set only once at the end (for an unbiased estimate of the generalization performance)
• use the validation accuracy for tuning (always recommended)

Dataset => Training dataset | Validation dataset | Test dataset


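A minimal sketch of such a 3-way split in PyTorch, assuming MNIST and illustrative split fractions (not the lecture's code):

# Sketch of a train/validation/test split; dataset choice and split
# fractions are illustrative.
import torch
from torch.utils.data import random_split, DataLoader
from torchvision import datasets, transforms

dataset = datasets.MNIST(root='data', train=True, download=True,
                         transform=transforms.ToTensor())

n_total = len(dataset)
n_train = int(0.8 * n_total)
n_valid = n_total - n_train            # held-out validation set

train_set, valid_set = random_split(
    dataset, [n_train, n_valid],
    generator=torch.Generator().manual_seed(123))   # reproducible split

# MNIST already ships with a separate test set; use it only once at the end
test_set = datasets.MNIST(root='data', train=False, download=True,
                          transform=transforms.ToTensor())

train_loader = DataLoader(train_set, batch_size=128, shuffle=True)
valid_loader = DataLoader(valid_set, batch_size=128, shuffle=False)
test_loader = DataLoader(test_set, batch_size=128, shuffle=False)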
Early Stopping

Step 2: Early stopping (not very common anymore)

• reduce overfitting by observing the training/validation accuracy gap during training and then stopping at the "right" point

(Plot: training-set and validation-set accuracy vs. epochs; the good early stopping point is where the validation accuracy peaks, before the gap to the training accuracy widens.)
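A minimal sketch of patience-based early stopping on the validation accuracy; model, optimizer, the data loaders, and compute_accuracy are assumed helpers, and the patience value is an illustrative choice rather than the lecture's:

# Sketch of patience-based early stopping; model, optimizer, train_loader,
# valid_loader, and compute_accuracy() are assumed to exist already.
import torch
import torch.nn.functional as F

NUM_EPOCHS = 50
best_valid_acc, patience, epochs_without_improvement = 0.0, 5, 0

for epoch in range(NUM_EPOCHS):
    model.train()
    for features, targets in train_loader:
        logits = model(features)
        cost = F.cross_entropy(logits, targets)
        optimizer.zero_grad()
        cost.backward()
        optimizer.step()

    model.eval()
    with torch.no_grad():
        valid_acc = compute_accuracy(model, valid_loader)

    if valid_acc > best_valid_acc:
        best_valid_acc = valid_acc
        epochs_without_improvement = 0
        torch.save(model.state_dict(), 'best_model.pt')   # keep best weights
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            print(f'Stopping early after epoch {epoch+1}')
            break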
Other Ways for Dealing with Overfitting
if Collecting More Data is not Feasible

Adding a Penalty Against Complexity

1. Improving generalization performance
2. Avoiding overfitting with (1) more data and (2) data augmentation
3. Reducing network capacity & early stopping
4. Adding norm penalties to the loss: L1 & L2 regularization
5. Dropout

L1/L2 Regularization

As I am sure you already know these from various statistics classes, we will keep it short:

• L1 regularization => LASSO regression

• L2 regularization => Ridge regression (Tikhonov regularization)

Basically, a "weight shrinkage" or a "penalty against complexity".


L2 Regularization for Linear Models
(e.g., Logistic Regression)

\text{Cost}_{w,b} = \frac{1}{n} \sum_{i=1}^{n} L\left(y^{[i]}, \hat{y}^{[i]}\right)

\text{L2-Regularized-Cost}_{w,b} = \frac{1}{n} \sum_{i=1}^{n} L\left(y^{[i]}, \hat{y}^{[i]}\right) + \frac{\lambda}{n} \sum_{j} w_j^2

where \sum_j w_j^2 = \lVert w \rVert_2^2 and \lambda is a hyperparameter.


Geometric Interpretation of L2 Regularization

(Figure, from Sebastian Raschka, Vahid Mirjalili, Python Machine Learning, 3rd Edition: in the weight space spanned by w_i and w_j, the 1st component minimizes the cost function and the 2nd component minimizes the penalty term; the L2-regularized solution is a compromise between penalty and cost.)


Effect of Norm Penalties on the Decision Boundary

Assume a nonlinear model.

L2 Regularization for Multilayer Neural Networks

\text{L2-Regularized-Cost}_{w,b} = \frac{1}{n} \sum_{i=1}^{n} L\left(y^{[i]}, \hat{y}^{[i]}\right) + \frac{\lambda}{n} \sum_{l=1}^{L} \lVert w^{(l)} \rVert_F^2

where the second sum runs over the L layers, and \lVert w^{(l)} \rVert_F^2 is the squared Frobenius norm:

\lVert w^{(l)} \rVert_F^2 = \sum_i \sum_j \left( w_{i,j}^{(l)} \right)^2


L2 Regularization for Neural Nets

Regular gradient descent update:

w_{i,j} := w_{i,j} - \eta \, \frac{\partial L}{\partial w_{i,j}}

Gradient descent update with L2 regularization:

w_{i,j} := w_{i,j} - \eta \left( \frac{\partial L}{\partial w_{i,j}} + \frac{2\lambda}{n} w_{i,j} \right)

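A minimal sketch of this update rule written out by hand for a single weight tensor; the tensor names, learning rate, and λ value are illustrative placeholders:

# Sketch of the L2-regularized gradient descent update for one weight tensor;
# W, grad, lr, lmbda, and n are illustrative placeholders.
import torch

n, lr, lmbda = 1000, 0.1, 0.01
W = torch.randn(5, 3)
grad = torch.randn(5, 3)    # stands in for dL/dW from backpropagation

with torch.no_grad():
    # w := w - eta * (dL/dw + (2*lambda/n) * w)
    W -= lr * (grad + (2 * lmbda / n) * W)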

L2 Regularization for Neural Nets in PyTorch

# regularize loss
L2 = 0.
for name, p in model.named_parameters():
    if 'weight' in name:
        L2 = L2 + (p**2).sum()

cost = cost + 2./targets.size(0) * LAMBDA * L2

optimizer.zero_grad()
cost.backward()


L2 Regularization for Logistic Regression in PyTorch

Automatically, via the optimizer's weight_decay parameter:

#########################################################
## Apply L2 regularization
optimizer = torch.optim.SGD(model.parameters(),
                            lr=0.1,
                            weight_decay=LAMBDA)
#-------------------------------------------------------

for epoch in range(num_epochs):

    #### Compute outputs ####
    out = model(X_train_tensor)

    #### Compute gradients ####
    cost = F.binary_cross_entropy(out, y_train_tensor)
    optimizer.zero_grad()
    cost.backward()

    #### Update weights ####
    optimizer.step()

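Side note (not on the slides, added for context): for plain SGD, setting weight_decay=λ adds λ·w to each parameter's gradient before the update, which corresponds to an L2 penalty on the weights; because of differing constant factors, the λ passed here is not numerically interchangeable with the LAMBDA used in the manual loop on the previous slide.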

Dropout

1. Improving generalization performance
2. Avoiding overfitting with (1) more data and (2) data augmentation
3. Reducing network capacity & early stopping
4. Adding norm penalties to the loss: L1 & L2 regularization
5. Dropout
   5.1 The Main Concept Behind Dropout
   5.2 Dropout: Co-Adaptation Interpretation
   5.3 Dropout: Ensemble Method Interpretation
   5.4 Dropout in PyTorch

Dropout

Original research articles:


Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2012).
Improving neural networks by preventing co-adaptation of feature detectors. arXiv
preprint arXiv:1207.0580.

Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014).
Dropout: a simple way to prevent neural networks from overfitting. The Journal of
Machine Learning Research, 15(1), 1929-1958.



Dropout in a Nutshell: Dropping Nodes

(Diagram: a multilayer network with inputs x_1, x_2, hidden units a_1^{(1)}, a_2^{(1)}, a_3^{(1)} and a_1^{(2)}, a_2^{(2)}, and output o; during training, randomly selected hidden units are dropped.)

Originally, drop probability 0.5 (but 0.2-0.8 also common now)


Dropout in a Nutshell: Dropping Nodes

How do we drop the nodes practically/efficiently?

Bernoulli Sampling (during training):

• p := drop probability
• v := random sample from uniform distribution in range [0, 1]
• \forall i \in v: v_i := 0 if v_i < p, else 1
• a := a \odot v (p × 100% of the activations a will be zeroed)

Dropout in a Nutshell: Dropping Nodes

How do we drop the nodes practically/efficiently?

Bernoulli Sampling (during training):

• p := drop probability
• v := random sample from uniform distribution in range [0, 1]
• \forall i \in v: v_i := 0 if v_i < p, else 1
• a := a \odot v (p × 100% of the activations a will be zeroed)

Then, after training, when making predictions (during "inference"), scale the activations via a := a \cdot (1 - p)

Q for you: Why is this required?
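A minimal sketch of this "classic" dropout scheme (Bernoulli mask during training, scale by 1 − p at inference); the function name and example tensor are illustrative, not from the slides:

# Sketch of classic (non-inverted) dropout as described above.
import torch

def classic_dropout(a, p=0.5, training=True):
    if training:
        v = (torch.rand_like(a) >= p).float()   # v_i = 0 with probability p
        return a * v                             # ~p*100% of activations zeroed
    else:
        return a * (1.0 - p)                     # match expected activation scale

a = torch.ones(6)
print(classic_dropout(a, p=0.5, training=True))    # some entries zeroed
print(classic_dropout(a, p=0.5, training=False))   # all entries scaled to 0.5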
Dropout

1. Improving generalization performance
2. Avoiding overfitting with (1) more data and (2) data augmentation
3. Reducing network capacity & early stopping
4. Adding norm penalties to the loss: L1 & L2 regularization
5. Dropout
   5.1 The Main Concept Behind Dropout
   5.2 Dropout: Co-Adaptation Interpretation
   5.3 Dropout: Ensemble Method Interpretation
   5.4 Dropout in PyTorch

Dropout: Co-Adaptation Interpretation

Why does Dropout work well?

• The network will learn not to rely on particular connections too heavily

• Thus, it will consider more connections (because it cannot rely on individual ones)

• The weight values will be more spread out (which may lead to smaller weights, as with the L2 norm)

• Side note: You can certainly use different dropout probabilities in different layers (assigning them proportionally to the number of units in a layer is not a bad idea, for example)

Dropout

1. Improving generalization performance
2. Avoiding overfitting with (1) more data and (2) data augmentation
3. Reducing network capacity & early stopping
4. Adding norm penalties to the loss: L1 & L2 regularization
5. Dropout
   5.1 The Main Concept Behind Dropout
   5.2 Dropout: Co-Adaptation Interpretation
   5.3 Dropout: Ensemble Method Interpretation
   5.4 Dropout in PyTorch

Dropout: Ensemble Method Interpretation

• In dropout, we have a "different model" for each minibatch

• Via the minibatch iterations, we essentially sample over M = 2^h models, where h is the number of hidden units

• The restriction is that we have weight sharing over these models, which can be seen as a form of regularization

• During "inference" we can then average over all these models (but this is very expensive)

Dropout: Ensemble Method Interpretation

• During "inference" we can then average over all these models (but this is very expensive)

This is basically just averaging log likelihoods (this is for one particular class):

p_{\text{Ensemble}} = \left[ \prod_{j=1}^{M} p^{\{j\}} \right]^{1/M} = \exp\left[ \frac{1}{M} \sum_{j=1}^{M} \log\left( p^{\{j\}} \right) \right]

(you may know this as the "geometric mean" from other classes)

For multiple classes, we need to normalize so that the probabilities sum to 1:

p_{\text{Ensemble},j} = \frac{p_{\text{Ensemble},j}}{\sum_{j=1}^{k} p_{\text{Ensemble},j}}

Dropout: Ensemble Method Interpretation

• During "inference" we can then average over all these models (but this is very expensive)

• However, using the last model after training and scaling the predictions by a factor of 1 − p approximates the geometric mean and is much cheaper (actually, it's exactly the geometric mean if we have a linear model)


Dropout

1. Improving generalization performance
2. Avoiding overfitting with (1) more data and (2) data augmentation
3. Reducing network capacity & early stopping
4. Adding norm penalties to the loss: L1 & L2 regularization
5. Dropout
   5.1 The Main Concept Behind Dropout
   5.2 Dropout: Co-Adaptation Interpretation
   5.3 Dropout: Ensemble Method Interpretation
   5.4 Dropout in PyTorch

Inverted Dropout

• Most frameworks implement inverted dropout

• Here, the activation values are scaled by the factor 1/(1 − p) during training, instead of scaling the activations during "inference"

• I believe Google started this trend (because it's computationally cheaper in the long run if you use your model a lot after training)

• PyTorch's Dropout implementation is also inverted dropout

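A minimal sketch of inverted dropout, mirroring the classic-dropout sketch earlier; again the function name and example tensor are illustrative:

# Sketch of inverted dropout: scale by 1/(1-p) during training so that no
# scaling is needed at inference time.
import torch

def inverted_dropout(a, p=0.5, training=True):
    if training:
        v = (torch.rand_like(a) >= p).float()   # Bernoulli mask
        return a * v / (1.0 - p)                # rescale surviving activations
    else:
        return a                                # identity at inference

a = torch.ones(6)
print(inverted_dropout(a, p=0.5, training=True))   # zeros and 2.0 entries
print(inverted_dropout(a, p=0.5, training=False))  # unchanged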

Dropout in PyTorch


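The code screenshot on this slide is not reproduced here; the following is a minimal sketch of how torch.nn.Dropout is typically placed in a multilayer perceptron (layer sizes and drop probability are illustrative assumptions, not necessarily the lecture's):

# Sketch of a multilayer perceptron with dropout layers.
import torch

class MLPWithDropout(torch.nn.Module):
    def __init__(self, num_features=784, num_classes=10, drop_proba=0.5):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(num_features, 128),
            torch.nn.ReLU(),
            torch.nn.Dropout(drop_proba),   # active only in model.train() mode
            torch.nn.Linear(128, 64),
            torch.nn.ReLU(),
            torch.nn.Dropout(drop_proba),
            torch.nn.Linear(64, num_classes)
        )

    def forward(self, x):
        return self.net(x)   # returns logits

model = MLPWithDropout()

Because nn.Dropout behaves differently in training and evaluation mode, the model.train() / model.eval() calls emphasized on the next slide are essential.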

Dropout in PyTorch

Here, it is very important that you use model.train() and model.eval()!

for epoch in range(NUM_EPOCHS):
    model.train()
    for batch_idx, (features, targets) in enumerate(train_loader):

        features = features.view(-1, 28*28).to(DEVICE)
        targets = targets.to(DEVICE)

        ### FORWARD AND BACK PROP
        logits = model(features)
        cost = F.cross_entropy(logits, targets)
        optimizer.zero_grad()
        cost.backward()
        minibatch_cost.append(cost)

        ### UPDATE MODEL PARAMETERS
        optimizer.step()

    model.eval()
    with torch.no_grad():
        cost = compute_loss(model, train_loader)
        epoch_cost.append(cost)
        print('Epoch: %03d/%03d Train Cost: %.4f' % (
              epoch+1, NUM_EPOCHS, cost))
        print('Time elapsed: %.2f min' % ((time.time() - start_time)/60))


(Plots: training loss curves without dropout and with 50% dropout.)

https://github.com/rasbt/stat453-deep-learning-ss21/blob/master/L10/code/dropout.ipynb


Dropout: More Practical Tips

• Don't use Dropout if your model does not overfit

• However, in that case it is then recommended to increase the capacity until the model does overfit, and then use dropout so that you can use the larger-capacity model without overfitting

DropConnect:
Randomly Dropping Weights

(Diagram: the same multilayer network as before, but now randomly selected connections (weights) between units are dropped rather than the units themselves.)


DropConnect

• Generalization of Dropout
• More "possibilities"
• Less popular & doesn't work so well in practice

Original research article:


Wan, L., Zeiler, M., Zhang, S., Le Cun, Y., & Fergus, R. (2013, February). Regularization
of neural networks using DropConnect. In International conference on machine learning
(pp. 1058-1066).



Recommended Reading Assignment

• Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1), 1929-1958.
  http://jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf