Batch Normalization in AI/ML: Accelerating Deep Learning

This presentation covers the basics of Batch Norm, why it is revolutionary, and how it substantially speeds up training and stabilizes models.
What is Batch Normalization?

Batch Normalization is a technique that normalizes layer inputs in a neural network.
It helps the model learn faster and become more stable during training.
Imagine making all inputs to a layer have a similar scale and distribution.
This avoids huge jumps or slow progress, making learning smoother and easier.
The Problem: Internal Covariate Shift

Definition
Internal Covariate Shift means the distribution of the inputs to each layer keeps changing during training, because the parameters of the previous layers are changing.

Impact
• Slower training
• Optimization becomes tough
• Layer interference
• Worse generalization
Example: You are playing basketball.
Every time you shoot, the basket moves a little bit! Sometimes it's higher, sometimes it's lower, sometimes it's to the side.
You have to adjust every single time before you shoot. That's really hard, right?
It slows you down because you are always guessing.
This moving basket is just like internal covariate shift in a neural network.
What is Normalization?

Normalization: Layer inputs are normalized within each mini-batch.
Standardization: Zero-mean, unit-variance inputs improve stability.
Placement: Applied before activation functions.
A Mini Real-World Example

Data Example
Imagine you have 5 numbers coming into a layer:
Data: 100, 102, 98, 101, 99
Mean = (100 + 102 + 98 + 101 + 99) / 5 = 100
Variance measures how much they spread out (here it is small because the values are close): ((0)² + (2)² + (-2)² + (1)² + (-1)²) / 5 = 2, so the standard deviation is √2 ≈ 1.41.
Normalize step by step:
• Subtract the mean (100) from each value → center around 0
• Divide by the standard deviation → so the numbers aren't too big or too small
Now they look like:
Normalized data ≈ 0.0, 1.41, -1.41, 0.71, -0.71

Soccer Analogy
Imagine you're a coach training a soccer team.
Some players run super fast ⚡️, some players are slow 🐢, and some players shoot hard while others shoot soft.
It's chaos because everyone is on a different "scale."
To train the team properly, you first make everyone play at similar speeds (not too fast, not too slow) so the coaching becomes easier.
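As a quick check of this arithmetic, here is a minimal NumPy sketch (the variable names are just for illustration) that normalizes the same five numbers:

```python
import numpy as np

x = np.array([100.0, 102.0, 98.0, 101.0, 99.0])

mean = x.mean()              # 100.0
std = x.std()                # population std = sqrt(2) ≈ 1.41
x_hat = (x - mean) / std

print(x_hat)                 # approximately [ 0.  1.41 -1.41  0.71 -0.71]
```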
What Is an Activation Function?

In a neural network, each neuron processes input data and decides whether to "activate" or not. This decision-making process is governed by an activation function.
• If the input is strong enough, the switch turns on (the neuron activates).
• If the input is weak, the switch stays off (the neuron doesn't activate).
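For instance, ReLU is one widely used activation function that follows this idea: weak (negative) inputs are switched off and strong (positive) inputs pass through. A minimal sketch (not taken from the slides):

```python
import numpy as np

def relu(x):
    # Negative inputs are "switched off" to 0; positive inputs pass through unchanged.
    return np.maximum(0.0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))  # [0.  0.  0.  1.5 3. ]
```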
The Math Behind Batch Norm

The normalization helps reduce the internal covariate shift, which can speed up training and improve the model's performance. The scaling and shifting parameters allow the model to learn an optimal transformation of the normalized inputs. The small constant ε ensures numerical stability and prevents division by zero.

1. Compute Mean: compute the mean of the inputs in the current batch.
2. Compute Variance: compute the variance of the inputs in the current batch.
3. Normalize: normalize the inputs by subtracting the mean and dividing by the square root of the variance plus a small constant ε.
4. Scale and Shift: scale and shift the normalized inputs using two learnable parameters γ and β.
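A minimal NumPy sketch of these four steps (the function name, array shapes, and default ε value are illustrative assumptions, not taken from the slides):

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    # x: mini-batch of activations, shape (batch_size, num_features)
    mu = x.mean(axis=0)                      # 1. mini-batch mean (per feature)
    var = x.var(axis=0)                      # 2. mini-batch variance (per feature)
    x_hat = (x - mu) / np.sqrt(var + eps)    # 3. normalize; eps prevents division by zero
    return gamma * x_hat + beta              # 4. scale and shift with learnable gamma, beta

# Example: 8 samples, 4 features, with gamma = 1 and beta = 0 (an identity scale/shift)
x = np.random.randn(8, 4) * 10 + 100
out = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0), out.var(axis=0))     # per-feature mean ≈ 0, variance ≈ 1
```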
How Batch Norm Works in Practice

Insertion Point
Applied after linear layers, before activation.
• The linear layer performs a mathematical operation on the input data, combining it with weights and biases to produce a new output. Think of it as mixing ingredients in a specific ratio.

Batch Statistics
Mini-batch statistics approximate the population statistics during training. For each mini-batch, batch norm computes the mean and variance of the activations:
• Mini-batch mean (μ_B)
• Mini-batch variance (σ²_B)

Inference Use
Use moving averages of the mean and variance for predictions.
• When you're baking cookies to sell in a store, you want every batch to be consistent regardless of the room temperature. So you use the average room temperature you've recorded over time to adjust your recipe, ensuring every batch turns out the same.
• In the same way, during inference (when the trained model is used to make predictions), BatchNorm uses the moving averages of the mean and variance it calculated during training. This ensures that the data is normalized consistently, leading to stable and reliable predictions.

Framework Support
TensorFlow, PyTorch, and Keras support batch norm natively.
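To illustrate this placement and the training-versus-inference behavior, here is a minimal PyTorch sketch (the layer sizes and batch size are arbitrary choices for the example):

```python
import torch
import torch.nn as nn

# "Linear -> batch norm -> activation" placement, as described above.
model = nn.Sequential(
    nn.Linear(16, 32),
    nn.BatchNorm1d(32),   # normalizes the linear layer's outputs before ReLU
    nn.ReLU(),
    nn.Linear(32, 1),
)

x = torch.randn(8, 16)    # a mini-batch of 8 examples
model.train()
y_train = model(x)        # uses this mini-batch's mean and variance

model.eval()              # switch to inference mode
with torch.no_grad():
    y_eval = model(x)     # uses the running (moving-average) statistics instead
```

Calling model.eval() is what switches the batch norm layer from mini-batch statistics to the moving averages accumulated during training.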
Results: Faster Training, Higher Accuracy

• 40% less training time
• Supports higher learning rates
• Better generalization
• Less sensitive to initialization
Benefits of Batch Normalization

1. Accelerates Training: 2x to 10x faster convergence.
2. Higher Learning Rates: allows aggressive optimization without divergence.
3. Reduces Initialization Sensitivity: simplifies weight initialization requirements.
4. Acts as Regularizer: decreases overfitting, boosting robustness.
Summary: Batch Norm is Essential

• Addresses Internal Covariate Shift
• Simple and Effective
• Improves Performance
• Modern Deep Learning Standard
Batch normalization remains a cornerstone technique accelerating modern AI/ML development.
