Batch Normalization in AI/ML: Accelerating Deep Learning
This presentation covers Batch Norm basics, why it's revolutionary, and how it substantially speeds up training and stabilizes models.
What is Batch Normalization?
Batch Normalization is a technique that normalizes layer inputs in a neural network.
Worse Generalization
Internal covariate shift, the constant shifting of a layer's input distribution during training, slows learning and hurts generalization.
Example: You are playing basketball. Every time you shoot, the basket moves a little bit! Sometimes it's higher, sometimes it's lower, sometimes it's to the side. You have to adjust every single time before you shoot.
This moving basket is just like internal covariate shift in a neural network.
Placement
Applied before activation functions.
A Mini Real-World Example

Data Example
Imagine you have 5 numbers coming into a layer: 100, 102, 98, 101, 99.
Mean = (100 + 102 + 98 + 101 + 99) / 5 = 100
Variance = measures how much they spread out (here it's small because they are close).
Divide by the standard deviation → so the numbers aren't too big or too small (the full calculation is worked out after the analogy below).

Soccer Analogy
Imagine you're a coach training a soccer team. Some players are slow 🐢. It's chaos because everyone is on a different "scale." To train the team properly, you first make everyone play at similar speeds, not too fast, not too slow, so the coaching becomes easier.
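For concreteness, here is the calculation worked out for those five numbers (a sketch using the population variance and ignoring the small stability constant introduced in the steps below, so the figures stay round):

```latex
\mu = \frac{100 + 102 + 98 + 101 + 99}{5} = 100
\qquad
\sigma^2 = \frac{0^2 + 2^2 + (-2)^2 + 1^2 + (-1)^2}{5} = 2,
\quad \sigma = \sqrt{2} \approx 1.41
\qquad
\hat{x} = \frac{x - \mu}{\sigma} \;\Rightarrow\; \{0,\ 1.41,\ -1.41,\ 0.71,\ -0.71\}
```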
The Four Steps
1. Compute Mean: compute the mean of the inputs in the current batch.
2. Compute Variance: compute the variance of the inputs in the current batch.
3. Normalize: normalize the inputs by subtracting the mean and dividing by the square root of the variance plus a small constant ε.
4. Scale and Shift: scale and shift the normalized inputs using two learnable parameters γ and β.
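A minimal NumPy sketch of these four steps (the function name, shapes, and example values are illustrative, not from the slides):

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Apply the four BatchNorm steps to a mini-batch x of shape (batch, features)."""
    mu = x.mean(axis=0)                      # 1. compute the mini-batch mean
    var = x.var(axis=0)                      # 2. compute the mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)    # 3. normalize with the small constant eps
    return gamma * x_hat + beta              # 4. scale and shift with learnable gamma, beta

# Example: the five numbers from the earlier slide, treated as one feature
x = np.array([[100.0], [102.0], [98.0], [101.0], [99.0]])
print(batch_norm_forward(x, gamma=1.0, beta=0.0).ravel())
# ≈ [ 0.    1.41  -1.41  0.71  -0.71]
```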
How Batch Norm Works in Practice
Insertion Point
After linear layers, before activation.
• This layer performs a mathematical operation on the input data, combining it with weights and biases to produce a new output. Think of it as mixing ingredients in a specific ratio.
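As a sketch of this insertion point in PyTorch (the layer sizes are illustrative assumptions):

```python
import torch.nn as nn

# BatchNorm sits after the linear layer and before the activation.
block = nn.Sequential(
    nn.Linear(128, 64),   # linear layer: mixes inputs with weights and biases
    nn.BatchNorm1d(64),   # normalizes the 64 pre-activation features per mini-batch
    nn.ReLU(),            # activation applied to the normalized values
)
```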
Batch Statistics
Mini-batch statistics approximate the full-dataset (population) statistics during training.
For each mini-batch, BatchNorm computes the mean and variance of the activations.
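During training these per-batch statistics are also blended into running (moving-average) estimates for later use at inference. A sketch of that update, assuming an illustrative momentum of 0.1 (which is also PyTorch's BatchNorm default):

```python
import numpy as np

def update_running_stats(running_mean, running_var, batch, momentum=0.1):
    """Blend a mini-batch's mean/variance into the running estimates."""
    batch_mean = batch.mean(axis=0)
    batch_var = batch.var(axis=0)
    running_mean = (1 - momentum) * running_mean + momentum * batch_mean
    running_var = (1 - momentum) * running_var + momentum * batch_var
    return running_mean, running_var
```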
Inference Use
Use moving averages of mean and variance for predictions.
When you're baking cookies to sell in a store, you want every batch to be consistent, regardless of the room temperature. So you use the average room temperature you've recorded over time to adjust your recipe, ensuring every batch turns out the same.
In the same way, during inference (when the trained model is used to make predictions), BatchNorm uses the moving averages of the mean and variance it calculated during training. This ensures that the data is normalized consistently, leading to stable and reliable predictions.
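In PyTorch, for example, switching the model to evaluation mode is what makes BatchNorm use these stored moving averages instead of per-batch statistics (the model and input shapes below are illustrative):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 64), nn.BatchNorm1d(64), nn.ReLU())

# model.train() -> BatchNorm normalizes with each mini-batch's own statistics.
# model.eval()  -> BatchNorm normalizes with the stored moving averages.
model.eval()
with torch.no_grad():
    predictions = model(torch.randn(8, 128))
```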
Framework Support
TensorFlow, PyTorch, and Keras support batch norm natively.
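For instance, a Keras version of the same Linear → BatchNorm → Activation block (a sketch; the layer sizes are illustrative):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(128,)),
    tf.keras.layers.Dense(64),             # linear layer
    tf.keras.layers.BatchNormalization(),  # built-in batch norm layer
    tf.keras.layers.ReLU(),                # activation after normalization
])
```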
Results: Faster Training, Higher Accuracy
• 40% less training time
• Supports higher learning rates
• Better generalization
• Less sensitive to initialization
Benefits of Batch Normalization
• Acts as a Regularizer: decreases overfitting, boosting robustness.
• Reduces Initialization Sensitivity: simplifies weight initialization requirements.
Summary: Batch Norm is Essential