
Understanding Batch Normalization, Layer Normalization and Group Normalization by implementing from scratch

Pasha S
June 1, 2023

1. Batch Normalization: This technique, introduced by Ioffe and Szegedy in 2015, normalizes the data across the batch dimension (i.e., for each feature, it calculates the mean and variance across all instances in the batch). It is widely used in Convolutional Neural Networks (CNNs), as it can accelerate training and improve generalization. However, it can cause issues in certain scenarios, such as small batch sizes or sequence models, where the effective batch statistics change from one time step to the next.
Given a batch of activations for a specific layer, batch normalization first calculates the mean and standard deviation over the batch. It then subtracts the mean and divides by the standard deviation to normalize the values. A small epsilon is added to the variance for numerical stability.

Following normalization, batch normalization applies a scale factor "gamma" and a shift factor "beta". These two parameters are learnable and allow the layer to undo the normalization if it finds it is not useful.

During training, the mean and variance are computed on the fly for each batch. During testing, running averages of these statistics, accumulated during training, are used.

def batch_norm(x):
    # Per-feature mean and (biased) variance computed across the batch dimension
    mean = x.mean(0, keepdim=True)
    var = x.var(0, unbiased=False, keepdim=True)
    # Epsilon inside the square root for numerical stability
    x_norm = (x - mean) / (var + 1e-5).sqrt()
    return x_norm
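
The snippet above covers only the normalization step. As a rough, illustrative sketch (the class name, momentum value, and structure below are my own, not from the original post), the learnable gamma/beta and the running statistics described above could be added like this:

import torch
from torch import nn

class SimpleBatchNorm(nn.Module):
    """Minimal batch norm sketch for (N, num_features) inputs."""
    def __init__(self, num_features, eps=1e-5, momentum=0.1):
        super().__init__()
        self.eps = eps
        self.momentum = momentum
        self.gamma = nn.Parameter(torch.ones(num_features))   # learnable scale
        self.beta = nn.Parameter(torch.zeros(num_features))   # learnable shift
        # Running statistics, updated during training and used at test time
        self.register_buffer("running_mean", torch.zeros(num_features))
        self.register_buffer("running_var", torch.ones(num_features))

    def forward(self, x):
        if self.training:
            mean = x.mean(0)
            var = x.var(0, unbiased=False)
            with torch.no_grad():
                self.running_mean.mul_(1 - self.momentum).add_(self.momentum * mean)
                self.running_var.mul_(1 - self.momentum).add_(self.momentum * var)
        else:
            mean, var = self.running_mean, self.running_var
        x_norm = (x - mean) / (var + self.eps).sqrt()
        return self.gamma * x_norm + self.beta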

2. Layer Normalization: Proposed by Ba et al. in 2016, layer normalization operates over the feature dimension (i.e., it calculates the mean and variance for each instance separately, over all the features). Unlike batch normalization, it doesn't depend on the batch size, so it's often used in recurrent models where batch normalization performs poorly.

Layer normalization computes the mean and standard deviation across each individual observation (over all channels in the case of images, or all features in the case of an MLP) rather than across the batch. This makes it batch-size independent, so it can be used in models like RNNs or in transformer models.

def layer_norm(x):
    # Per-sample mean and variance computed across the feature dimension
    mean = x.mean(1, keepdim=True)
    var = x.var(1, unbiased=False, keepdim=True)
    x_norm = (x - mean) / (var + 1e-5).sqrt()
    return x_norm
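
As a quick sanity check (my addition, not part of the original post), this from-scratch version should match PyTorch's built-in functional layer norm when no affine parameters are used, since both default to an epsilon of 1e-5:

import torch
import torch.nn.functional as F

x = torch.randn(32, 100)
# F.layer_norm normalizes over the trailing dimensions given by normalized_shape
print(torch.allclose(layer_norm(x), F.layer_norm(x, normalized_shape=(100,)), atol=1e-5))  # True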

3. Group Normalization: Proposed by Wu and He in 2018, group normalization is a middle-ground approach that divides the channels into smaller groups and normalizes the features within each group. It is designed to perform consistently well for both small and large batch sizes.


Group normalization is computationally straightforward and has no restrictions on batch size. It performs particularly well in small-batch scenarios where batch normalization suffers.

def group_norm(x, num_groups):
    # x: (N, C); C must be divisible by num_groups
    N, C = x.shape
    x = x.view(N, num_groups, -1)
    # Statistics are computed per sample, per group of channels
    mean = x.mean(-1, keepdim=True)
    var = x.var(-1, unbiased=False, keepdim=True)
    x_norm = (x - mean) / (var + 1e-5).sqrt()
    x_norm = x_norm.view(N, C)
    return x_norm
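
The function above assumes a flat (N, C) input, matching the MLP example later in the post. For image-like tensors, the same statistics are computed over each group's channels and all spatial positions; here is a minimal sketch of that variant (my own addition, following the (N, C, H, W) layout used in the group normalization paper), checked against PyTorch's built-in:

import torch
import torch.nn.functional as F

def group_norm_nchw(x, num_groups, eps=1e-5):
    # x: (N, C, H, W); C must be divisible by num_groups
    N, C, H, W = x.shape
    x = x.view(N, num_groups, C // num_groups, H, W)
    mean = x.mean(dim=(2, 3, 4), keepdim=True)                # per (sample, group)
    var = x.var(dim=(2, 3, 4), unbiased=False, keepdim=True)
    x_norm = (x - mean) / (var + eps).sqrt()
    return x_norm.view(N, C, H, W)

x = torch.randn(8, 16, 4, 4)
print(torch.allclose(group_norm_nchw(x, 4), F.group_norm(x, 4), atol=1e-5))  # True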

Let's implement a basic version of each of these normalization techniques from scratch. Please keep in mind that these implementations are intended to be instructive and might not cover all the edge cases handled by PyTorch's built-in versions.

import torch
from torch import nn
import torch.nn.functional as F
from functools import partial

def batch_norm(x):
    mean = x.mean(0, keepdim=True)
    var = x.var(0, unbiased=False, keepdim=True)
    x_norm = (x - mean) / (var + 1e-5).sqrt()
    return x_norm

def layer_norm(x):
    mean = x.mean(1, keepdim=True)
    var = x.var(1, unbiased=False, keepdim=True)
    x_norm = (x - mean) / (var + 1e-5).sqrt()
    return x_norm

def group_norm(x, num_groups):
    N, C = x.shape
    x = x.view(N, num_groups, -1)
    mean = x.mean(-1, keepdim=True)
    var = x.var(-1, unbiased=False, keepdim=True)
    x_norm = (x - mean) / (var + 1e-5).sqrt()
    x_norm = x_norm.view(N, C)
    return x_norm

class MLP(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim, norm_func):
        super().__init__()
        self.linear1 = nn.Linear(input_dim, hidden_dim)
        self.norm_func = norm_func
        self.linear2 = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        x = self.linear1(x)
        x = self.norm_func(x)
        x = F.relu(x)
        x = self.linear2(x)
        return x

# Create a random tensor with size (batch_size, input_dim)
x = torch.randn(32, 100)

# Create the MLP models with batch norm, layer norm, and group norm
model_bn = MLP(100, 64, 10, batch_norm)
model_ln = MLP(100, 64, 10, layer_norm)
model_gn = MLP(100, 64, 10, partial(group_norm, num_groups=4))  # 64 hidden units split into 4 groups
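
A quick note on the group-norm model (the value 4 above is my own choice, since the original line is cut off; any divisor of the hidden dimension works): functools.partial fixes num_groups up front so that group_norm matches the single-argument norm_func interface the MLP expects.

h = torch.randn(32, 64)
gn = partial(group_norm, num_groups=4)
print(torch.allclose(gn(h), group_norm(h, num_groups=4)))  # True: partial just pre-binds the argument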

# Pass the input tensor through the models
output_bn = model_bn(x)
output_ln = model_ln(x)
output_gn = model_gn(x)

# Print the outputs
print("Output with batch norm:\n", output_bn)
print("\nOutput with layer norm:\n", output_ln)
print("\nOutput with group norm:\n", output_gn)

Each of these normalization techniques has its strengths and weaknesses, and the choice between them depends on the specific problem and model architecture. For instance, batch normalization might be the first choice for convolutional networks with large batch sizes, while layer normalization or group normalization could be more suitable for recurrent networks or other models with small or variable batch sizes.