
Understanding Batch Normalization, Layer Normalization and Group Normalization by implementing from scratch

Pasha S
June 1, 2023

1. Batch Normalization: This technique, introduced by Ioffe and Szegedy in 2015, normalizes the data across the batch dimension (i.e., for each feature, it calculates the mean and variance across all instances in the batch). It is widely used in Convolutional Neural Networks (CNNs), as it can accelerate training and improve generalization. However, it can cause issues in certain scenarios, such as small batch sizes or sequence models, where the effective batch statistics change from one time step to the next.
Given a batch of activations for a specific layer, batch normalization first calculates the mean and standard deviation over the batch. It then subtracts the mean and divides by the standard deviation to normalize the values. A small epsilon is added to the variance for numerical stability.

Following normalization, batch normalization applies a scale factor "gamma" and a shift factor "beta". These two parameters are learnable and allow the layer to undo the normalization if it finds it is not useful.

During training, the mean and variance are computed on the fly for each batch. During testing, running averages of these statistics, accumulated during training, are used.

def batch_norm(x):
    # Per-feature mean and (biased) variance computed across the batch dimension
    mean = x.mean(0, keepdim=True)
    var = x.var(0, unbiased=False, keepdim=True)
    # Epsilon inside the square root for numerical stability
    x_norm = (x - mean) / (var + 1e-5).sqrt()
    return x_norm
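
The snippet above covers only the normalization step. As a rough, illustrative sketch (the class name, momentum value, and structure below are my own, not from the original post), the learnable gamma/beta and the running statistics described above could be added like this:

import torch
from torch import nn

class SimpleBatchNorm(nn.Module):
    """Minimal batch norm sketch for (N, num_features) inputs."""
    def __init__(self, num_features, eps=1e-5, momentum=0.1):
        super().__init__()
        self.eps = eps
        self.momentum = momentum
        self.gamma = nn.Parameter(torch.ones(num_features))   # learnable scale
        self.beta = nn.Parameter(torch.zeros(num_features))   # learnable shift
        # Running statistics, updated during training and used at test time
        self.register_buffer("running_mean", torch.zeros(num_features))
        self.register_buffer("running_var", torch.ones(num_features))

    def forward(self, x):
        if self.training:
            mean = x.mean(0)
            var = x.var(0, unbiased=False)
            with torch.no_grad():
                self.running_mean.mul_(1 - self.momentum).add_(self.momentum * mean)
                self.running_var.mul_(1 - self.momentum).add_(self.momentum * var)
        else:
            mean, var = self.running_mean, self.running_var
        x_norm = (x - mean) / (var + self.eps).sqrt()
        return self.gamma * x_norm + self.beta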

2. Layer Normalization: Proposed by Ba et al. in 2016, layer normalization operates over the feature dimension (i.e., it calculates the mean and variance for each instance separately, over all the features). Unlike batch normalization, it doesn't depend on the batch size, so it's often used in recurrent models where batch normalization performs poorly.

Layer normalization computes the mean and standard deviation across each individual observation (over all channels in the case of images, or all features in the case of an MLP) rather than across the batch. This makes it batch-size independent, so it can be used in models like RNNs or in transformer models.

def layer_norm(x):
    # Per-sample mean and variance computed across the feature dimension
    mean = x.mean(1, keepdim=True)
    var = x.var(1, unbiased=False, keepdim=True)
    x_norm = (x - mean) / (var + 1e-5).sqrt()
    return x_norm
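
As a quick sanity check (my addition, not part of the original post), this from-scratch version should match PyTorch's built-in functional layer norm when no affine parameters are used, since both default to an epsilon of 1e-5:

import torch
import torch.nn.functional as F

x = torch.randn(32, 100)
# F.layer_norm normalizes over the trailing dimensions given by normalized_shape
print(torch.allclose(layer_norm(x), F.layer_norm(x, normalized_shape=(100,)), atol=1e-5))  # True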

3. Group Normalization: Proposed by Wu and He in 2018, group normalization is a middle-ground approach that divides the channels into smaller groups and normalizes the features within each group. It is designed to perform consistently well for both small and large batch sizes.


Group normalization is computationally straightforward and has no restrictions on batch size. It performs particularly well in small-batch scenarios where batch normalization suffers.

def group_norm(x, num_groups):
    # x: (N, C); C must be divisible by num_groups
    N, C = x.shape
    x = x.view(N, num_groups, -1)
    # Statistics are computed per sample, per group of channels
    mean = x.mean(-1, keepdim=True)
    var = x.var(-1, unbiased=False, keepdim=True)
    x_norm = (x - mean) / (var + 1e-5).sqrt()
    x_norm = x_norm.view(N, C)
    return x_norm
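
The function above assumes a flat (N, C) input, matching the MLP example later in the post. For image-like tensors, the same statistics are computed over each group's channels and all spatial positions; here is a minimal sketch of that variant (my own addition, following the (N, C, H, W) layout used in the group normalization paper), checked against PyTorch's built-in:

import torch
import torch.nn.functional as F

def group_norm_nchw(x, num_groups, eps=1e-5):
    # x: (N, C, H, W); C must be divisible by num_groups
    N, C, H, W = x.shape
    x = x.view(N, num_groups, C // num_groups, H, W)
    mean = x.mean(dim=(2, 3, 4), keepdim=True)                # per (sample, group)
    var = x.var(dim=(2, 3, 4), unbiased=False, keepdim=True)
    x_norm = (x - mean) / (var + eps).sqrt()
    return x_norm.view(N, C, H, W)

x = torch.randn(8, 16, 4, 4)
print(torch.allclose(group_norm_nchw(x, 4), F.group_norm(x, 4), atol=1e-5))  # True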

Let's implement a basic version of each of these normalization techniques from scratch. Please keep in mind that these implementations are intended to be instructive and might not cover all the edge cases handled by PyTorch's built-in versions.

import torch
from torch import nn
import torch.nn.functional as F
from functools import partial

def batch_norm(x):
    mean = x.mean(0, keepdim=True)
    var = x.var(0, unbiased=False, keepdim=True)
    x_norm = (x - mean) / (var + 1e-5).sqrt()
    return x_norm

def layer_norm(x):
    mean = x.mean(1, keepdim=True)
    var = x.var(1, unbiased=False, keepdim=True)
    x_norm = (x - mean) / (var + 1e-5).sqrt()
    return x_norm

def group_norm(x, num_groups):
    N, C = x.shape
    x = x.view(N, num_groups, -1)
    mean = x.mean(-1, keepdim=True)
    var = x.var(-1, unbiased=False, keepdim=True)
    x_norm = (x - mean) / (var + 1e-5).sqrt()
    x_norm = x_norm.view(N, C)
    return x_norm

class MLP(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim, norm_func):
        super().__init__()
        self.linear1 = nn.Linear(input_dim, hidden_dim)
        self.norm_func = norm_func
        self.linear2 = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        x = self.linear1(x)
        x = self.norm_func(x)
        x = F.relu(x)
        x = self.linear2(x)
        return x

# Create a random tensor with size (batch_size, input_dim)
x = torch.randn(32, 100)

# Create the MLP models with batch norm, layer norm, and group norm
model_bn = MLP(100, 64, 10, batch_norm)
model_ln = MLP(100, 64, 10, layer_norm)
model_gn = MLP(100, 64, 10, partial(group_norm, num_groups=4))  # 64 hidden units split into 4 groups
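
A quick note on the group-norm model (the value 4 above is my own choice, since the original line is cut off; any divisor of the hidden dimension works): functools.partial fixes num_groups up front so that group_norm matches the single-argument norm_func interface the MLP expects.

h = torch.randn(32, 64)
gn = partial(group_norm, num_groups=4)
print(torch.allclose(gn(h), group_norm(h, num_groups=4)))  # True: partial just pre-binds the argument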

# Pass the input tensor through the models
output_bn = model_bn(x)
output_ln = model_ln(x)
output_gn = model_gn(x)

# Print the outputs
print("Output with batch norm:\n", output_bn)
print("\nOutput with layer norm:\n", output_ln)
print("\nOutput with group norm:\n", output_gn)

Each of these normalization techniques has its strengths and weaknesses, and the choice between them depends on the specific problem and model architecture. For instance, batch normalization might be the first choice for convolutional networks with large batch sizes, while layer normalization or group normalization could be more suitable for recurrent networks or other models with small or variable batch sizes.