Weight Initialization Techniques for Deep Neural Networks
While building and training neural networks, it is crucial to initialize the weights appropriately to ensure a model with high accuracy. If the weights are not correctly initialized, it may give rise to the Vanishing Gradient problem or the Exploding Gradient problem. Hence, selecting an appropriate weight initialization strategy is critical when training DL models. In this article, we will learn some of the most common weight initialization techniques, along with their implementation in Python using Keras in TensorFlow.
As prerequisites, readers of this article are expected to have a basic knowledge of weights, biases and activation functions. To understand what these are and what role they play in Deep Neural Networks, you are advised to read through the article Deep Neural Network With L – Layers.
Terminology or Notations
The following notations must be kept in mind while reading about the weight initialization techniques. These notations may vary across publications; however, the ones used here are the most common, usually found in research papers.
fan_in = Number of input paths towards the neuron
fan_out = Number of output paths towards the neuron
Example: Consider the following neuron as a part of a Deep Neural Network.
[Figure: a neuron with three input connections and two output connections]
For the above neuron,
fan_in = 3 (Number of input paths towards the neuron)
fan_out = 2 (Number of output paths towards the neuron)
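To make this notation concrete, here is a minimal sketch (using a Keras Dense layer, as in the examples below) showing that for a fully connected layer, fan_in is the number of input units and fan_out is the number of output units:
Python3
# fan_in and fan_out of a Dense layer
from tensorflow.keras import layers

layer = layers.Dense(2)                # 2 output paths (fan_out = 2)
layer.build(input_shape=(None, 3))     # 3 input paths (fan_in = 3)

fan_in, fan_out = layer.kernel.shape   # the kernel has shape (fan_in, fan_out)
print(fan_in, fan_out)                 # prints: 3 2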
Weight Initialization Techniques
1. Zero Initialization
As the name suggests, in zero initialization all the weights are assigned zero as their initial value. This kind of initialization is highly ineffective: since every neuron starts with the same weights, they all compute the same output and receive the same gradient update, so they learn the same feature during each iteration. The same problem occurs with any kind of constant initialization. Thus, constant initializations are not preferred.
Zero initialization can be implemented in Keras layers in Python as follows:
Python3
# Zero Initialization
from tensorflow.keras import layers
from tensorflow.keras import initializers
initializer = initializers.Zeros()
layer = layers.Dense(3, kernel_initializer=initializer)
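As a quick sanity check (a minimal sketch; the input size of 4 below is arbitrary), building the layer and printing its kernel confirms that every weight starts at zero:
Python3
# Verify that the zero-initialized kernel is all zeros
from tensorflow.keras import layers, initializers

layer = layers.Dense(3, kernel_initializer=initializers.Zeros())
layer.build(input_shape=(None, 4))  # 4 input features, chosen arbitrarily
print(layer.get_weights()[0])       # a 4x3 matrix of zeros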
2. Random Initialization
In an attempt to overcome the shortcomings of Zero or Constant Initialization, random initialization assigns random non-zero values as weights to the neuron paths. However, when the weights are assigned purely at random, problems such as Overfitting, the Vanishing Gradient Problem or the Exploding Gradient Problem might still occur.
Random Initialization can be of two kinds:
- Random Normal
- Random Uniform
a) Random Normal: The weights are initialized from values in a normal distribution.
w_i \sim N(0,1)
Random Normal initialization can be implemented in Keras layers in Python as follows:
Python3
# Random Normal Distribution
from tensorflow.keras import layers
from tensorflow.keras import initializers
initializer = initializers.RandomNormal(mean=0., stddev=1.)
layer = layers.Dense(3, kernel_initializer=initializer)
b) Random Uniform: The weights are initialized from values in a uniform distribution.
w_i \sim U\space[0, 1]
Random Uniform initialization can be implemented in Keras layers in Python as follows:
Python3
# Random Uniform Initialization
from tensorflow.keras import layers
from tensorflow.keras import initializers
initializer = initializers.RandomUniform(minval=0., maxval=1.)
layer = layers.Dense(3, kernel_initializer=initializer)
3. Xavier/Glorot Initialization
In Xavier/Glorot weight initialization, the weights are assigned from values of a uniform distribution as follows:
w_i \sim U\space[ -\sqrt{ \frac{6}{fan\_in + fan\_out}}, \sqrt{ \frac{6}{fan\_in + fan\_out}}]
Xavier/Glorot Initialization, often termed Xavier Uniform Initialization, is suitable for layers where the activation function used is Sigmoid. Xavier/Glorot initialization can be implemented in Keras layers in Python as follows:
Python3
# Xavier/Glorot Uniform Initialization
from tensorflow.keras import layers
from tensorflow.keras import initializers
initializer = initializers.GlorotUniform()
layer = layers.Dense(3, kernel_initializer=initializer)
4. Normalized Xavier/Glorot Initialization
In Normalized Xavier/Glorot weight initialization, the weights are assigned from values of a normal distribution as follows:
w_i \sim N(0, \sigma )
Here, \sigma is given by:
\sigma =\space\sqrt{ \frac{2}{fan\_in + fan\_out}}
Normalized Xavier/Glorot Initialization, too, is suitable for layers where the activation function used is Sigmoid. Normalized Xavier/Glorot initialization can be implemented in Keras layers in Python as follows:
Python3
# Normalized Xavier/Glorot Normal Initialization
from tensorflow.keras import layers
from tensorflow.keras import initializers
initializer = initializers.GlorotNormal()
layer = layers.Dense(3, kernel_initializer=initializer)
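As a quick empirical check (a sketch with arbitrary fan_in = 200 and fan_out = 300), the standard deviation of weights drawn from GlorotNormal comes out close to \sqrt{ \frac{2}{fan\_in + fan\_out}}:
Python3
# Empirical check: GlorotNormal stddev is approximately sqrt(2 / (fan_in + fan_out))
import tensorflow as tf
from tensorflow.keras import initializers

fan_in, fan_out = 200, 300                 # arbitrary sizes for this check
weights = initializers.GlorotNormal()(shape=(fan_in, fan_out))
print(float(tf.math.reduce_std(weights)))  # close to (2 / 500) ** 0.5, i.e. about 0.063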
5. He Uniform Initialization
In He Uniform weight initialization, the weights are assigned from values of a uniform distribution as follows:
w_i \sim U\space[ -\sqrt{ \frac{6}{fan\_in}}, \sqrt{ \frac{6}{fan\_in}}]
He Uniform Initialization is suitable for layers where the ReLU activation function is used. He Uniform Initialization can be implemented in Keras layers in Python as follows:
Python3
# He Uniform Initialization
from tensorflow.keras import layers
from tensorflow.keras import initializers
initializer = initializers.HeUniform()
layer = layers.Dense(3, kernel_initializer=initializer)
6. He Normal Initialization
In He Normal weight initialization, the weights are assigned from values of a normal distribution as follows:
w_i \sim N(0, \sigma )
Here, \sigma is given by:
\sigma = \sqrt{ \frac{2}{fan\_in} }
He Normal Initialization, too, is suitable for layers where the ReLU activation function is used. He Normal Initialization can be implemented in Keras layers in Python as follows:
Python3
# He Normal Initialization
from tensorflow.keras import layers
from tensorflow.keras import initializers
initializer = initializers.HeNormal()
layer = layers.Dense(3, kernel_initializer=initializer)
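Similarly, as an optional check (a sketch with an arbitrary layer width of 256), weights drawn from HeNormal have a standard deviation close to \sqrt{ \frac{2}{fan\_in}}:
Python3
# Empirical check: HeNormal stddev is approximately sqrt(2 / fan_in)
import tensorflow as tf
from tensorflow.keras import initializers

fan_in = 256                               # arbitrary width for this check
weights = initializers.HeNormal()(shape=(fan_in, 128))
print(float(tf.math.reduce_std(weights)))  # close to (2 / 256) ** 0.5, i.e. about 0.088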
Conclusion:
Weight initialization is a crucial concept in Deep Neural Networks, and using the right initialization technique can heavily affect the accuracy of a Deep Learning model. Thus, an appropriate weight initialization technique must be employed, taking factors such as the activation function used into consideration.
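As a closing illustration (a minimal sketch; the layer widths and the 20 input features are arbitrary), the initializers discussed above can be paired with their matching activations in a single model: He initialization for the hidden ReLU layers and Glorot for the sigmoid output.
Python3
# Pairing initializers with matching activations in one model
import tensorflow as tf
from tensorflow.keras import layers, initializers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),  # 20 input features, arbitrary for this sketch
    layers.Dense(64, activation='relu',
                 kernel_initializer=initializers.HeNormal()),
    layers.Dense(32, activation='relu',
                 kernel_initializer=initializers.HeUniform()),
    layers.Dense(1, activation='sigmoid',
                 kernel_initializer=initializers.GlorotUniform()),
])
model.summary()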