Unit II
Data Size: ML models can perform well with smaller datasets, making them suitable for applications with limited data availability. DL models typically require large amounts of data to generalize effectively; they thrive when trained on big datasets.
Training Time: ML models often train faster than DL models. DL models can require prolonged training times, especially on large datasets.
Interpretability: Traditional ML models are often more interpretable because they rely on human-engineered features and simpler algorithms. DL models, particularly deep neural networks, are considered black boxes because understanding their decision-making processes can be challenging.
Feature Engineering: In traditional ML, a significant amount of time is spent on feature engineering, which involves selecting, transforming, and engineering relevant features from the raw data to improve model performance. DL models can automatically learn features from the raw data, reducing the need for extensive feature engineering. This is one of the key advantages of deep learning.
Width Vs. Depth of Neural Networks
Width and Depth are two important architectural aspects of neural networks that affect their capacity and performance.
The number of neurons (or units) in each layer of a neural network is known as its width; the number of layers is known as its depth.
Increasing the width of a neural network can increase its capacity to learn complex patterns in the data.
However, a very wide network may also require more training data and computational
resources and may be prone to overfitting.
Source: NPTEL IIT KGP
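To make the width/depth distinction concrete, here is a minimal sketch (not from the slides; PyTorch and the layer sizes are assumptions for illustration) that defines one wide, shallow MLP and one narrower, deeper MLP and compares their parameter counts.

```python
import torch
import torch.nn as nn

# Wide network: few layers, many neurons per layer.
wide_net = nn.Sequential(
    nn.Linear(784, 2048),   # one very wide hidden layer
    nn.ReLU(),
    nn.Linear(2048, 10),
)

# Deep network: more layers, fewer neurons per layer.
deep_net = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 256),
    nn.ReLU(),
    nn.Linear(256, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

def count_params(model):
    return sum(p.numel() for p in model.parameters())

print("wide:", count_params(wide_net))   # ~1.6M parameters
print("deep:", count_params(deep_net))   # ~0.3M parameters
```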
Sigmoid:
The sigmoid function compresses all inputs to the range (0, 1).
The gradient of the function is ∂σ(x)/∂x = σ(x)(1 − σ(x)).
Cons:
Saturation region: a sigmoid neuron is saturated when σ(x) = 1 or σ(x) = 0.
From the graph, the gradient at saturation is 0, so the update w = w − η∇w changes nothing because ∇w = 0.
A saturated neuron therefore causes the gradient to vanish.
Sigmoid is not zero centered.

Tanh:
The Tanh function compresses all inputs to the range (-1, 1).
The gradient of the function is ∂tanh(x)/∂x = 1 − tanh²(x).
Pros:
It is zero centered.
Cons:
The vanishing-gradient problem is still present.
Computationally expensive.
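As an illustration of the saturation behaviour described above, the following NumPy sketch (an assumption for demonstration, not part of the slides) evaluates both gradients at increasingly large inputs and shows them shrinking toward zero.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # dσ/dx = σ(x)(1 − σ(x))

def tanh_grad(x):
    return 1.0 - np.tanh(x) ** 2  # d tanh/dx = 1 − tanh²(x)

for x in [0.0, 2.0, 5.0, 10.0]:
    print(f"x={x:5.1f}  sigmoid'={sigmoid_grad(x):.6f}  tanh'={tanh_grad(x):.6f}")

# As |x| grows, both gradients approach 0, so the weight update
# w = w − η∇w becomes negligible (the vanishing-gradient problem).
```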
Rectified Linear Unit (ReLU)
ReLU stands for Rectified Linear Unit.
It is a non-linear activation function, defined as f(x) = max(0, x).
Pros:
It doesn't saturate in the positive region.
Computationally efficient.
Much faster than sigmoid/Tanh.
The derivative of ReLU is 1 for x > 0 and 0 for x < 0.
Cons:
For negative inputs the gradient is 0, which causes the dead neuron problem.
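A small NumPy sketch (illustrative; not from the slides) of ReLU and its derivative, showing how negative pre-activations yield a zero gradient, which is what leads to dead neurons.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)         # f(x) = max(0, x)

def relu_grad(x):
    return (x > 0).astype(float)      # 1 for x > 0, 0 for x <= 0

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(relu(x))       # [0.  0.  0.  0.5 3. ]
print(relu_grad(x))  # [0. 0. 0. 1. 1.]

# If a neuron's pre-activation stays negative for every input, its gradient
# is always 0 and its weights never update: the "dead neuron" problem.
```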
Leaky ReLU, Parametric ReLU & Exponential ReLU
Leaky ReLU: f(x) = x for x > 0 and f(x) = αx for x ≤ 0, where α is a small fixed slope (e.g., 0.01), so negative inputs keep a small non-zero gradient.
Parametric ReLU: the same form as Leaky ReLU, but the slope α is learned during training.
Exponential ReLU (ELU): f(x) = x for x > 0 and f(x) = α(eˣ − 1) for x ≤ 0, which saturates smoothly to −α for large negative inputs.
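The following NumPy sketch gives one possible implementation of the three variants; the α values used are common defaults assumed here for illustration, not values from the slides.

```python
import numpy as np

def leaky_relu(x, alpha=0.01):      # fixed small slope for x <= 0
    return np.where(x > 0, x, alpha * x)

def parametric_relu(x, alpha):      # same form, but alpha is learned
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):              # exponential (ELU) branch for x <= 0
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(leaky_relu(x))                 # small negative outputs instead of 0
print(parametric_relu(x, alpha=0.2))
print(elu(x))                        # smooth, saturates toward -alpha
```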
Expectations of an autoencoder:
Sensitive enough to the input for accurate reconstruction.
Insensitive enough that it doesn't memorize or overfit the training data.
Loss function ⇒ L(X, X̂) + Regularizer
Encoder: h = g(W Xᵢ + b)
Decoder: X̂ᵢ = f(W* h + c)
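Putting the encoder, decoder, and regularized loss together, here is a minimal PyTorch sketch (the framework, layer sizes, and the L2 regularizer weight are assumptions for illustration, not values from the slides).

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, in_dim=784, hidden_dim=64):
        super().__init__()
        self.encoder = nn.Linear(in_dim, hidden_dim)   # W, b
        self.decoder = nn.Linear(hidden_dim, in_dim)   # W*, c

    def forward(self, x):
        h = torch.sigmoid(self.encoder(x))        # h = g(W x + b)
        x_hat = torch.sigmoid(self.decoder(h))    # x̂ = f(W* h + c)
        return x_hat

model = Autoencoder()
x = torch.rand(32, 784)                            # dummy batch for illustration
x_hat = model(x)
recon = nn.functional.mse_loss(x_hat, x)           # L(X, X̂)
reg = 1e-4 * model.encoder.weight.pow(2).sum()     # Regularizer (L2 penalty on W)
loss = recon + reg
loss.backward()
```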