DL Mid1 Mansi
Perceptron
1. Introduction:
○ Introduced by Frank Rosenblatt in 1958.
○ Fundamental building block in neural networks.
2. Architecture:
○ Consists of a single layer of neurons.
○ Primarily used for binary classification tasks.
3. Functionality:
○ Takes a set of inputs.
○ Applies weights to the inputs.
○ Computes a weighted sum of the inputs.
○ Passes the weighted sum through an activation function (commonly a
step function).
4. Mathematical Representation:
y = f(w1·x1 + w2·x2 + … + wn·xn + b) = f(Σ wi·xi + b)
Where:
● xi: Inputs.
● wi: Weights.
● b: Bias.
● f: Activation function (commonly a step function).
● y: Output.
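A minimal NumPy sketch of this computation (the function names step and perceptron_output, and the AND-gate weights, are illustrative assumptions):

import numpy as np

def step(z):
    # Unit-step activation: 1 if z >= 0, else 0.
    return 1 if z >= 0 else 0

def perceptron_output(x, w, b):
    # Weighted sum of inputs plus bias, passed through the step function.
    return step(np.dot(w, x) + b)

# Example: weights [1, 1] and bias -1.5 implement logical AND on binary inputs.
print(perceptron_output(np.array([1, 1]), np.array([1.0, 1.0]), -1.5))  # 1
print(perceptron_output(np.array([1, 0]), np.array([1.0, 1.0]), -1.5))  # 0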
Multilayer Perceptron (MLP)
a. Introduction:
i. Extension of the perceptron with multiple layers of neurons.
ii. Suitable for more complex machine learning tasks.
b. Architecture:
i. Includes an input layer, at least one hidden layer, and an output layer.
ii. Hidden layers enable learning complex relationships.
c. Activation Functions:
i. Uses nonlinear activation functions such as:
1. ReLU (Rectified Linear Unit).
2. Sigmoid.
3. Tanh.
ii. Nonlinearity allows it to classify non-linearly separable data.
d. Capability:
i. Models complex and nonlinear patterns in data.
ii. Suitable for tasks like image recognition, natural language processing,
and more.
e. Key Differences from Perceptron:
i. Can handle non-linearly separable data.
ii. Employs multiple layers instead of a single layer.
iii. Uses nonlinear activation functions, whereas a perceptron uses a
step function.
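A minimal forward-pass sketch of an MLP with one hidden layer, assuming NumPy, a ReLU hidden layer, and a sigmoid output; the layer sizes and random weights are illustrative:

import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def mlp_forward(x, W1, b1, W2, b2):
    # Hidden layer: the nonlinear ReLU activation lets the network model
    # non-linearly separable data, unlike a single perceptron.
    h = relu(W1 @ x + b1)
    # Output layer: sigmoid squashes the result into (0, 1) for binary classification.
    return sigmoid(W2 @ h + b2)

rng = np.random.default_rng(0)
x = np.array([0.5, -1.2, 3.0])                    # 3 input features
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)     # input layer -> 4 hidden units
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)     # hidden layer -> 1 output
print(mlp_forward(x, W1, b1, W2, b2))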
Convolution Layer
1. Definition:
○ A core component of Convolutional Neural Networks (CNNs).
○ Extracts spatial features from input data, such as edges, textures, and
shapes.
2. Operation:
○ Applies filters (kernels) to the input matrix.
○ Slides the kernel over the input matrix (convolution operation).
○ Computes element-wise multiplications and sums the results.
3. Output:
○ Produces feature maps that highlight specific features detected by
the filters.
4. Purpose:
○ Captures spatial hierarchies and patterns in data.
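A minimal NumPy sketch of the convolution operation described above; the 4x4 input and the 2x2 edge-detecting kernel are illustrative, and (as in most deep-learning libraries) the kernel is not flipped, so this is technically cross-correlation:

import numpy as np

def conv2d(image, kernel):
    # Valid (no-padding) convolution: slide the kernel over the input,
    # multiply element-wise, and sum the results at each position.
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.array([[1, 2, 0, 1],
                  [3, 1, 1, 0],
                  [0, 2, 2, 1],
                  [1, 0, 1, 3]], dtype=float)
edge_kernel = np.array([[1, -1],
                        [1, -1]], dtype=float)    # crude vertical-edge detector
print(conv2d(image, edge_kernel))                 # 3x3 feature map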
Pooling Layer
a. Definition:
i. A layer that reduces the spatial dimensions (height and width) of
feature maps.
b. Types of Pooling:
i. Max-Pooling: Selects the maximum value within a region.
ii. Average-Pooling: Calculates the average value within a region.
c. Operation:
i. Divides the feature map into regions (e.g., 2x2 or 3x3).
ii. Applies the pooling function (max or average) to each region.
d. Benefits:
i. Reduces computational complexity by downsampling.
ii. Helps prevent overfitting.
iii. Retains important features while discarding redundant information.
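A minimal NumPy sketch of non-overlapping max-pooling with a 2x2 region (the example feature map is illustrative):

import numpy as np

def max_pool2d(feature_map, size=2):
    # Non-overlapping max-pooling: keep the largest value in each size x size region.
    h, w = feature_map.shape
    cropped = feature_map[:h - h % size, :w - w % size]   # crop to a multiple of size
    blocks = cropped.reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 5],
               [0, 1, 7, 2],
               [2, 6, 3, 1]], dtype=float)
print(max_pool2d(fm))   # 2x2 output: [[4. 5.] [6. 7.]]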
4. Define McCulloch Pitts Neuron.
● Developed in 1943 by Warren McCulloch and Walter Pitts as one of the
earliest models of artificial neurons.
● Accepts binary inputs (0 or 1), where 1 represents an active signal and 0
represents no signal.
● Each input is assigned a weight, and the neuron computes the weighted
sum of inputs.
● The weighted sum is compared to a threshold:
1. Outputs 1 if the sum is greater than or equal to the threshold.
2. Outputs 0 if the sum is below the threshold.
● Can perform basic logical operations like AND, OR, and NOT, forming the
foundation for modern neural networks.
● Threshold Mechanism: The weighted sum S is compared to a predefined
threshold value (T):
■ If S≥T, the neuron outputs 1 (activation).
■ If S<T, the neuron outputs 0 (no activation).
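A minimal Python sketch of a McCulloch-Pitts neuron with unit weights; the thresholds used for AND and OR are the standard choices:

def mp_neuron(inputs, threshold):
    # McCulloch-Pitts neuron: binary inputs, unit weights, fixed threshold T.
    s = sum(inputs)                       # weighted sum S (all weights = 1)
    return 1 if s >= threshold else 0     # fire only when S >= T

# AND gate: fires only when both inputs are 1 (threshold T = 2).
# OR gate:  fires when at least one input is 1 (threshold T = 1).
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", mp_neuron([a, b], 2), "OR:", mp_neuron([a, b], 1))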
5. Describe the terms Bias and Variance.
Bias: Refers to the error caused by using a simplified model that does not capture the
complexity of the real-world function.
1. Impact:
○ High Bias results in underfitting, where the model fails to learn
patterns in the training data effectively.
○ The model performs poorly on both training and unseen data.
2. Example:
A linear regression model applied to non-linear data may result in high bias.
Variance: Refers to the error introduced due to the model’s excessive sensitivity to
small variations in the training data.
1. Impact:
○ High Variance leads to overfitting, where the model captures noise
in the training data and performs poorly on unseen data.
2. Example:
A highly complex model, like a deep decision tree, may memorize the training
data but fail to generalize.
Bias-Variance Tradeoff: Increasing model complexity reduces bias but raises variance, while simpler models do the opposite; the goal is a complexity level that balances the two and minimizes total error on unseen data, as the sketch below illustrates.
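A minimal NumPy sketch of the tradeoff, assuming a toy polynomial-regression setup (the sine target, noise level, and degrees 1 and 15 are illustrative); the low-degree fit shows high bias and the high-degree fit shows high variance:

import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 20)
x_test = np.linspace(0, 1, 200)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, x_train.shape)
y_test = np.sin(2 * np.pi * x_test)

for degree in (1, 15):
    coeffs = np.polyfit(x_train, y_train, degree)    # fit a polynomial of this degree
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
# degree 1 underfits (high bias): large error on both training and test data.
# degree 15 overfits (high variance): tiny training error, typically much larger test error.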
Backpropagation Algorithm
● Perform the forward and backward passes for the entire training
dataset for several iterations (epochs).
● Monitor the loss at the end of each epoch to check for convergence
(when the loss stabilizes or reaches an acceptable level).
1. Gradient Descent:
○ Backpropagation uses gradient descent to minimize the loss function.
○ Variants of gradient descent include batch gradient descent,
stochastic gradient descent (SGD), and mini-batch gradient
descent.
2. Learning Rate (α):
○ Controls the step size during weight updates.
○ A small learning rate slows down training, while a large one may lead
to overshooting and instability.
3. Chain Rule of Calculus:
○ Backpropagation relies on the chain rule to compute how a change in
weights and biases affects the loss.
○ This allows efficient computation of gradients for networks with many
layers.
4. Convergence:
○ The training process is repeated until the loss converges, indicating
that the model is optimized to its best capacity on the training data.
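A minimal NumPy sketch tying these pieces together for a one-hidden-layer network: a forward pass, a chain-rule backward pass, and a gradient-descent update; the toy input, layer sizes, and learning rate are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 1))          # one sample with 4 features
y = np.array([[1.0]])                # target value
W1, b1 = rng.normal(size=(3, 4)), np.zeros((3, 1))
W2, b2 = rng.normal(size=(1, 3)), np.zeros((1, 1))
alpha = 0.1                          # learning rate

for epoch in range(100):
    # Forward pass
    h = np.maximum(0, W1 @ x + b1)   # ReLU hidden layer
    y_hat = W2 @ h + b2
    loss = 0.5 * float((y_hat - y) ** 2)

    # Backward pass: chain rule, propagating from the loss back to each parameter.
    d_yhat = y_hat - y               # dL/dy_hat
    dW2, db2 = d_yhat @ h.T, d_yhat
    d_h = W2.T @ d_yhat
    d_h[h <= 0] = 0                  # ReLU derivative
    dW1, db1 = d_h @ x.T, d_h

    # Gradient-descent update: step in the negative-gradient direction.
    W1 -= alpha * dW1; b1 -= alpha * db1
    W2 -= alpha * dW2; b2 -= alpha * db2

print("final loss:", loss)           # should shrink as the loop converges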
Advantages of Backpropagation
1. Efficiency:
○ Enables efficient training of deep networks by computing gradients for
all parameters simultaneously.
2. General Applicability:
○ Works with various architectures and loss functions.
3. Foundation of Deep Learning:
○ Essential for training modern deep neural networks.
Limitations of Backpropagation
Real-World Applications
By iteratively reducing the error, backpropagation ensures that neural networks learn
effectively from data, forming the backbone of modern machine learning and AI.
2. What is Gradient Descent? Explain Momentum-based Gradient Descent in
detail. (7m)
● Gradient Descent is a fundamental optimization algorithm used in machine
learning and deep learning to minimize the loss function by iteratively
updating the model's parameters.
● The updates guide the parameters (weights and biases) toward the values
that result in the smallest possible error, ensuring the model performs
optimally on the data.
● Steps in Gradient Descent
i. Initialize Parameters: Begin with random values for weights (w) and
biases (b).
ii. Calculate Gradients: Compute the gradient of the loss function (J(θ))
with respect to each parameter (θ), indicating the direction of steepest
ascent.
iii. Update Parameters: Update each parameter (θ) by taking a step in
the direction of the negative gradient (steepest descent): θ ← θ − α·∇J(θ), where α is the learning rate.
iv. Iterate: Repeat the process for multiple iterations or until the loss
converges (stabilizes to a minimum value).
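A minimal sketch of the plain update alongside momentum-based gradient descent, which keeps a velocity v (an exponentially decaying sum of past gradients) so that updates accelerate along consistent directions and oscillations are damped; the toy quadratic loss, α = 0.1, and β = 0.9 are illustrative assumptions:

import numpy as np

def grad_J(theta):
    # Gradient of a toy quadratic loss J(theta) = (theta - 3)^2, minimized at theta = 3.
    return 2 * (theta - 3)

alpha, beta = 0.1, 0.9               # learning rate and momentum coefficient
theta_gd, theta_mom, v = 0.0, 0.0, 0.0

for _ in range(50):
    # Plain gradient descent: step against the current gradient only.
    theta_gd -= alpha * grad_J(theta_gd)
    # Momentum: accumulate past gradients into a velocity, then step with it.
    v = beta * v + alpha * grad_J(theta_mom)
    theta_mom -= v

print("plain GD:", round(theta_gd, 4), "momentum:", round(theta_mom, 4))  # both approach 3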
Common Hyperparameters
Tuning Techniques
Selecting and tuning hyperparameters is crucial for building an effective and robust model.
b. What are the applications of CNN? (3m)
Convolutional Neural Networks (CNNs) are specialized deep learning architectures designed
to process and analyze data with spatial or temporal dependencies, such as images, videos,
and time series data. By leveraging convolutional layers, CNNs excel in feature extraction,
making them ideal for pattern recognition and tasks requiring a spatial understanding of the
data.
Applications of CNNs
1. Image Classification
2. Object Detection
3. Medical Image Analysis
● Definition: Applying CNNs to interpret medical images like X-rays, MRIs, and
CT scans for diagnostic purposes.
● Examples:
○ Identifying fractures in X-ray images.
○ Detecting cancerous cells in histopathological images.
○ Classifying retinal diseases from fundus photography.
● Real-World Use Cases:
○ Early disease detection in healthcare.
○ Assisting radiologists by highlighting areas of concern.
4. Video Processing
5. Other Applications
1. Style Transfer: CNNs can apply artistic styles to images or videos (e.g.,
transforming a photo to mimic the style of a painting).
2. Facial Recognition: Widely used in security systems and smartphone
authentication.
3. Self-Driving Cars: CNNs process camera feeds to detect road signs, lane
boundaries, and pedestrians.
4. Speech Recognition: Used for spectrogram analysis in audio and speech
processing tasks.
5. Satellite Image Analysis: Detecting geographical changes, classifying land
use, and monitoring environmental conditions.
6. Generative Applications: CNNs power Generative Adversarial Networks
(GANs) to create realistic images, videos, and animations.
Conclusion
CNNs have transformed how we approach spatial and sequential data analysis. From
medical diagnostics to autonomous vehicles, their versatility and effectiveness have made
them indispensable in modern AI-driven applications. By continuing to innovate, CNNs are
likely to play an even more critical role in solving complex real-world problems.
iii. Scale and Shift: Transform the normalized activations using learnable
parameters γ (scale) and β (shift): y = γ·x̂ + β.
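A minimal NumPy sketch of this step in a batch-normalization layer, assuming per-feature mean/variance normalization followed by the learnable scale and shift (the toy batch, γ, β, and ε values are illustrative):

import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # x: (batch, features). Normalize each feature, then scale by gamma and shift by beta.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)   # normalized activations x̂
    return gamma * x_hat + beta               # learnable scale and shift

x = np.array([[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]])
gamma, beta = np.ones(2), np.zeros(2)
print(batch_norm(x, gamma, beta))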