
Deep learning

AI (artificial intelligence) – it is the combination of two words: artificial + intelligence. Artificial means man-made, and intelligence means the ability to take decisions in any circumstance. AI builds applications which can perform their own tasks without human intervention. Examples of AI include speech recognition, self-driving cars, content recommendation systems, etc.

ML (machine learning) – machine learning is a subset of AI that provides machines with the ability to automatically learn from data by finding patterns and to improve with experience, without being explicitly programmed. It uses both supervised learning and unsupervised learning; in effect, ML provides us statistical tools to explore the data.

DL (deep learning) – it is the subset of machine learning that is made up of neural networks (algorithms inspired by the human brain). We say "deep" learning because the neural networks have several deep layers that enable learning:
Input layer – data enters the network through the input layer.
Hidden layers – processing and transmission of data to other layers.
Output layer – the final output or prediction is made in the output layer.

It improves as you increase the amount of data being used to train the model.

Generative AI – generative AI is a subfield of machine learning that generates high-quality images, text and other content based on the data it was trained on, without explicit instructions, by using a mixture of supervised and unsupervised learning.

MACHINE LEARNING BASICS

A subset of AI known as machine learning focuses primarily on algorithms that enable a computer to independently learn from data and previous experience. Machine learning algorithms create a mathematical model that, without being explicitly programmed, aids in making decisions and predictions with the assistance of samples, historical data, or training data.

Features: -

1. ML uses data to detect various patterns in a given dataset.

2. It can learn from past experience and improve automatically.

3. It is a data-driven technology.

4. ML is quite similar to data mining, as it also deals with huge amounts of data.
Importance -

1. Rapid increase in the production of data.

2. Solving complex problems that are difficult to solve by hand.

3. Decision making in various sectors, including finance.

4. Finding hidden patterns and extracting useful information from data.

DIFFERENCE BETWEEN SUPERVISED AND UNSUPERVISED LEARNING

SUPERVISED LEARNING vs UNSUPERVISED LEARNING

1. Supervised learning uses labelled data; unsupervised learning uses unlabelled data.
2. Supervised learning takes direct feedback to check whether it is predicting the right output; unsupervised learning does not take any feedback.
3. Supervised learning is the simpler method; unsupervised learning is computationally more complex.
4. Supervised learning gives high accuracy; unsupervised learning is less accurate.
5. In supervised learning the number of classes is known; in unsupervised learning it is not known.
6. In supervised learning the desired output is given along with the input; in unsupervised learning no output is given.
7. Supervised learning uses labelled training data; unsupervised learning uses no labelled training data.
8. We can test a supervised model against known answers; we cannot test an unsupervised model in the same way.
9. Supervised learning uses external supervision; unsupervised learning uses no supervision.
10. Supervised learning aims to predict an output; unsupervised learning aims to discover hidden patterns.
11. Supervised learning can be classified into classification and regression; unsupervised learning can be classified into clustering and association problems.
12. Algorithms: SVM, decision trees, Bayesian networks, etc. (supervised); KNN, clustering algorithms, etc. (unsupervised).

DIFFERENCE BETWEEN OVERFITTING AND UNDERFITTING

OVERFITTING vs UNDERFITTING

1. When your model uses too many features or parameters, it tends to overfit; when it uses too few features or parameters, it tends to underfit.
2. With overfitting the model becomes too complex; with underfitting the model becomes too simple.
3. Overfitting leads to a model that cannot generalize to new data; underfitting leads to a model that cannot even learn from the available data.
4. Overfitting is reduced by increasing regularization; underfitting is reduced by decreasing regularization.
5. Overfitting is more challenging to detect; underfitting is easier to detect compared to overfitting.
6. Overfitting: high variance, low bias; underfitting: low variance, high bias.
7. Overfitting can be reduced by increasing the size of the training dataset; adding more data does not by itself fix underfitting.
8. Ways to overcome overfitting: reduce the features or parameters of the model, use cross-validation, remove noise from the data. Ways to reduce underfitting: increase the features or parameters of the model.
Bias vs variance

Bias – the gap between the predicted value and the actual value of the data is called bias.

High bias – the predicted value is far away from the actual value (the gap between predicted and actual value is high). The model is not accurate and this leads to underfitting.

Low bias – the predicted value is near the actual value (the gap between predicted and actual value is low). The model is accurate and, combined with high variance, this can lead to overfitting.

Ways to reduce high bias

1.use more complex model (polynomial regression, convolutional neural network, recurrent neural
network).

2.increase the number of features.

3.reduce regularization (L1 and L2) of the model.

4.increase the size of the training data.

Variance – variance refers to how much the predicted values are scattered in relation to each other. It is a measure of the spread of data from its mean position.

Low variance – the group of predicted values is not scattered far from each other.

High variance – the group of predicted values is scattered widely from each other (overfitting).

Ways to reduce variance

1.cross validation

2.feature selection

3.regularization

4.ensemble methods (Bagging and Boosting)

5.early stopping

6.simplifying the model


High bias, low variance = underfitting

Low bias, high variance = overfitting

High bias, high variance =not able to capture underlying patterns.

Low bias, low variance = model is able to capture the underlying patterns in the data and is not too sensitive to changes in the training data.

Machine learning algorithm      Bias      Variance

Linear regression               high      low
Decision tree                   low       high
Random forest                   low       low (bagging of trees reduces variance)
Bagging                         low       low

Estimation, estimator and estimate

Estimation (process) – the process of using sample data to infer the value of a population parameter (mean, proportion, variance).

Estimator (formula) – an estimator is a formula that takes sample data as input and produces an estimate as output. Examples include the sample mean, sample proportion and sample variance.

Estimate (final numerical value) – an estimate is the result of applying an estimator to a sample of data.

Point estimators – a point estimator provides a single-value estimate of the unknown parameter. For example: mean, median, mode.

Interval estimators – an interval estimator provides a range of values within which the true parameter is likely to lie. For example: confidence intervals and prediction intervals.

Applications of estimators –

Hyperparameter tuning – a hyperparameter is a parameter whose value is set before the learning process. Estimators help in finding optimal hyperparameters, such as the learning rate and batch size (the number of samples processed together as a single unit during training), for model training.

Model evaluation – estimators assess model performance on test data, providing metrics like accuracy, precision and recall.

Uncertainty evaluation – estimators quantify uncertainty in the model predictions, which is essential for applications like autonomous driving and healthcare.

Anomaly detection – estimators identify outliers and anomalies in data, which is crucial for fraud detection and fault diagnosis.

Transfer learning – estimators adapt pre-trained models as a starting point for new tasks by fine-tuning parameters (i.e., adjusting the pre-trained model's weights and biases to fit the new tasks).

Reinforcement learning – estimators optimize policies in reinforcement learning, enabling agents to make informed decisions.

Types of parameters
Maximum likelihood estimation – the goal of maximum likelihood estimation is to find the optimal way to fit a distribution to the data; the fitted distribution says "most of the values you measure should be near my average".
Likelihood – likelihood refers to the situation where you are trying to find the optimal value of the mean or standard deviation of a distribution, given the observed data.

Advantages
More consistent
More efficient
More versatile
Flexible
Disadvantages
Computationally expensive
Can be sensitive to choice of optimization algorithm/probability distribution.
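
For illustration, here is a minimal Python sketch (not from the original notes; the data is randomly generated) of maximum likelihood estimation for a normal distribution, where the MLE of the mean is the sample mean and the MLE of the variance is the biased sample variance:

import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=1000)   # hypothetical sample

mu_mle = data.mean()                          # MLE of the mean
sigma2_mle = ((data - mu_mle) ** 2).mean()    # MLE of the variance (divide by n, not n-1)

print("MLE mean:", mu_mle, "MLE variance:", sigma2_mle)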
Difference between likelihood and probability

1. Likelihood refers to past events with known outcomes; probability refers to the occurrence of future events.
2. For likelihood, the parameters we assume may vary; for probability, the parameters we assume are fixed.
3. For likelihood, the outcomes (data) are fixed because they have been observed; for probability, the outcomes are variable because they have not yet occurred.
4. Likelihood values do not have to add up to 1; probabilities add up to 1.
5. Likelihood is used in parameter estimation; probability is used in prediction.

Bayesian statistics
1. It is also known as Bayes' rule, Bayes' law or Bayesian reasoning.

2. It is used to determine the conditional probability of an event A given that an event B has already occurred.
3. It is a way to calculate the value of P(B|A) with the knowledge of P(A|B).
4. It keeps updating the probability prediction of an event by observing new information from the real world.

Applications of Bayesian statistics

1.recommendation system

2.weather forecasting

3.traffic prediction

4.demand forecasting

5.share market

6.chatbots and personal digital assistants.

Example – what is the probability that a person has dengue, given that they have neck pain?

Given: P(neck pain) = P(A) = 0.2

P(dengue) = P(B) = 1/30,000

P(A|B) = 0.8

A = the event that a person has neck pain.

B = the event that a person has dengue.

P(B|A) = ?

P(B|A) = P(A|B) P(B) / P(A)

= 0.8 × (1/30,000) / 0.2

≈ 0.000133
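
A quick Python check of this calculation (values taken from the example above):

p_neck_pain = 0.2                 # P(A)
p_dengue = 1 / 30000              # P(B)
p_pain_given_dengue = 0.8         # P(A|B)

# Bayes' rule: P(B|A) = P(A|B) * P(B) / P(A)
p_dengue_given_pain = p_pain_given_dengue * p_dengue / p_neck_pain
print(p_dengue_given_pain)        # ~0.000133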

Gradient descent – gradient descent is one of the most commonly used optimization algorithms to train ML models, by minimizing the error between actual results and expected results. It helps in finding a local minimum of a function.

Formula

new value = old value − step size

new value (new guess) = old value (previous guess) − step size (how much you shift, i.e. the learning rate times the gradient)

Cost function – the cost function is defined as the measurement of the difference, or error, between actual values and predicted values at the current position, expressed as a single real number.
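
As a small illustration (not part of the original notes; the cost function is a made-up example), gradient descent on J(w) = (w - 3)^2, whose gradient is 2(w - 3):

# new value = old value - learning_rate * gradient
w = 0.0              # initial guess (old value)
learning_rate = 0.1  # controls the step size

for step in range(100):
    gradient = 2 * (w - 3)
    w = w - learning_rate * gradient

print(w)   # converges towards the minimum at w = 3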
Types of gradient descent
1. Batch gradient descent – we use all of our training data in a single iteration of the algorithm.

2. Stochastic gradient descent – this is the type of gradient descent that uses one training example per iteration.

3. Mini-batch gradient descent – it is a combination of batch gradient descent and stochastic gradient descent. It divides the training dataset into small batches and then performs an update for each of those batches separately.
Comparison of batch, stochastic and mini-batch gradient descent:

1. Gradient computation: batch GD computes the gradient using the whole training set; stochastic GD uses a single sample at a time; mini-batch GD uses a small batch of samples.
2. Speed and cost: batch GD is slow and computationally expensive; stochastic GD is faster and less computationally expensive than batch GD; mini-batch GD is faster than batch GD, slower than stochastic GD, and less costly than batch GD.
3. Dataset size: batch GD is not good for huge training sets; stochastic GD can be used for large datasets; mini-batch GD is suitable for both small and large datasets.
4. Nature: batch GD is deterministic; stochastic GD is stochastic (non-deterministic); mini-batch GD is a combination of both.
5. Solution quality: batch GD gives a guarantee of an optimal solution; stochastic GD gives a good but not necessarily optimal solution; mini-batch GD is not guaranteed to find the optimal solution.
6. Accuracy: batch GD is more accurate; stochastic GD is less accurate; mini-batch GD is more accurate than stochastic GD but less accurate than batch GD.
7. Noise and memory: batch GD produces less noise; stochastic GD is easier to fit in the available memory; mini-batch GD is also easier to fit in the allocated memory.
8. Convergence: batch GD convergence is guaranteed but may be slow for large datasets; stochastic GD converges faster than batch GD for large datasets; mini-batch GD converges faster than batch GD and is more stable than stochastic GD.

learning rate: the learning rate is a hyperparameter in machine learning that controls how quickly a
model learns from the training data. It determines the step size of each iteration in the optimization
algorithm such as gradient descent.

FFNN

A Feedforward Neural Network (FFNN) is a type of artificial neural network where connections
between the nodes do not form cycles. This characteristic differentiates it from recurrent neural
networks (RNNs). The network consists of an input layer, one or more hidden layers, and an output
layer. Information flows in one direction—from input to output—hence the name “feedforward.”

Structure of a Feedforward Neural Network

1. Input Layer: The input layer consists of neurons that receive the input data. Each neuron in
the input layer represents a feature of the input data.

2. Hidden Layers: One or more hidden layers are placed between the input and output layers.
These layers are responsible for learning the complex patterns in the data. Each neuron in a
hidden layer applies a weighted sum of inputs followed by a non-linear activation function.

3. Output Layer: The output layer provides the final output of the network. The number of
neurons in this layer corresponds to the number of classes in a classification problem or the
number of outputs in a regression problem.
Each connection between neurons in these layers has an associated weight that is adjusted during
the training process to minimize the error in predictions.

Activation Functions

Activation functions introduce non-linearity into the network, enabling it to learn and model complex data patterns. Common activation functions include sigmoid, tanh, ReLU and softmax.

For example visit: https://www.youtube.com/watch?v=eOtGPlAS6Yg
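
A minimal NumPy sketch of the common activation functions mentioned above (illustrative only):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))   # squashes values into (0, 1)

def tanh(x):
    return np.tanh(x)             # squashes values into (-1, 1)

def relu(x):
    return np.maximum(0, x)       # keeps positives, zeroes out negatives

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(x), tanh(x), relu(x))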

DFNN – DFNN stands for "deep feedforward neural network". Also known as a multilayer perceptron (MLP), it is composed of multiple layers of interconnected neurons arranged in a sequential manner.

Each neuron receives input signals from the previous layer, applies a non-linear activation function and sends its output signal to the next layer.
Applications of DFNN

1.image classification – DFNN are used in computer vision tasks for recognizing and classifying
objects within images. They can distinguish between different categories, such as identifying
whether an image contains a cat, dog, car etc.

2.speech recognition – in natural language processing, DFNNs are employed to convert spoken language into text. They are a crucial component of virtual assistants like Siri, Gemini, Alexa, etc.

3.natural language processing – DFNNs are used for tasks like sentiment analysis, text classification
and language translation. For instance – they help in determining the sentiment of a tweet or
categorizing news articles.

4.fraud detection – in finance, DFNNs are used to detect fraudulent transactions by analysing
patterns in transaction data. They help in distinguishing between legitimate and suspicious activities.

5.recommendation systems -platforms like Netflix, Amazon and YouTube use DFNNs to suggest
products, movies and videos to users based on their previous interactions and preferences.

Gradient based learning –

Hidden units- hidden units are neurons or nodes in the hidden layers of a neural network. These
are the layers between the input layer and the output layer, where intermediate computations occur.
Hidden units receive inputs from the previous layer, apply a non-linear activation function and transmit their output to the next layer.

Types of hidden units

1. Fully connected hidden units – dense layers where each unit is connected to every unit in the previous and next layers. For example, with an image, each part (pixel) of the image is connected, directly or indirectly, to every other part, so you can clearly see the whole picture.

2. Convolutional hidden units – layers where units are connected to a small, localized region of the input. For example, instead of seeing the whole picture you only see some parts or features of the image, and based on those features you recognize the image.

3. Recurrent hidden units – recurrent layers where neurons have connections looping back on themselves. This is like remembering what you read earlier in a book to make sense of the next part.

Architecture design
It refers to the creation and configuration of neural network architectures. This involves deciding the number and type of layers (FFNN, CNN, RNN), the connections between these layers, and the activation functions to be used.

types of neural networks -

1. Signal flow: in an FFNN the signal flows in one forward direction only; in a CNN the signal flows forward through small localized regions; in an RNN the signal can flow in both forward and backward directions (through time).
2. Typical use: FFNNs are used for general-purpose tasks like regression and classification; CNNs are primarily used for image and video processing and tasks like object detection; RNNs are used for natural language processing and sequential data prediction.
3. Layers: FFNNs use dense layers with fully connected neurons; CNNs use convolutional layers to extract features and pooling layers to reduce dimensionality; RNNs use recurrent layers to handle temporal dependencies.
4. Parameter sharing: FFNNs have no parameter sharing, each connection has its own weight; CNNs share parameters within convolutional filters; RNNs share parameters across time steps within recurrent connections.
5. Computational cost: FFNNs are generally less computationally intensive, depending on network size; CNNs have a higher computational cost due to convolution operations; RNNs can be computationally expensive due to recurrent connections.
6. Training: FFNNs are easier to train with straightforward gradient descent methods; CNN training can be complex due to convolutional layers but has well-defined techniques; RNNs can be difficult to train due to vanishing/exploding gradient issues, so solutions like LSTMs (long short-term memory) and GRUs (gated recurrent units) are used.
7. Memory: FFNNs require less memory; CNNs require more memory; RNNs require more memory to handle hidden states and sequences.

Computational graphs
1. A computational graph is a directed graph that is used for expressing and evaluating mathematical expressions.

2. These can be used for two different types of calculations – forward computation and backward computation.

3. A few terminologies in computational graphs are as follows:

A variable, which could be of any type (scalar, vector, tensor or any other type), is represented by a node.

A function argument and a data dependency are both represented by an edge.

For example, we have the expression Y = (a + b) * (b - c)

To solve this, we introduce two intermediate variables (d and e), such that every operation has an output variable:

d = a + b (addition)

e = b - c (subtraction)

Y = d * e (multiplication)

Now we have three operations – addition, subtraction and multiplication – so we can easily draw the computational graph.
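
A small Python sketch of this graph (the values a = 2, b = 3, c = 1 are chosen just for illustration), showing the forward pass and the backward pass with the chain rule:

a, b, c = 2.0, 3.0, 1.0

# forward pass
d = a + b          # addition node
e = b - c          # subtraction node
Y = d * e          # multiplication node

# backward pass (chain rule): dY/dd = e, dY/de = d
dY_da = e * 1            # d = a + b  ->  dd/da = 1
dY_db = e * 1 + d * 1    # b feeds both d and e
dY_dc = d * (-1)         # e = b - c  ->  de/dc = -1

print(Y, dY_da, dY_db, dY_dc)   # 10.0 2.0 7.0 -5.0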

Computational graphs in deep learning

Computations of the neural network are organized in terms of a forward pass, or forward propagation step, in which we compute the output of the neural network, followed by a backward pass, or backward propagation step, which we use to compute gradients/derivatives.

To understand derivatives in a computational graph, the key is to understand how a change in one variable brings changes in the variables that depend on it. If "a" directly affects "c", then we want to know how it affects "c": if we make a slight change in the value of "a", how does "c" change? We term this the partial derivative of "c" with respect to "a".

We have to follow the chain rule to evaluate the partial derivatives of the final output variable with respect to the input variables a, b and c.
Types of computational graphs –
static computational graphs –
1.fixed graph structure – the graph is built and optimized before runtime, and its structure
remains unchanged during execution.

2.faster execution – since the graph is optimized beforehand, execution is faster.
3.less flexible – changes to the graph require re-compilation, making it less flexible.
4.memory efficient – memory usage is less because graph structure is fixed.
5.used in – TensorFlow, caffe and other frameworks that prioritize speed and efficiency.
Dynamic computational graphs-
1.dynamic graph structure – the graph structure can change during runtime, allowing for
more flexibility.

2.slower execution – since the graph is built and optimized during runtime, execution is slower.
3.more flexible- changes to the graph are made dynamically, making it more flexible.
4.more memory use- since the graph structure is not fixed, so memory usage may increase.
5.used in – PyTorch, Dynet and other frameworks that prioritize flexibility and rapid prototyping.
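
For illustration, a minimal dynamic-graph sketch using PyTorch's autograd (assumes PyTorch is installed; it reuses the expression from the earlier computational-graph example):

import torch

a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=True)
c = torch.tensor(1.0, requires_grad=True)

y = (a + b) * (b - c)   # the graph is built on the fly as operations run
y.backward()            # backward pass computes the gradients automatically

print(a.grad, b.grad, c.grad)   # tensor(2.), tensor(7.), tensor(-5.)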
Backpropagation in deep learning
Backpropagation is a fundamental algorithm used in training artificial neural networks, particularly in deep learning. It is essential for adjusting the weights of the neural network to minimize the errors in its predictions.

How backpropagation works:

1. Initialization – the neural network starts with initial random weights assigned to each connection between neurons.

2. Forward pass – the input data is passed through the network layer by layer, where each neuron performs a weighted sum of its inputs followed by an activation function to produce an output. This output eventually reaches the final layer, which provides the prediction of the network.

3. Calculate error – the error is calculated by comparing the network's output with the actual target value using a loss function (e.g., mean squared error). The error quantifies how far the network's prediction is from the correct answer.

4. Backpropagation of error (backward pass) – the error is propagated backward from the output layer to the input layer. This involves calculating the gradient (partial derivative) of the error with respect to each weight in the network. We use the chain rule to calculate these gradients layer by layer.

5. Update weights – once the gradients are computed, the weights are updated using gradient descent. The weight update is done by subtracting a fraction of the gradient from the current weight.

6. Iteration – the process of forward pass, error calculation, backpropagation and weight update is repeated for many iterations (epochs) over the training data until the network converges to a minimum error or loss.
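
A minimal NumPy sketch of one such training step for a tiny 2-2-1 network with sigmoid activations and mean squared error (the input, target and layer sizes are made-up values for illustration):

import numpy as np

rng = np.random.default_rng(0)
x = np.array([[0.5, -0.2]])    # one training sample with 2 features
t = np.array([[1.0]])          # target output

# step 1: initialization with random weights
W1 = rng.normal(size=(2, 2)); b1 = np.zeros((1, 2))
W2 = rng.normal(size=(2, 1)); b2 = np.zeros((1, 1))

sigmoid = lambda z: 1 / (1 + np.exp(-z))

# step 2: forward pass
h = sigmoid(x @ W1 + b1)
y = sigmoid(h @ W2 + b2)

# step 3: error calculation (mean squared error)
loss = 0.5 * np.sum((y - t) ** 2)

# step 4: backward pass (chain rule, layer by layer)
dy = (y - t) * y * (1 - y)      # gradient at the output layer
dW2 = h.T @ dy; db2 = dy
dh = (dy @ W2.T) * h * (1 - h)  # gradient at the hidden layer
dW1 = x.T @ dh; db1 = dh

# step 5: weight update (gradient descent)
lr = 0.1
W1 -= lr * dW1; b1 -= lr * db1
W2 -= lr * dW2; b2 -= lr * db2

print(loss)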

Advantages of backpropagation in deep learning -

1.efficiency – backpropagation is computationally efficient as it allows the network to adjust weights and biases iteratively using gradient descent. This efficiency is crucial when dealing with large datasets and complex models.

2.automatic differentiation – it enables automatic calculation of gradients for each layer, which helps
in optimizing neural networks during training without manually deriving complex derivatives.

3.generalization – backpropagation helps with model generalization, meaning the ability of the model to perform well on new, unseen data that was not part of the training data. A model which generalizes well on testing data performs better on real-life data.

4.flexibility – the algorithm is versatile and can be applied to different types of neural networks,
including convolutional and recurrent networks, across various applications like image recognition,
speech processing and natural language processing.

5.support for complex architectures- back propagation supports the training of complex network
architectures, including those with multiple layers, non -linear activation functions and recurrent
connections.

Disadvantages of backpropagation

1.vanishing/exploding gradients – in deep networks, gradients can become very small (vanishing gradients) or very large (exploding gradients), making it difficult for the network to learn efficiently, especially in earlier layers.

2.local minima – backpropagation uses gradient descent, which may get stuck in local minima of the loss function, leading to suboptimal solutions. A local minimum is a point where the function value is lower than at its immediate neighbours but not necessarily the lowest value across the entire function. The global minimum is the point where the function achieves its absolute lowest value across the entire domain.

3.requires large datasets – deep learning models trained with backpropagation typically require large amounts of labelled data to achieve good performance, which might not always be available.

4.hyperparameter sensitivity – the performance of backpropagation is heavily dependent on hyperparameters (parameters set before training) such as the learning rate and batch size.

What is regularization?

Regularization is a technique used in machine learning to prevent overfitting and improve the generalization ability of a model. It helps to address this problem by adding a penalty term to the loss function during training, discouraging the model from becoming too complex or fitting the noise in the training data. It encourages the model to focus on the most important patterns in the data and avoid memorizing noise.
Types of regularization-

1.Lasso regression (L1 regularization)- it adds a penalty equivalent to the absolute value of the magnitude of the coefficients.

2.ridge regression (L2 regularization)- it adds a penalty equivalent to the square of the magnitude of the coefficients.

PARAMETER PENALTIES

Parameter penalties in deep learning refer to techniques used to prevent overfitting by adding constraints or regularization terms to the loss function.

List of parameter penalties –

1. L1 regularization (Lasso regression)

2. L2 regularization (Ridge regression)

3. Elastic net regularization (L1 + L2)

4. Dropout

5.Early stopping

6. Data augmentation

NOTE – L1 and L2 regularization are already explained above.

3. Elastic net regularization – it is the combination of both L1 and L2 regularization, where the L1 term handles feature selection (driving unnecessary coefficients to zero) and the L2 term handles co-linearity (the interdependence between various independent variables).

i.e. Loss = Original Loss + λ1·‖w‖₁ + λ2·‖w‖₂²
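
A small NumPy sketch of this combined penalty (the weights, original loss and λ values are hypothetical):

import numpy as np

w = np.array([0.5, -1.2, 0.0, 2.0])   # model weights
original_loss = 0.8                   # e.g. mean squared error on the data
lambda1, lambda2 = 0.01, 0.001

l1_penalty = lambda1 * np.sum(np.abs(w))   # L1 term: sum of |w|
l2_penalty = lambda2 * np.sum(w ** 2)      # L2 term: sum of w^2

total_loss = original_loss + l1_penalty + l2_penalty
print(total_loss)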

4. Dropout – dropout is a regularization technique which involves randomly ignoring or "dropping out" some neurons or features in a neural network to prevent overfitting. Dropout randomly chooses a proportion of neurons and their connections within a layer and temporarily removes them from the network. In effect, dropout refers to data or noise that is intentionally dropped from a neural network to improve processing and time to results.
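
A minimal sketch of (inverted) dropout applied to a layer's activations during training (the activations and keep probability are made-up values):

import numpy as np

rng = np.random.default_rng(0)
activations = rng.random((1, 8))   # hypothetical layer output
keep_prob = 0.8                    # drop roughly 20% of the units

mask = rng.random(activations.shape) < keep_prob
dropped = activations * mask / keep_prob   # scale so the expected output stays the same

print(dropped)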

5. Early stopping – it is a regularization technique used in deep learning to prevent a model from overfitting by stopping the training process when the performance of the model on a validation (unseen) set starts to deteriorate. Rather than training the model for a fixed number of iterations (epochs), early stopping dynamically decides when to stop training based on performance on unseen data. By stopping training early, you save time and computational power that would otherwise be wasted on unnecessary iterations.

6. Data Augmentation – it is a method to generate new training data from existing data by applying random transformations such as rotations, translations, scaling and colour changes.

Types of data augmentation –

1. Image augmentation: -

*Rotation – rotating images by a small angle.

*Flipping-horizontally or vertically flipping images.

*Scaling- zooming in or zooming out of images.

*Cropping- randomly cropping parts of the images.

2.Text augmentation: -

*Synonym Replacement – replacing words with their synonyms.

*Word shuffling – changing the order of words in a sentence while maintaining the meaning of the sentence.

*Language translation – translating text from one language to another.

*Character insertion/deletion – inserting a character into or deleting a character from the text.


3.Audio augmentation –

*Pitch shifting – changing the pitch of the audio file.

*Speed and volume changes – speeding the audio up or down, turning the volume up or down.

*Background noise addition – adding random noise in the background to simulate different environments.
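
For illustration, a few of the image augmentations above can be sketched with plain NumPy on a hypothetical 4x4 grayscale "image" (libraries such as torchvision or Keras provide richer, randomized versions of the same ideas):

import numpy as np

image = np.arange(16).reshape(4, 4)

flipped_h = np.fliplr(image)   # horizontal flip
flipped_v = np.flipud(image)   # vertical flip
rotated = np.rot90(image)      # 90-degree rotation
cropped = image[1:3, 1:3]      # crop a 2x2 patch

print(flipped_h, flipped_v, rotated, cropped, sep="\n")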

Multitask learning – multitask learning refers to the training of a single neural network to perform multiple tasks simultaneously.

Types of multitask learning –

1. multioutput learning – a model predicts multiple outputs for a single input.

2. multitask learning – a single model is trained to perform multiple tasks with separate outputs.

3. multi- objective learning – a single model is trained to achieve multiple objectives.

Examples

1.image recognition – recognizing images against predefined pictures, marking attendance, analysing gestures, etc.

2.computer vision – tasks like object detection, semantic segmentation and image classification can
be tackled together.

3.autonomous vehicles – tasks like object detection, traffic flow, pedestrian recognition can be
learned jointly.

4.healthcare-predicting multiple health outcomes (e.g. different diseases) from the same patient
data.

Bagging – bagging is also known as bootstrap aggregation. It is an ensemble learning method that is used to reduce variance and prevent overfitting within a noisy dataset by combining the predictions of multiple models.

How bagging works-


Bootstrapping – create several different training datasets by randomly sampling with replacement
from the original dataset.

Train multiple models- train a separate model on each of these bootstrapped datasets. In case of
deep learning, this could involve training multiple neural networks, each on a different bootstrapped
dataset.

Aggregate predictions – when making predictions, each trained model makes its own prediction and the final prediction is made by combining them, by averaging for regression tasks or by majority voting for classification tasks.
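
A minimal bagging sketch (assumes scikit-learn is installed; the dataset and number of models are made up for illustration):

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # hypothetical labels

models = []
for _ in range(10):
    idx = rng.integers(0, len(X), size=len(X))        # bootstrap: sample with replacement
    models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

votes = np.array([m.predict(X) for m in models])
final_prediction = (votes.mean(axis=0) > 0.5).astype(int)   # majority vote
print((final_prediction == y).mean())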

Dataset – a dataset is a collection of data that is used to train and test the algorithms that build predictive models. It typically consists of rows and columns, where each row represents a data point and each column corresponds to a characteristic or attribute of the data.
Adversarial training – adversarial training is a technique that has been developed to protect machine learning models from adversarial examples (input data that has been intentionally modified to cause a machine learning model to make a mistake). The goal of adversarial training is to make a model which can distinguish between legitimate and malicious inputs. Attacks come in two types:

White-box attacks – in this type, attackers have complete knowledge of the target ML model, including its architecture, parameters and training data. They can directly access and analyse the model to craft adversarial examples.

Black-box attacks – this is the reverse of white-box attacks, where the attacker has no knowledge of the targeted model. They can only query the model with inputs and observe the corresponding outputs.

Adversarial optimization – adversarial optimization is a way to train AI models to be more robust and secure. The goal of adversarial optimization is to make a model which can perform well even if someone tries to trick it with fake or manipulated data.

How it works –

1.generate fake data – create fake or modified data that can trick the model.

2.train the model – train the model on both real and fake data.

3.optimize – adjust the parameters to make the model perform well on both real and fake data.
Unit-3

Define following terms-

1.convolution – it is a mathematical operation that is applied in a variety of fields, such as image processing and audio/signal processing, to extract meaningful features from the input data by applying various filters (also known as kernels).

2.what is a kernel?

Ans – A kernel in deep learning is a function or method that is used to capture non-linear patterns in data, identify patterns and relationships, and extract higher-level features. Equivalently, it is a mathematical function or template that performs operations on data to transform it into a more desirable format; it works as a filter to extract higher-level features from input data. Kernels are also used in classification problems such as support vector machines. The kernel shape is chosen depending upon the size of the input data and the level of granularity of the extracted features; generally it is a small matrix like 3×3, 5×5 or 7×7.

3.what is stride?

Ans – stride is a parameter that dictates the movement of the kernel, or filter, across the input data, such as an image. When performing a convolution operation, the stride determines how many units the filter shifts at each step. This shift can be horizontal, vertical or both, depending upon the stride's configuration. It is like telling our filters how big a step to take while sliding over the picture in one direction, or like deciding whether to take big or small steps when playing a jumping game.

What is padding, with types?

Ans – padding in deep learning is the process of adding extra values, often around the boundary of an input matrix (such as an image), before it is passed to the convolutional layer in the neural network. The primary purpose of padding is to control the size of the output and to ensure that the convolution operation can effectively process the edges of the input data. Without padding, convolution operations would only be able to apply filters to the central part of the input and would lose information from the borders.

Why is padding important?

1.Maintaining spatial dimensions: in many cases we want to maintain the same height and width in the input and output features, particularly in deep architectures where losing too much spatial information can degrade performance. Padding helps to retain the original dimensions.

2.Edge handling: without padding, convolutional filters don't fully cover the edges of the input. This causes a loss of information near the boundaries.

3.Preventing dimensionality shrinking: without padding, every convolution shrinks the feature map, so a deep stack of convolutional layers would rapidly reduce it to a very small size.

Types of padding – there are two common types of padding used in neural networks:

Valid padding – this type of padding involves no padding at all. The convolution operation is performed only on the valid overlap between the filter and the input; as a result, the output dimensions will be smaller than the input dimensions. Valid padding is suitable when reducing dimensionality is not an issue.
Same padding – in this approach, padding is added to the input so that after applying the convolution operation the output dimensions are the same as the input dimensions. Same padding is suitable where preserving the spatial dimensions throughout the layers is crucial.
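
The following NumPy sketch ties together kernel, stride and padding: a plain 2D convolution (strictly, cross-correlation, as implemented in most deep learning libraries) over a single-channel image; the image and filter values are arbitrary examples:

import numpy as np

def conv2d(image, kernel, stride=1, padding=0):
    if padding > 0:
        image = np.pad(image, padding)              # zero padding around the border
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)      # weighted sum over the patch
    return out

image = np.arange(25, dtype=float).reshape(5, 5)    # hypothetical 5x5 input
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])        # hypothetical 2x2 filter
print(conv2d(image, kernel, stride=1, padding=0).shape)   # (4, 4): valid padding shrinks the output
print(conv2d(image, kernel, stride=1, padding=1).shape)   # (6, 6): padding enlarges it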

Pooling – pooling is a technique used in deep learning, in convolutional neural networks (CNNs), to reduce the spatial dimensions (width and height) of an input feature map. It helps to reduce the size of the data by keeping only the most important features, making the model more efficient and less prone to overfitting.

Purpose of pooling-

#Prevent overfitting

#Feature extraction

#Dimensionality reduction

Types of pooling layers-

1.Max pooling – max pooling takes a small patch (e.g. a 2×2 or 3×3 block) from the input data and picks the maximum value from that patch.

-it keeps the most prominent features, like the brightest spot in the image or the most noticeable pattern, while throwing away the less important values.

-say you have a 2×2 block [[1, 2], [3, 4]]; max pooling will pick the largest number, 4, and discard the rest.

2.Average pooling – instead of picking the largest value, average pooling takes the average of all the values in the patch.

-it gives a smoother, less aggressive reduction in the data, which is useful when you do not want to lose too much detail.

-for the same 2×2 block [[1, 2], [3, 4]], average pooling would calculate the average (1+2+3+4)/4 = 2.5.

3.Global max pooling – global max pooling selects the maximum value from the entire feature map.

-whereas max pooling focuses on the most noticeable portion of each small patch, global max pooling looks at the entire picture, selects the single most important value and discards the rest.

-for example, for the 4×4 block [[1,3,5,2], [0,5,6,4], [7,6,2,0], [9,8,3,2]], global max pooling analyses the whole feature map and keeps only the most relevant value; in this block 9 is the maximum, so the other values are discarded.

4.Global average pooling – global average pooling calculates the average of all the values in the feature map, reducing it to a single value.

-like global max pooling, it is applied across the entire feature map.

-for example, for the 4×4 block [[1,3,5,2], [0,5,6,4], [7,6,2,0], [9,8,3,2]] it will return (1+3+5+2+0+5+6+4+7+6+2+0+9+8+3+2)/16 = 63/16 ≈ 3.94.
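
A quick NumPy check of these pooling examples on the same 4×4 block (illustrative only):

import numpy as np

fmap = np.array([[1, 3, 5, 2],
                 [0, 5, 6, 4],
                 [7, 6, 2, 0],
                 [9, 8, 3, 2]], dtype=float)

# 2x2 max pooling and average pooling with stride 2
blocks = fmap.reshape(2, 2, 2, 2).transpose(0, 2, 1, 3).reshape(4, 2, 2)
max_pooled = blocks.max(axis=(1, 2)).reshape(2, 2)    # [[5, 6], [9, 3]]
avg_pooled = blocks.mean(axis=(1, 2)).reshape(2, 2)

global_max = fmap.max()    # global max pooling -> 9.0
global_avg = fmap.mean()   # global average pooling -> 3.9375

print(max_pooled, avg_pooled, global_max, global_avg, sep="\n")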

Advantages-

#Prevent overfitting

#feature extraction
#dimensionality reduction

Disadvantages-

#loss of information

#hyperparameter tuning

CNN – a convolutional neural network is an extended version of the artificial neural network (ANN) which is predominantly used to extract meaningful features from grid-like matrix datasets such as images. It is a powerful tool for image recognition, object detection and image classification tasks.

Functions of CNN-

1.feature extraction- CNN automatically extract meaningful features from images such as edges,
lines, shapes etc.

2.image classification- CNN classifies images into predefined categories such as objects, scenes or
actions also helps for filtering, tagging etc.

3.pooling-pooling reduces the dimensionality(size) of the feature map while preserving important
information.

4.image denoising-CNN remove noise and artifacts from the images, improving their quality and
enhances clarity and sharpness.

5.image segmentation – dividing an image into segments or regions, often used in medical imaging (e.g., segmenting tumours in an MRI scan).

6.facial recognition – in face recognition systems, MobileNet (a CNN model) helps to detect whether a person is wearing a mask or not.

Operations of CNN-

1.Convolution – convolution involves sliding a filter or kernel over the input image and extracting meaningful features such as edges, textures and patterns.

2.Activation function – after convolution, an activation function (commonly ReLU, the rectified linear unit) is applied to introduce non-linearity into the model. It helps the network learn complex relationships in the data.
3.Pooling – pooling layers (like max and average pooling) reduce the spatial dimensions of the feature maps, decreasing the computational load while preserving the important features.

4.Flattening – it is the process of converting the multidimensional output (2D or 3D) from the convolutional and pooling layers into a 1D vector. This step is necessary because the fully connected layers require 1D input to perform the final classification or prediction. Flattening takes all the values in the feature maps and arranges them into a single long vector.

5.fully connected layers – the flattened output is fed into one or more fully connected layers, which learn complex patterns and relationships.

6.output – the final output is generated, which could be a probability distribution for a classification problem or a continuous value for regression.

7.backpropagation- the error is calculated and propagated backward to update the weights and
biases of the model.

8.optimization – the model's parameters are optimized using an optimization algorithm, most commonly stochastic gradient descent (SGD).
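
For illustration, the whole pipeline above can be sketched in Keras (assumes TensorFlow/Keras is installed; the layer sizes and input shape are hypothetical choices, not prescribed by these notes):

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Conv2D(16, (3, 3), activation="relu", input_shape=(28, 28, 1)),  # convolution + ReLU
    layers.MaxPooling2D((2, 2)),                                            # pooling
    layers.Flatten(),                                                       # flattening to a 1D vector
    layers.Dense(32, activation="relu"),                                    # fully connected layer
    layers.Dense(10, activation="softmax"),                                 # output: class probabilities
])

# Backpropagation and optimization (here SGD) happen inside model.fit() during training.
model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()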
