Deep Learning UNIT 1

deep learning UNIT 1

Uploaded by

Prerna
Copyright
© © All Rights Reserved

Introduction to artificial neural networks and deep learning


10 October 2024 12:25

Neural Networks are a core technology in Artificial Intelligence (AI) that mimic the human brain's ability to
recognize patterns, learn from data, and make decisions. They are used in various AI applications, from image
and speech recognition to autonomous systems and natural language processing.
What is a Neural Network?
A Neural Network is a computational model inspired by the way biological neural networks in the human
brain process information. It consists of layers of interconnected nodes (also known as neurons) that work
together to process input data and produce an output.

Structure of a Neural Network


1. Neurons:
○ The basic unit of a neural network. Each neuron receives input, processes it, and passes it on as
output to other neurons.
2. Layers:
○ Input Layer: The first layer of the network that receives the input data. Each neuron in this layer
represents a feature or attribute of the data.
○ Hidden Layers: Layers between the input and output layers. These layers perform computations and
extract features from the input data. A network can have multiple hidden layers.
○ Output Layer: The final layer that produces the output, which could be a classification, prediction, or
other decision based on the input data.
3. Weights:
○ Connections between neurons have associated weights, which are parameters that the network
learns during training. Weights determine the strength of the influence one neuron has on another.
4. Activation Function:
○ After computing the weighted sum of inputs, the neuron applies an activation function to introduce
non-linearity into the model, enabling the network to solve complex problems. Common activation
functions include ReLU (Rectified Linear Unit), Sigmoid, and Tanh.
5. Bias:
○ A bias term is added to the weighted sum before applying the activation function. It allows the model
to better fit the data.

Artificial Neural Network primarily consists of three layers:

UNIT 1 Page 1
Input Layer:
As the name suggests, it accepts inputs in several different formats provided by the programmer.
Hidden Layer:
The hidden layer sits between the input and output layers. It performs the calculations that uncover
hidden features and patterns.
Output Layer:
The input goes through a series of transformations using the hidden layer, which finally results in output
that is conveyed using this layer.
The artificial neural network takes the inputs, computes their weighted sum, and adds a bias. This
computation is called the transfer function:

z = ∑ (wᵢ × xᵢ) + b

The weighted total z is then passed as input to an activation function to produce the output.
The activation function decides whether a node should fire; only the nodes that fire pass their signal on
toward the output layer. Several activation functions are available, and the choice depends on the task
being performed.
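
The weighted sum, bias, and activation described above can be sketched for a single neuron as follows. This is a minimal illustration, not a reference implementation: the function names, the choice of sigmoid as the activation, and the input, weight, and bias values are all assumptions made for the example.

```python
import math

def sigmoid(z):
    # Squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias, activation=sigmoid):
    # Transfer function: weighted sum of the inputs plus the bias.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # The activation function turns the weighted total into the output.
    return activation(z)

# Hypothetical values chosen only for illustration.
x = [0.5, -1.0, 2.0]
w = [0.4, 0.3, -0.2]
b = 0.1
y = neuron_output(x, w, b)
```

Swapping the `activation` argument for another function changes how the same weighted total is turned into an output.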

How Neural Networks Work


1. Forward Propagation:
○ Input data is fed into the network, passing through each layer (input, hidden, and output). At
each neuron, the inputs are multiplied by their respective weights, summed up, and passed
through an activation function to produce an output. This process continues layer by layer until
the final output is produced.
2. Training a Neural Network:
○ Neural networks learn through a process called training, where the network adjusts its weights
and biases based on the error in its predictions.
Steps in Training:
○ Initialization: Weights are initialized randomly or using specific techniques.
○ Forward Propagation: The input data is passed through the network to generate an output.
○ Loss Calculation: The difference between the predicted output and the actual output (ground
truth) is calculated using a loss function (e.g., Mean Squared Error for regression, Cross-Entropy
for classification).
○ Backward Propagation: The error is propagated back through the network, and the weights are
updated using optimization algorithms like Gradient Descent.
○ Iteration: The process is repeated over multiple iterations (epochs) until the network converges
to a solution where the error is minimized.
3. Backpropagation:
○ A key algorithm for training neural networks, backpropagation calculates the gradient of the loss
function with respect to each weight in the network. This information is used to update the
weights to minimize the loss.
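
The training steps listed above (initialization, forward propagation, loss calculation, backward propagation, iteration) can be sketched for one sigmoid neuron. This is a toy example under stated assumptions: the logical-OR dataset, the squared-error loss, the learning rate, and all names are chosen for illustration only.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Toy dataset (logical OR), chosen only for illustration.
DATA = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]

def train(data, lr=0.5, epochs=2000, seed=0):
    rng = random.Random(seed)
    # Initialization: small random weights and a zero bias.
    w = [rng.uniform(-0.5, 0.5) for _ in data[0][0]]
    b = 0.0
    history = []
    for _ in range(epochs):
        total = 0.0
        for x, t in data:
            y = predict(w, b, x)                      # forward propagation
            total += (y - t) ** 2                     # loss calculation (squared error)
            grad_z = 2.0 * (y - t) * y * (1.0 - y)    # backward pass: dLoss/dz by the chain rule
            w = [wi - lr * grad_z * xi for wi, xi in zip(w, x)]  # gradient descent update
            b -= lr * grad_z
        history.append(total / len(data))             # mean loss for this epoch
    return w, b, history

w, b, history = train(DATA)
```

The `history` list records the mean loss per epoch, so plotting or printing it shows whether the error is shrinking over the iterations.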

Types of Neural Networks


1. Feedforward Neural Networks (FNN):
○ The simplest type of neural network where the information moves in only one direction—from
input to output.
2. Convolutional Neural Networks (CNN):
○ Primarily used for processing grid-like data such as images. CNNs apply convolutional layers to
extract features and are widely used in computer vision tasks.
3. Recurrent Neural Networks (RNN):
○ Designed to handle sequential data by having connections that form cycles. RNNs are used in
time-series forecasting, language modeling, and other tasks where the order of input data
matters.
4. Generative Adversarial Networks (GANs):
○ Consist of two networks, a generator and a discriminator, that compete against each other to
create realistic data. GANs are used in image generation, style transfer, and other creative AI
tasks.

Applications of Neural Networks


1. Computer Vision:
○ Image recognition, object detection, facial recognition, and medical imaging.
2. Natural Language Processing (NLP):
○ Text generation, sentiment analysis, language translation, and speech recognition.
3. Autonomous Systems:
○ Self-driving cars, robotics, and drones.
4. Healthcare:
○ Disease diagnosis, drug discovery, and personalized treatment plans.
5. Finance:
○ Algorithmic trading, fraud detection, and credit scoring.
6. Gaming:
○ AI opponents in video games, real-time strategy planning, and virtual reality experiences.

Advantages of Neural Networks


• Ability to Learn from Data: Neural networks can automatically learn complex patterns and relationships
from large datasets.
• Flexibility: They can be adapted to various types of data (images, text, audio) and problems (classification,
regression, clustering).
• Scalability: Neural networks can handle vast amounts of data and perform well on complex tasks.

Challenges and Limitations


• Computationally Intensive: Training deep neural networks requires significant computational resources
and time.
• Need for Large Datasets: Neural networks typically require large amounts of labeled data to achieve good
performance.
• Interpretability: Neural networks are often considered "black boxes" because it can be difficult to
understand how they make decisions.

{this is same as above}


What is Deep Learning?
Deep Learning is a subset of machine learning that employs neural networks with multiple layers (deep
networks) to model complex patterns in data. Unlike traditional machine learning, which often requires
manual feature extraction, deep learning automates this process through hierarchical representation learning.

1. Architecture of Deep Learning


Deep Learning models are structured in layers:
a. Layers of a Neural Network
• Input Layer: This layer receives the input data. Each neuron corresponds to a feature in the dataset.
• Hidden Layers:
○ Multiple hidden layers can exist between the input and output layers. The depth of the network
allows it to learn increasingly abstract features.
○ Each neuron in a hidden layer applies a transformation (weighted sum and activation function) to its
inputs.
• Output Layer: This layer generates the final output. For classification tasks, it typically uses a softmax
function to provide probabilities for each class.

b. Activation Functions
Activation functions introduce non-linearity, enabling the model to learn complex relationships. Common
activation functions include ReLU, Sigmoid, Tanh, and Softmax.

2. Types of Deep Learning Networks


Different architectures cater to various types of data and tasks:
a. Feedforward Neural Networks (FNNs)
• The simplest form of neural networks where data moves in one direction (from input to output).
• Suitable for structured data like tabular datasets.
b. Convolutional Neural Networks (CNNs)
• Designed for processing grid-like data, such as images.
• Utilize convolutional layers that apply filters to detect patterns (edges, textures).
• Often followed by pooling layers to reduce dimensionality, preserving important features.
c. Recurrent Neural Networks (RNNs)
• Tailored for sequential data, such as time series or natural language.
• RNNs maintain a memory of previous inputs using loops in their architecture, allowing them to process
sequences of varying lengths.
• Variants like Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRUs) mitigate issues like
vanishing gradients.
d. Transformers
• Introduced in the paper "Attention is All You Need," Transformers are designed for sequence-to-sequence
tasks.
• Utilize self-attention mechanisms to weigh the significance of different input parts, making them highly
effective for NLP tasks.
• Pre-trained models like BERT and GPT have revolutionized tasks such as translation, summarization, and
question answering.
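
The self-attention idea mentioned for Transformers can be sketched as scaled dot-product attention. This is a simplified illustration: it assumes a single head and uses the input vectors directly as queries, keys, and values (no learned projection matrices), which a real Transformer would not do. All names and the example vectors are hypothetical.

```python
import math

def softmax(scores):
    m = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(X):
    """Scaled dot-product self-attention with Q = K = V = X (simplifying assumption)."""
    d = len(X[0])
    outputs, all_weights = [], []
    for q in X:
        # Similarity of this position to every position, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in X]
        weights = softmax(scores)         # how much each input part matters for q
        all_weights.append(weights)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, X)) for j in range(d)])
    return outputs, all_weights

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, attn = self_attention(X)
```

Each row of `attn` sums to 1 and shows how strongly one position attends to every other position.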

3. Training Deep Learning Models


Training deep learning models involves several steps:
a. Data Preparation
• Data must be cleaned, normalized, and possibly augmented (especially in image processing) to enhance
robustness.
• Splitting the dataset into training, validation, and test sets is crucial to evaluate performance.
b. Forward Propagation
• Input data passes through the network, producing predictions.
• Each neuron's output is computed using weights and activation functions.
c. Loss Function
• A loss function quantifies the difference between the predicted and actual outputs. Common loss
functions include:
○ Mean Squared Error: Used for regression tasks.
○ Cross-Entropy Loss: Used for classification tasks.
d. Backpropagation
• The gradients of the loss function with respect to each weight are computed using the chain rule.
• These gradients are used to update the weights in the direction that minimizes the loss.
e. Optimization Algorithms
• Various optimization algorithms are employed to update weights:
○ Stochastic Gradient Descent (SGD): Updates weights using one example or a small mini-batch at a time instead of the full dataset, which makes each update cheap.
○ Adam: Combines the advantages of AdaGrad and RMSProp, adjusting the learning rate adaptively.
f. Hyperparameter Tuning
• Hyperparameters like learning rate, batch size, number of layers, and neurons per layer can significantly
impact model performance and require careful tuning.
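
To make the optimizer and hyperparameter discussion concrete, the sketch below applies a plain gradient-descent update and an Adam update to a one-dimensional toy loss, (w − 3)². The toy loss, the hyperparameter values, and the function names are assumptions for illustration. The gradient here is exact, whereas in real SGD it would be estimated from a mini-batch.

```python
import math

def grad(w):
    # Gradient of the toy loss (w - 3)^2; its minimum is at w = 3.
    return 2.0 * (w - 3.0)

def sgd(w, lr=0.1, steps=100):
    for _ in range(steps):
        w -= lr * grad(w)                      # step against the gradient
    return w

def adam(w, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g        # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g    # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias correction for the early steps
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)  # per-parameter adaptive step
    return w

w_sgd = sgd(0.0)
w_adam = adam(0.0)
```

Changing `lr` in either function is a quick way to see why the learning rate is usually the first hyperparameter to tune.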

4. Applications of Deep Learning


Deep Learning has led to breakthroughs in numerous fields:
• Computer Vision: Object detection, image classification, facial recognition, and medical image analysis.
• Natural Language Processing: Language translation, sentiment analysis, chatbots, and content
generation.
• Speech Recognition: Voice assistants, transcription services, and real-time translation.
• Reinforcement Learning: Game playing (e.g., AlphaGo), robotics, and autonomous vehicles.

5. Challenges and Limitations


While deep learning has immense potential, it also faces challenges:
• Data Requirements: Deep learning models typically require large amounts of labeled data, which can be
expensive and time-consuming to obtain.
• Computational Resources: Training deep networks can be resource-intensive, often requiring GPUs or
TPUs.
• Overfitting: Deep networks can memorize training data instead of generalizing, necessitating techniques
like dropout, regularization, and early stopping.
• Interpretability: Deep learning models are often considered "black boxes," making it challenging to
understand how they arrive at specific decisions.

characteristics of neural networks terminology
11 October 2024 11:28

Neural networks, inspired by the human brain, are designed to recognize patterns and relationships in data.
Here's a detailed breakdown of key neural network terminology:
1. Neuron (Node or Perceptron)
• Definition: The basic unit of a neural network. It receives inputs, applies a weight to each, sums them,
and passes the result through an activation function.
• Function: Similar to biological neurons, it processes inputs and produces an output, which can be fed
into other neurons in the next layer.
• Mathematics: Output = Activation Function (∑ (Input × Weight) + Bias).
2. Input Layer
• Definition: The first layer of the network that receives the input data. Each neuron in this layer
corresponds to one feature in the dataset.
• Function: Transmits the input data to the subsequent layers without applying any computation.
3. Hidden Layer
• Definition: Layers between the input and output layers where the actual computation occurs. There can
be one or more hidden layers depending on the complexity of the problem.
• Function: These layers extract and learn features from the input data through weight adjustments.
• Deep Learning: Neural networks with many hidden layers are called deep neural networks (DNNs),
enabling them to model more complex patterns.
4. Output Layer
• Definition: The final layer that produces the network's prediction or classification. Its neurons represent
the possible classes (for classification tasks) or continuous outputs (for regression tasks).
• Function: The outputs of this layer are interpreted as the network’s final decision or prediction based on
the processed data.
5. Weights
• Definition: Parameters that are applied to inputs to adjust their influence on the output. Each
connection between neurons has a weight associated with it.
• Function: Weights are learned through training and determine the importance of input features.
• Gradient Descent: Weights are updated iteratively during training using optimization algorithms like
gradient descent to minimize the error.
6. Bias
• Definition: An additional parameter added to the sum of inputs and weights, allowing the model to fit
the data better.
• Function: Bias shifts the output of the activation function, enabling the network to handle patterns that
don't pass through the origin.
7. Activation Function
• Definition: A function applied to the weighted sum of inputs in a neuron, determining whether the
neuron should be activated (i.e., produce an output).
• Types:
○ Sigmoid: Maps input values to a range between 0 and 1, making it useful for probability-based
outputs.
○ Tanh: Similar to sigmoid but maps values between -1 and 1, so its output is zero-centered.
○ ReLU (Rectified Linear Unit): Outputs the input directly if positive, otherwise outputs zero. It is
widely used for its simplicity and effectiveness in deep networks.
○ Softmax: Converts logits (raw prediction scores) into probabilities for multi-class classification.
8. Loss Function (Cost Function)
• Definition: A function that measures how far the predicted output is from the actual output. It
quantifies the error of the network’s predictions.
• Types:
○ Mean Squared Error (MSE): Used in regression tasks to measure the average squared difference
between predicted and actual values.
○ Cross-Entropy Loss: Used in classification tasks to measure the difference between predicted and
actual probabilities.
9. Backpropagation
• Definition: An algorithm used for training neural networks by updating weights. It involves propagating
the error backward through the network from the output to the input layer.
• Function: Backpropagation calculates the gradient of the loss function with respect to each weight and
updates them using gradient descent to minimize the loss.
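
As an illustration of the softmax entry in the activation function list above, the following sketch converts logits into probabilities. The function name and the example logits are hypothetical. Subtracting the maximum logit is a common numerical-stability trick and does not change the result.

```python
import math

def softmax(logits):
    m = max(logits)                          # shift so the largest exponent is 0
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]         # non-negative values that sum to 1

probs = softmax([2.0, 1.0, 0.1])
```

The largest logit receives the largest probability, and the ordering of the logits is preserved.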

Neurons, perceptron, backpropagation
11 October 2024 17:44

1. Neuron
• Definition: A neuron in a neural network is the fundamental unit that mimics a biological neuron. It
processes input data and passes on the information to other neurons.
• Function: Each neuron receives multiple inputs, applies weights to them, sums them up, adds a bias, and
passes the result through an activation function to produce an output.
• Structure:
○ Inputs: Data or signals coming from other neurons or directly from the input layer.
○ Weights: Each input is multiplied by a weight, which represents the strength of that connection.
○ Bias: An extra term added to the weighted sum to help the neuron adjust its output.
○ Activation Function: The result of the weighted sum and bias is passed through a nonlinear function
(like ReLU or Sigmoid) to determine if the neuron should be "activated."

2. Perceptron
• Definition: The perceptron is the simplest form of a neural network, consisting of a single neuron with
adjustable weights and biases. It is the foundational unit of more complex neural networks.
• History: Introduced by Frank Rosenblatt in 1958, the perceptron is an early model of how a neural network
might function, mimicking how a biological neuron processes data.
• Structure:
○ Input Layer: The perceptron takes several binary or real-valued inputs.
○ Weights: Each input is associated with a weight that determines the input’s importance.
○ Weighted Sum: The inputs are multiplied by their respective weights and summed up, along with a
bias term.
○ Activation Function: The perceptron uses a step activation function that produces an output of either
0 or 1, depending on whether the weighted sum exceeds a certain threshold.
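
The perceptron structure above (inputs, weights, weighted sum with bias, step activation) can be sketched as follows. The weights and bias are hand-picked, hypothetical values that happen to implement a logical AND; they are not learned here.

```python
def step(z, threshold=0.0):
    # Step activation: 1 when the weighted sum reaches the threshold, else 0.
    return 1 if z >= threshold else 0

def perceptron(inputs, weights, bias):
    z = sum(w * x for w, x in zip(weights, inputs)) + bias   # weighted sum plus bias
    return step(z)

# Hand-picked (hypothetical) parameters that realise a logical AND.
AND_WEIGHTS = [1.0, 1.0]
AND_BIAS = -1.5

outputs = [perceptron([a, b], AND_WEIGHTS, AND_BIAS) for a in (0, 1) for b in (0, 1)]
```

Only the input (1, 1) pushes the weighted sum above the threshold, so only that case produces an output of 1.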

3. Backpropagation
• Definition: Backpropagation (short for "backward propagation of errors") is a key algorithm used to train
neural networks by updating the weights of the neurons through the calculation of gradients.
• Role: It helps minimize the error by propagating it backward from the output layer to the input layer,
adjusting weights to reduce the overall error.
• Steps of Backpropagation:
1. Forward Pass: Input data is passed through the network layer by layer, with each neuron producing an
output. The final output is compared with the actual output (label) using a loss function (like cross-
entropy for classification or mean squared error for regression).
2. Loss Calculation: The loss function computes the error between the predicted and actual outputs.
3. Backward Pass (Error Propagation): The error is propagated backward through the network. Using
calculus (chain rule), the algorithm calculates the gradient of the loss function with respect to each
weight. This is crucial because it tells the network how much each weight contributes to the total
error.
4. Weight Update: Weights are updated by subtracting a fraction of the gradient from each weight. This
fraction is controlled by the learning rate. This step is done using an optimization technique like
gradient descent.
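
The four steps above can be traced on a very small network: one input, one tanh hidden neuron, and one linear output neuron. This is a sketch under stated assumptions (the architecture, the loss 0.5 × (y − t)², and all parameter values are made up for the example). The chain-rule gradients are compared against finite-difference estimates as a sanity check.

```python
import math

def forward(params, x):
    w1, b1, w2, b2 = params
    h = math.tanh(w1 * x + b1)    # hidden neuron
    y = w2 * h + b2               # linear output neuron
    return h, y

def loss(params, x, t):
    _, y = forward(params, x)
    return 0.5 * (y - t) ** 2

def backward(params, x, t):
    w1, b1, w2, b2 = params
    h, y = forward(params, x)     # 1. forward pass
    dy = y - t                    # 2./3. dLoss/dy, then propagate backward
    dw2, db2 = dy * h, dy
    dh = dy * w2
    dz = dh * (1.0 - h * h)       # tanh'(z) = 1 - tanh(z)^2
    dw1, db1 = dz * x, dz
    return [dw1, db1, dw2, db2]

def numeric_grad(params, x, t, eps=1e-6):
    grads = []
    for i in range(len(params)):
        up, down = list(params), list(params)
        up[i] += eps
        down[i] -= eps
        grads.append((loss(up, x, t) - loss(down, x, t)) / (2 * eps))
    return grads

params = [0.5, -0.1, 0.8, 0.2]    # hypothetical starting weights and biases
x, t, lr = 1.5, 1.0, 0.1
analytic = backward(params, x, t)
numeric = numeric_grad(params, x, t)
# 4. weight update: subtract the learning rate times the gradient.
new_params = [p - lr * g for p, g in zip(params, analytic)]
```

If the analytic and numeric gradients agree, the chain-rule derivation is very likely correct; this kind of gradient check is a common debugging aid.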

Basic learning laws
11 October 2024 17:46

The Basic Learning Laws describe the principles or rules by which a neural network learns and adjusts its
weights during the training process. These rules guide how the neural network modifies its parameters
(weights and biases) to reduce the error between its predicted output and the actual output, allowing it to
improve its performance over time.

1. Hebbian Learning
• Overview: Hebbian learning is based on the principle that "neurons that fire together, wire together." It
was introduced by Donald Hebb in 1949 and is one of the earliest learning laws.
• Concept: If two neurons are frequently activated together, the connection between them becomes
stronger. In other words, if a presynaptic neuron (input neuron) contributes to firing a postsynaptic
neuron (output neuron), the weight of the connection between them is increased.
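• Rule: The weight between two connected neurons increases if both are activated at the same time: Δw = η × x × y, where x is the presynaptic (input) activation, y is the postsynaptic (output) activation, and η is the learning rate.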

• Applications: Hebbian learning is used in unsupervised learning systems, where the network learns from
patterns in the input without explicit labeled data. It has been used in models of associative memory and
in self-organizing networks like Kohonen's Self-Organizing Maps (SOMs).
• Limitations: Pure Hebbian learning can lead to instability, as weights may grow indefinitely if not
controlled. Variants like Oja’s rule introduce constraints to prevent this issue.

○ Developed by: Donald Hebb in 1949.


○ Type: Unsupervised learning algorithm in neural networks.
○ Principle: "Neurons that fire together, wire together."
▪ If two neurons are activated at the same time: The weight between them increases.
▪ If two neurons are activated oppositely: The weight decreases.
▪ If no correlation exists between activations: The weight remains unchanged.
○ Weight Update:
▪ When inputs of both neurons are positive or negative, it results in a strong positive weight.
▪ If one neuron has a positive input and the other a negative input, the weight is strongly negative.

2. Perceptron Learning Rule
• Overview: This rule is used in perceptrons, which are simple neural networks consisting of a single
neuron. It’s designed to adjust the weights of the network in response to errors in the prediction.
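• Concept: If the perceptron makes a mistake in classifying a data point, the weights are updated in such a
way that the prediction improves. The perceptron learning rule aims to reduce the error between the
predicted and actual output by changing the weights.
• Rule: w ← w + η × (t − y) × x, where t is the target output, y is the predicted output, x is the input, and η is the
learning rate. The weights change only when the prediction is wrong.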

• Applications: This learning rule is used in supervised learning, particularly in linear classifiers. However, it
is limited to linearly separable data (where classes can be separated by a straight line or, in higher dimensions, a hyperplane).
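
A minimal sketch of the perceptron learning rule, trained on the logical AND function (which is linearly separable). The update used is the standard one, w ← w + η × (t − y) × x. The dataset, learning rate, epoch limit, and names are assumptions for illustration.

```python
def predict(w, b, x):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if z >= 0 else 0              # step activation

def train_perceptron(data, lr=1.0, epochs=20):
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, t in data:
            error = t - predict(w, b, x)   # -1, 0, or +1
            if error != 0:                 # update only on misclassification
                w = [wi + lr * error * xi for wi, xi in zip(w, x)]
                b += lr * error
                mistakes += 1
        if mistakes == 0:                  # every point classified correctly
            break
    return w, b

AND_DATA = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train_perceptron(AND_DATA)
```

On data that is not linearly separable, such as XOR, the same loop would never reach zero mistakes and would simply stop at the epoch limit.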

3. Delta Rule (Widrow-Hoff Rule)


• Overview: The Delta Rule, also known as the Widrow-Hoff learning rule, is an extension of the perceptron
learning rule and applies to neurons with continuous activation functions, such as sigmoid or linear
functions.
• Concept: The Delta Rule aims to minimize the difference between the predicted output and the actual
output by adjusting the weights using the gradient of the error. It’s widely used in networks with linear
neurons and is the basis of the backpropagation algorithm used in training multi-layer networks.
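• Rule: The weight update is proportional to the error between the actual output and the predicted output: Δw = η × (t − y) × x, where t is the actual (target) output, y is the predicted output, x is the input, and η is the learning rate.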

• Applications: The Delta Rule is commonly used in training linear regression models and simple neural
networks. It is itself a form of gradient descent on the squared error, and its generalization to multi-layer
networks is backpropagation.

• Developed by: Bernard Widrow and Marcian Hoff.


• Type: Supervised learning algorithm with a continuous activation function.
• Other Name: Also known as the Least Mean Square (LMS) method.
• Objective: Minimizes the error across all training patterns.
• Learning Principle:
• Gradient Descent Approach: It continuously adjusts the weights to reduce the error between the desired
and actual output.
• Modification of Weight: The change in the weight is proportional to the product of the input and the error
(which is the difference between the expected and predicted output).
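
The weight modification described above (change proportional to input times error) can be sketched for a single linear neuron. The target relationship y = 2x + 1, the learning rate, and the names are assumptions chosen so that the expected result is easy to check.

```python
def delta_rule_train(data, lr=0.05, epochs=500):
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, t in data:
            y = w * x + b              # linear (continuous) activation
            error = t - y              # expected output minus predicted output
            w += lr * error * x        # change proportional to input * error
            b += lr * error            # the bias behaves like a weight on a constant input of 1
    return w, b

# Noise-free samples of the hypothetical target y = 2x + 1.
data = [(x, 2.0 * x + 1.0) for x in (0.0, 1.0, 2.0, 3.0)]
w, b = delta_rule_train(data)
```

Because the data are noise-free and the learning rate is small, the learned `w` and `b` approach 2 and 1.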

4. Competitive Learning
• Overview: In competitive learning, neurons compete to be the most active (or the "winner") for a given
input. Only the winning neuron updates its weights, and the others remain unchanged.
• Concept: Neurons are trained to specialize in recognizing different features or patterns in the input. This
form of learning is unsupervised and is often used for clustering.
• Rule: Only the neuron with the highest activation (the "winner") updates its weights to strengthen its
response to the current input, typically by moving its weight vector toward the input: Δw = η × (x − w). This is often achieved using winner-takes-all strategies.

• Applications: Competitive learning is used in clustering algorithms, vector quantization, and self-organizing
maps. It’s a common technique in unsupervised learning where labels are not provided.

• Also Known As: Winner-takes-All rule.


• Type: Unsupervised learning rule.
• Principle:
• Output nodes compete to represent the input pattern.
• The winner node (the one with the strongest output) is given a value of 1, while the rest are set to 0.
• Only the winner's weights are updated, while the rest remain unchanged.
• Neuron Activation: Only one neuron is active at a time in a group of neurons with randomly distributed
weights.
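
A winner-takes-all update can be sketched as simple clustering with two units. Two assumptions are made here for illustration: the winner is taken to be the unit whose weight vector is closest to the input (a common stand-in for "strongest output"), and the winner moves toward the input by a fraction `lr`. The data points and initial weights are made up.

```python
def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def winner_index(weights, x):
    # Assumption: the winning unit is the one whose weights are closest to x.
    return min(range(len(weights)), key=lambda i: squared_distance(weights[i], x))

def train_competitive(weights, data, lr=0.2, epochs=20):
    weights = [list(w) for w in weights]   # copy so the caller's list is untouched
    for _ in range(epochs):
        for x in data:
            k = winner_index(weights, x)
            # Only the winner moves toward the input; the other units are unchanged.
            weights[k] = [wi + lr * (xi - wi) for wi, xi in zip(weights[k], x)]
    return weights

DATA = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
INITIAL = [[1.0, 1.0], [4.0, 4.0]]
trained = train_competitive(INITIAL, DATA)
```

After training, each unit has specialised in one of the two groups of points, which is the clustering behaviour described above.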

5. Correlation Learning Rule


• Similar Principle to: Hebbian learning rule.
• Type: Supervised learning rule.
• Principle:
○ If two neighboring neurons operate in the same phase at the same time, their weight increases.
○ If two neurons operate in opposite phases, their weight decreases.
○ Unlike Hebbian learning, this rule uses a targeted response to calculate weight changes.
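
The difference from Hebbian learning can be sketched by comparing the two weight changes on the same input. The forms assumed here are the usual textbook ones: Hebbian uses the neuron's actual output (Δw = η × x × y), while the correlation rule uses the target response (Δw = η × x × t). The input pattern and the output values are hypothetical.

```python
def hebbian_delta(x, y, lr=0.1):
    # Uses the neuron's actual output y.
    return [lr * xi * y for xi in x]

def correlation_delta(x, target, lr=0.1):
    # Uses the desired (target) response instead of the actual output.
    return [lr * xi * target for xi in x]

x = [1.0, -1.0]
actual_output = -1.0     # what the neuron currently produces (hypothetical)
target = 1.0             # what it should produce (hypothetical)

dw_hebb = hebbian_delta(x, actual_output)
dw_corr = correlation_delta(x, target)
```

When the actual output is wrong, the Hebbian change reinforces the wrong response, whereas the correlation rule pushes the weights toward the target.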

Activation and Loss function
11 October 2024 17:58

Detailed Notes: Types of Activation Functions


In neural networks, activation functions determine how the weighted sum of the input is transformed into an output from a node
or a neuron. Each type of activation function behaves differently, and choosing the right one is crucial for the performance of a
neural network.

A. Identity Function (Linear Activation Function)

• Behavior:
○ This function outputs the same value as the input.
○ It is a linear function, meaning there is no transformation or change applied to the input.
• Use Case:
○ Often used in the input layer of a neural network where no transformation is required.
○ Rarely used in hidden layers since it does not introduce non-linearity, which is essential for complex learning.
• Advantages:
○ Simple and easy to compute.
○ Useful in some regression models.
• Disadvantages:
○ Does not capture any complex patterns in data due to its linearity.
○ Its gradient is constant (equal to 1) regardless of the input, and a stack of linear layers collapses into a single linear map, leading to limited learning capabilities.

B. Threshold/Step Function

• Behavior:
○ This function outputs 1 if the input is 0 or positive, and 0 if the input is negative.
○ It transforms input signals into binary outputs (either 0 or 1).
○ It can be viewed as a binary classifier, distinguishing between classes.
• Use Case:
○ Primarily used for binary classification tasks where the output is either true or false.
○ Commonly applied in single-layer perceptrons.
• Variant: Threshold function with a threshold value θ:
○ Instead of 0 as the decision boundary, a threshold θ is introduced: the output is 1 if the input is at least θ, and 0 otherwise.

• Advantages:
○ Simple and fast.
○ Effective for binary decisions.
• Disadvantages:
○ Not differentiable, meaning it's not suitable for gradient-based optimization methods.
○ Can't handle multiple classes or complex relationships.
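
The step function and its threshold variant can be sketched in a few lines. The finite-difference helper is a hypothetical addition that illustrates why the function gives gradient-based methods nothing to work with: away from the threshold the slope is exactly zero.

```python
def step(x, theta=0.0):
    # Outputs 1 when the input reaches the threshold theta, otherwise 0.
    return 1 if x >= theta else 0

def finite_difference_slope(f, x, h=1e-3):
    # Numerical estimate of the slope of f at x.
    return (f(x + h) - f(x - h)) / (2 * h)

slope_away_from_threshold = finite_difference_slope(step, 2.0)
```

With `theta=0.5`, for example, an input of 0.3 maps to 0 and an input of 0.5 maps to 1.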

C. ReLU (Rectified Linear Unit) Function

• Behavior:
○ ReLU is one of the most widely used activation functions in deep learning, including convolutional neural networks (CNNs).
○ For all positive input values, the output is the same as the input. For negative inputs, the output is 0.
○ It introduces non-linearity but still allows for simple, efficient computations.
• Advantages:
○ Efficient computation: Due to its simplicity, it’s computationally fast, making it suitable for large-scale models.
○ Sparse activation: It activates only a portion of the neurons, which reduces computational overhead.
○ Gradient propagation: Helps mitigate the vanishing gradient problem by maintaining gradients when input is positive.
• Disadvantages:
○ Non-differentiability at 0: ReLU is not differentiable at x=0.
○ Dying ReLU problem: Sometimes, neurons can stop responding to any input (when they output 0 all the time) during
training. This can cause parts of the network to "die" or become inactive, leading to performance issues.
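
ReLU, its gradient, and the dying ReLU problem can be sketched as follows. The weight and bias of the "dead" neuron are hypothetical values chosen so that every pre-activation is negative. Returning a gradient of 0 at exactly x = 0 is a common convention, not the only possible one.

```python
def relu(x):
    return max(0.0, x)

def relu_grad(x):
    # ReLU is not differentiable at 0; using 0 there is a common convention.
    return 1.0 if x > 0 else 0.0

inputs = [-2.0, -0.5, 0.5, 2.0]
outputs = [relu(x) for x in inputs]

# A "dead" neuron: hypothetical weight and bias that make every pre-activation negative.
w, b = 1.0, -5.0
dead_outputs = [relu(w * x + b) for x in inputs]
dead_grads = [relu_grad(w * x + b) for x in inputs]
```

Because every gradient of the dead neuron is zero, gradient descent cannot change its weights, so it stays inactive for these inputs.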

D. Sigmoid Function
• Sigmoid functions are S-shaped (or logistic curves) and are highly useful when output values need to be squashed between a
specific range.

1. Binary Sigmoid Function:

• Behavior:
○ Squashes the input to the range (0, 1): f(x) = 1 / (1 + e^(−x)).
• Use Case:
○ Common in binary classification problems.
○ It’s used in the output layer when the desired output is between 0 and 1.
• Advantages:
○ Provides smooth, non-linear output.
○ Useful in probability-based outputs (since the output lies between 0 and 1).
• Disadvantages:
○ Vanishing gradient problem: When the input values are too large or too small, the gradients tend to zero, which slows down
the learning process.
○ Output is not zero-centered, which can affect optimization algorithms.

2. Bipolar Sigmoid Function:

• Behavior:
○ Similar to the binary sigmoid but squashes the input to a range between -1 and +1.
○ It can model outputs that span a negative-to-positive range.
• Use Case:
○ Common in networks where the output needs to represent a bipolar decision, or values ranging between -1 and +1 are
required.
• Advantages:
○ Retains the properties of the binary sigmoid while providing an expanded output range (-1, +1).
• Disadvantages:
○ Shares the vanishing-gradient problem of the binary sigmoid function. Unlike the binary sigmoid, its output is zero-centered.
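
Both sigmoid variants can be sketched as follows. The bipolar form is assumed here to be the common definition 2 × σ(x) − 1, which equals tanh(x / 2). The function names are hypothetical.

```python
import math

def binary_sigmoid(x):
    # Output in (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def binary_sigmoid_grad(x):
    s = binary_sigmoid(x)
    return s * (1.0 - s)          # largest at x = 0, near 0 for large |x|

def bipolar_sigmoid(x):
    # Output in (-1, +1); assumed definition: 2 * sigmoid(x) - 1.
    return 2.0 * binary_sigmoid(x) - 1.0

grad_at_zero = binary_sigmoid_grad(0.0)
grad_at_ten = binary_sigmoid_grad(10.0)
```

Comparing `grad_at_zero` with `grad_at_ten` shows the vanishing gradient effect: the slope is much smaller once the input is far from zero.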

E. Hyperbolic Tangent (Tanh) Function

• Behavior:
○ The Tanh function is an S-shaped curve similar to the sigmoid but squashes input values to the range (-1, +1).
○ When input values are large in magnitude, the Tanh function saturates to -1 (for negative inputs) or +1 (for positive inputs).
○ It is zero-centered, unlike the sigmoid function, meaning the output is symmetrically distributed around zero.
• Use Case:
○ Frequently used in backpropagation networks, particularly in hidden layers.
○ Suitable for models where negative values are important for learning.
• Advantages:
○ Avoids the non-zero-centered output of the sigmoid function.
○ Works well for data that has strong negative and positive relationships.
• Disadvantages:
○ Similar to the sigmoid function, Tanh suffers from the vanishing gradient problem, especially for large inputs.
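
The zero-centered output and the saturation of Tanh can be checked numerically. In this sketch the sample inputs are arbitrary symmetric values, and the sigmoid is included only for comparison.

```python
import math

def tanh_grad(x):
    # Derivative of tanh: 1 - tanh(x)^2.
    return 1.0 - math.tanh(x) ** 2

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]          # inputs symmetric around zero
mean_tanh = sum(math.tanh(x) for x in xs) / len(xs)
mean_sigmoid = sum(sigmoid(x) for x in xs) / len(xs)

grad_at_zero = tanh_grad(0.0)
grad_at_five = tanh_grad(5.0)
```

For symmetric inputs the Tanh outputs average to about 0 while the sigmoid outputs average to about 0.5, and the gradient at 5 is far smaller than at 0.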

LOSS FUNCTION
A loss function (also known as a cost function or objective function) is a critical part of a neural network, determining how well
the model is performing during training. It quantifies the difference between the predicted output and the actual output, guiding
the optimization process by updating weights and biases to minimize this error.
The primary goal of training a neural network is to minimize the loss function so that the model predictions are as close as
possible to the actual values. Different types of loss functions are used depending on the type of problem—classification,
regression, or others.

Types of Loss Functions

1. Mean Squared Error (MSE) Loss


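• Formula: MSE = (1/n) ∑ (yᵢ − ŷᵢ)², where yᵢ is the actual value, ŷᵢ is the predicted value, and n is the number of samples.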

• Explanation:
○ MSE measures the average squared difference between the actual and predicted values.
○ The errors are squared to ensure that positive and negative errors do not cancel each other out, and also to penalize larger
errors more severely.
• Use Case:
○ Primarily used in regression problems, where the task is to predict continuous values.
• Advantages:
○ Easy to compute and differentiable, which is important for gradient descent.
○ Penalizes larger errors more heavily, which helps when large mistakes are especially costly.
• Disadvantages:
○ Because errors are squared, a few outliers can dominate the loss, making MSE less robust to outliers.
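
A minimal sketch of MSE and its gradient with respect to each prediction. The function names and the sample values are made up for illustration.

```python
def mse(y_true, y_pred):
    # Average of the squared differences.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mse_grad(y_true, y_pred):
    # Derivative of MSE with respect to each prediction: 2 * (pred - actual) / n.
    n = len(y_true)
    return [2.0 * (p - t) / n for t, p in zip(y_true, y_pred)]

y_true = [3.0, 5.0, 2.0]
y_pred = [2.0, 5.0, 4.0]
loss_value = mse(y_true, y_pred)
grads = mse_grad(y_true, y_pred)
```

The prediction that is off by 2 contributes four times as much to the loss as the one that is off by 1, and its gradient is twice as large.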

2. Mean Absolute Error (MAE) Loss


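• Formula: $\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$, where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, and $n$ is the number of samples.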

• Explanation:
○ MAE measures the average absolute difference between the actual and predicted values.
○ Unlike MSE, it takes the absolute value of the error, so it does not penalize larger errors as severely.
• Use Case:
○ Also commonly used in regression problems, especially when outliers are present, as it is more robust than MSE.
• Advantages:
○ MAE gives a more natural, linear measure of error, treating all errors equally without squaring them.
○ Less sensitive to outliers compared to MSE.
• Disadvantages:
○ Since it’s not differentiable at zero, it may be difficult to optimize in some cases, though this issue can often be handled using
sub-gradients.
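The difference in outlier sensitivity between MAE and MSE can be seen on a small made-up example. This is only a sketch: the data are invented, and both losses are written out by hand rather than taken from a library.

```python
def mae(y_true, y_pred):
    """Mean absolute error: average of the absolute differences."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error, included for comparison."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)

y_true = [1.0, 2.0, 3.0, 4.0]
clean = [1.5, 2.5, 2.5, 3.5]         # every prediction is off by 0.5
with_outlier = [1.5, 2.5, 2.5, 14.0] # last prediction is off by 10

print(mae(y_true, clean), mse(y_true, clean))                # 0.5 0.25
print(mae(y_true, with_outlier), mse(y_true, with_outlier))  # 2.875 25.1875
```

One bad prediction raises MAE by a factor of about 6 but raises MSE by a factor of about 100, which is the robustness gap described above.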

3. Cross-Entropy Loss (Log Loss)


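• Formula (binary form): $L = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]$, where $y_i \in \{0, 1\}$ is the actual label and $\hat{y}_i$ is the predicted probability of class 1.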

• Explanation:
○ Cross-Entropy Loss (or log loss) is widely used in classification problems, particularly in binary classification.
○ It measures the difference between the actual class label and the predicted probability. If the predicted probability is close to
the true label, the loss is small, and if it's far from the true label, the loss increases.
• Use Case:
○ Used for binary classification (binary cross-entropy) and multi-class classification (categorical cross-entropy).
• Advantages:
○ Well-suited for probability-based outputs and works well with models that predict probabilities, such as those using sigmoid
or softmax activation functions.
• Disadvantages:
○ Sensitive to poorly estimated probabilities. If the model is very confident but wrong, the loss becomes very large.
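The "confident but wrong" behaviour can be illustrated with a small binary cross-entropy sketch. The function name, the clipping constant eps, and the sample probabilities are assumptions made for this example; clipping is a common way to avoid log(0).

```python
import math

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Average binary cross-entropy for labels in {0, 1} and predicted P(class 1)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip so that log(0) never occurs
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

# True label is 1 in every case; only the predicted probability changes.
for p in (0.9, 0.5, 0.01):
    print(f"p={p:.2f}  loss={binary_cross_entropy([1], [p]):.4f}")
```

The loss is small for p = 0.9 and grows sharply as the predicted probability of the true class approaches 0.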

4. Hinge Loss
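• Formula: $L = \max(0,\ 1 - y \cdot f(x))$, where $y \in \{-1, +1\}$ is the actual label and $f(x)$ is the raw (unthresholded) model output.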

• Explanation:
○ Hinge loss is primarily used in support vector machines (SVMs).
○ It encourages a large margin between the data points and the decision boundary. The loss is zero when the prediction is
correct and lies outside the margin (y · f(x) ≥ 1). Otherwise, it increases linearly as the prediction moves further onto the
wrong side of the margin.

• Use Case:
○ Commonly used in binary classification tasks for SVMs, where the goal is to create a margin between classes.
• Advantages:
○ It emphasizes margin maximization, which often leads to better generalization in classification tasks.
• Disadvantages:
○ It does not produce probability estimates and is not differentiable at the margin, so it is mostly used with margin-based classifiers such as SVMs.
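A minimal sketch of hinge loss, assuming labels encoded as -1 or +1 and raw classifier scores (not probabilities); the function name and the sample scores are illustrative.

```python
def hinge_loss(y_true, scores):
    """Average hinge loss: max(0, 1 - y * score) for labels in {-1, +1}."""
    return sum(max(0.0, 1.0 - y * s) for y, s in zip(y_true, scores)) / len(y_true)

# True label is +1 in every case; only the score changes.
print(hinge_loss([1], [2.0]))   # 0.0 -> correct and outside the margin
print(hinge_loss([1], [0.5]))   # 0.5 -> correct but inside the margin
print(hinge_loss([1], [-1.0]))  # 2.0 -> on the wrong side of the boundary
```

Note that a correct prediction still incurs a loss when its score is inside the margin, which is what pushes the classifier toward a wider margin.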

5. Huber Loss
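• Formula: $L_\delta(y, \hat{y}) = \frac{1}{2}(y - \hat{y})^2$ if $|y - \hat{y}| \le \delta$, and $L_\delta(y, \hat{y}) = \delta \left( |y - \hat{y}| - \frac{1}{2}\delta \right)$ otherwise, where $\delta$ is the threshold between the quadratic and linear regions.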

• Explanation:
○ Huber Loss combines the best of both MSE and MAE. For smaller errors, it behaves like MSE (quadratic), and for larger
errors, it behaves like MAE (linear).
○ This allows Huber Loss to be more robust to outliers compared to MSE while still providing smooth and differentiable error
feedback.
• Use Case:
○ Commonly used in regression problems where the data contain outliers but smooth behavior is still required for small
errors.
• Advantages:
○ More robust to outliers than MSE.
○ Provides smoother gradient feedback than MAE.
• Disadvantages:
○ The threshold δ must be tuned, and improper tuning can affect performance.
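The switch between quadratic and linear behaviour can be sketched per error value as follows. This uses the standard Huber definition; the function name and the default delta = 1.0 are choices made for this example.

```python
def huber(error, delta=1.0):
    """Huber loss for a single error value (actual minus predicted)."""
    a = abs(error)
    if a <= delta:
        return 0.5 * error ** 2       # quadratic region, like MSE
    return delta * (a - 0.5 * delta)  # linear region, like MAE

print(huber(0.5))              # 0.125 (quadratic)
print(huber(10.0))             # 9.5   (linear, far below 0.5 * 10**2 = 50)
print(huber(10.0, delta=2.0))  # 18.0  (a larger delta penalizes the outlier more)
```

Both branches give the same value at |error| = delta, so the loss is continuous, and the choice of delta decides which errors are treated as outliers.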

6. Kullback-Leibler Divergence (KL Divergence)


• Formula: $D_{KL}(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}$

• Explanation:
○ KL Divergence measures the difference between two probability distributions: the true distribution P(x) and the
predicted distribution Q(x).
○ It is useful when you are dealing with probability distributions rather than specific class labels or continuous values.
• Use Case:
○ Used in variational autoencoders (VAEs) and other models that learn probability distributions.
• Advantages:
○ Suitable for comparing two distributions and is commonly used in unsupervised learning and probabilistic models.
• Disadvantages:
○ It is asymmetric, meaning it does not treat differences between distributions P and Q in the same way, which might not
be desirable in certain applications.
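The asymmetry can be checked on two small discrete distributions. This is a sketch using the standard discrete definition of KL divergence with natural logarithms; the distributions p and q are made up, and q is assumed to be non-zero wherever p is non-zero.

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) for discrete distributions given as lists of probabilities."""
    # Terms with p_i == 0 contribute nothing; q_i must be > 0 wherever p_i > 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.9, 0.1]  # made-up "true" distribution
q = [0.5, 0.5]  # made-up "predicted" distribution

print(f"KL(P||Q) = {kl_divergence(p, q):.4f}")
print(f"KL(Q||P) = {kl_divergence(q, p):.4f}")
print(f"KL(P||P) = {kl_divergence(p, p):.4f}")
```

The two directions give different values, and the divergence is zero only when the two distributions are identical.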

Choosing the Right Loss Function


1. For Regression Problems:
○ MSE and MAE are the most common loss functions.
○ MSE is useful when larger errors need to be penalized more, whereas MAE is more robust to outliers.
○ Huber Loss can be used when you need a combination of both, to handle outliers without overly penalizing them.
2. For Classification Problems:
○ Cross-Entropy Loss is the most common for both binary and multi-class classification tasks.
○ Hinge Loss is specifically used for SVM classifiers.
○ KL Divergence is used for models predicting probability distributions, such as variational autoencoders (VAEs).
3. For Probabilistic Models:
○ KL Divergence and Cross-Entropy are ideal for handling models with probabilistic outputs.

Function approximation
14 October 2024 10:04

Function approximation refers to the process of estimating a target function, which maps input data to output
predictions, using a model that best captures the underlying relationship between the inputs and outputs. This
concept is fundamental to machine learning, where we seek to build models that generalize well to unseen data
based on training data.

In simpler terms, the goal of function approximation is to find a mathematical function f(x) that can approximate
the true underlying function f*(x), which governs the relationship between inputs and outputs.

Types of Function Approximation


1. Parametric Methods:
○ These methods assume that the function form is known and can be described by a finite number of
parameters.
○ Examples include:
▪ Linear regression: Assumes a linear relationship between input and output.
▪ Polynomial regression: Models non-linear relationships using polynomial terms.
▪ Neural networks: Use learned parameters (weights and biases) to model complex, non-linear
relationships.
2. Non-Parametric Methods:
○ Non-parametric methods do not assume any specific form for the function. Instead, they adapt to the data
as needed, typically requiring more data to function properly.
○ Examples include:
▪ k-Nearest Neighbors (k-NN): Predicts output based on the closest k data points.
▪ Decision Trees: Partition the input space into regions and assign a constant output value in each
region.
▪ Gaussian Processes: Provide a probabilistic approach to learning the function by estimating
distributions over functions.
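To make the parametric versus non-parametric distinction concrete, the sketch below approximates a hypothetical target function f*(x) = x² in two ways: a straight line with two learned parameters (least squares) and a k-NN average that stores the data instead of fitting parameters. The function names, the data, and k = 2 are assumptions made for this example.

```python
def fit_linear(xs, ys):
    """Parametric: least-squares fit of y = w * x + b (two parameters)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = sxy / sxx
    return w, mean_y - w * mean_x

def knn_predict(xs, ys, x_query, k=2):
    """Non-parametric: average the outputs of the k nearest training points."""
    nearest = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - x_query))[:k]
    return sum(y for _, y in nearest) / k

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [x * x for x in xs]  # samples of the hypothetical target f*(x) = x^2

w, b = fit_linear(xs, ys)
print(w * 2.5 + b)               # linear model's estimate at x = 2.5 -> 8.0
print(knn_predict(xs, ys, 2.5))  # k-NN estimate at x = 2.5 -> 6.5
print(2.5 ** 2)                  # true value -> 6.25
```

The linear model is limited by its assumed form, while k-NN adapts to the local data but needs the training points at prediction time.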

Applications
14 October 2024 10:07

1. Image Recognition and Computer Vision


• Facial Recognition: ANNs, especially deep neural networks, are widely used in facial recognition systems.
They can identify and verify individuals by learning facial features from images or video feeds.
• Object Detection: Convolutional Neural Networks (CNNs) are particularly effective in detecting objects
within images, used in applications such as autonomous vehicles, medical imaging, and security
surveillance.
• Handwriting Recognition: ANNs can be trained to recognize handwritten text by learning various
handwriting styles, which is useful in applications like digitizing written documents and postal code
recognition.

2. Natural Language Processing (NLP)


• Speech Recognition: ANNs are used in automatic speech recognition systems, enabling applications like
virtual assistants (e.g., Siri, Alexa) to convert spoken language into text or commands.
• Language Translation: Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM)
networks are commonly used for machine translation tasks, enabling translation between languages in
real time.
• Sentiment Analysis: ANNs are used to analyze text and determine the sentiment behind it, which is
valuable for understanding customer feedback, reviews, and social media posts.

3. Healthcare and Medical Diagnosis


• Medical Imaging: ANN-based models like CNNs are applied in analyzing medical images such as X-rays,
MRIs, and CT scans for early diagnosis of diseases like cancer, pneumonia, or diabetic retinopathy.
• Drug Discovery: ANNs are utilized to predict how different compounds interact, aiding in the discovery of
new drugs by modeling the biological interactions at the molecular level.
• Disease Prediction: Predictive models using neural networks are developed to forecast the likelihood of
diseases like heart attacks, diabetes, or stroke by analyzing patient health data.

4. Financial Services
• Fraud Detection: ANNs are used to identify fraudulent transactions by recognizing patterns that deviate
from typical user behavior, helping banks and financial institutions mitigate fraud risks.
• Stock Market Prediction: Neural networks, particularly recurrent architectures like LSTMs, are employed
in forecasting stock prices by analyzing historical data, news, and other influencing factors.
• Credit Scoring: ANNs assist in predicting the creditworthiness of individuals by assessing various factors
like credit history, loan applications, and spending patterns.

5. Autonomous Systems and Robotics


• Self-Driving Cars: Autonomous vehicles rely on neural networks to process sensory data from cameras,
LiDAR, and radar to make real-time decisions such as lane following, obstacle avoidance, and traffic sign
recognition.
• Robotics: Neural networks are used in robotics for path planning, object manipulation, and decision-
making processes, enabling robots to perform complex tasks autonomously.
• Drones: ANNs are applied to control drones for tasks such as navigation, aerial surveillance, and delivery
systems.

6. Gaming and Artificial Intelligence (AI)


• Game AI: Neural networks power AI agents that learn strategies and tactics by playing games. They have
been applied in games like chess, Go, and poker, with AI systems like AlphaGo achieving superhuman
performance.
• Virtual Reality (VR) and Augmented Reality (AR): ANNs are used to enhance the realism of VR/AR
environments by improving object recognition, scene reconstruction, and interactive user interfaces.

7. Recommendation Systems
• Content Recommendations: Platforms like Netflix, YouTube, and Spotify use ANNs to recommend
content based on user preferences and behavior by analyzing viewing, listening, or searching patterns.
• Product Recommendations: E-commerce sites like Amazon use neural networks to recommend products
by learning from users' purchase history, browsing behavior, and other user data.

8. Manufacturing and Industrial Automation


• Quality Control: ANNs are used to inspect products in real time for defects by analyzing images or sensor
data, ensuring that only high-quality items move forward in the production process.
• Predictive Maintenance: Neural networks help predict equipment failure by analyzing operational data,
enabling proactive maintenance and reducing downtime.
• Supply Chain Optimization: ANNs assist in forecasting demand, optimizing inventory levels, and
improving overall supply chain efficiency.

9. Energy and Environment


• Energy Consumption Prediction: Neural networks are used to predict energy consumption patterns,
helping utility companies manage electricity distribution and minimize energy waste.
• Renewable Energy Optimization: ANNs help in optimizing the integration of renewable energy sources
like solar and wind by predicting generation patterns and balancing demand-supply fluctuations.
• Climate Modeling: Neural networks are applied to model climate change, predict weather patterns, and
study environmental impacts by analyzing large datasets from various sources.

10. Marketing and Customer Segmentation


• Customer Segmentation: ANNs help companies segment their customers based on behavior,
preferences, and purchase history to target personalized marketing campaigns.
• Sales Forecasting: Neural networks predict future sales trends by analyzing historical data, seasonality,
and market factors, helping businesses make informed decisions about inventory and marketing
strategies.
