
Unit 3 Self Made

Uploaded by ginni bhayana
Copyright © All Rights Reserved

Unit 3

Artificial Neural Networks


1. Biological Inspiration:
• ANNs are inspired by the structure and functioning of the human brain: artificial neurons (nodes) play the role of biological neurons, and weighted connections play the role of synapses.
• Each node is a computational unit that processes its input signals and produces an output.
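As a concrete illustration of a node as a "computational unit", the Python sketch below computes one artificial neuron's output: a weighted sum of the inputs plus a bias, passed through an activation function. The names `neuron_output` and `sigmoid`, and the choice of sigmoid as the default activation, are illustrative assumptions rather than anything prescribed by these notes.

```python
import math


def sigmoid(z: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, bias, activation=sigmoid):
    """Hypothetical single neuron: activation(w . x + b)."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Net input: weighted sum of the incoming signals plus the bias.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # The activation function turns the net input into the neuron's output.
    return activation(z)


# Two inputs with hand-picked weights: net input 0.4*1.0 - 0.2*0.5 + 0.1 = 0.4.
print(neuron_output([1.0, 0.5], [0.4, -0.2], bias=0.1))
```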

2. Architecture:
• An ANN typically consists of three types of layers:
1. Input Layer: Receives the input features.
2. Hidden Layers: Perform intermediate computations, where the
"learning" happens.
3. Output Layer: Produces the final prediction or classification result.
• The number of layers and nodes defines the network's complexity.

3. Learning Process:
• ANNs use weights and biases associated with connections to learn patterns.
• The learning process involves:
1. Forward Propagation: Data flows through the network to make predictions.
2. Backward Propagation (Backpropagation): Errors are propagated backward through the network to compute how much each weight contributed to them (the gradient); an optimization algorithm such as Gradient Descent then uses these gradients to adjust the weights.

4. Activation Functions:
• Non-linear activation functions enable ANNs to model complex relationships.
• Common activation functions:
o Sigmoid
o ReLU (Rectified Linear Unit)
o Tanh
o Softmax (for classification tasks)

5. Applications:
• ANNs are widely used across industries for tasks like:
o Image recognition (e.g., face detection).
o Natural language processing (e.g., sentiment analysis).
o Time series forecasting (e.g., stock price prediction).
o Autonomous vehicles (e.g., driving decisions).

6. Advantages and Challenges:

• Advantages:
o Ability to learn complex patterns and relationships.
o Adaptability to various tasks.
• Challenges:
o Requires large datasets and significant computational resources.
o Susceptible to overfitting if not regularized properly.
o Difficult to interpret (acts as a "black box").

HebbNet refers to a neural network model that applies Hebbian Learning, a theory
of learning inspired by the way biological neurons adjust their connections. This
learning mechanism was proposed by Donald Hebb in 1949, famously summarized
as:
"Cells that fire together, wire together."
Key Principles of HebbNet
1. Hebbian Learning Rule:
o The strength of the connection (synapse) between two neurons
increases if they are activated simultaneously.

This rule reinforces connections between correlated neurons.

Structure of HebbNet
• Input Layer: Accepts the input features.
• Output Layer: Produces a response, often a pattern or a class label.
• Weights: Initialized randomly and updated according to Hebbian Learning.
Unlike traditional neural networks, HebbNet does not rely on error-based learning
methods like backpropagation but instead updates weights directly based on the
co-activation of neurons.
Algorithm for HebbNet
1. Initialize Weights:
o Start with small random weights between neurons.

4. Repeat:
Continue presenting patterns and updating weights until the network
stabilizes or reaches a stopping criterion.
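The notes describe the Hebbian rule only in words, and the algorithm above skips from step 1 to step 4. A common textbook form of the rule is $\Delta w_i = \eta\, x_i\, y$: the weight grows when input $x_i$ and output $y$ are active together (have the same sign). The Python sketch below applies it once per training pattern, with the output clamped to the target as in supervised pattern association. Assumptions: bipolar (+1/-1) values, zero starting weights instead of the small random values suggested above (so the result is reproducible), a bias treated as a weight on a constant input of 1, and the hypothetical helper names `train_hebb` and `hebb_predict`.

```python
def train_hebb(patterns, n_inputs, lr=1.0, epochs=1):
    """Hebbian training: w_i += lr * x_i * y and b += lr * y for every pattern.

    `patterns` is a list of (inputs, target) pairs; during training the output y
    is clamped to the target (supervised pattern association).
    """
    weights = [0.0] * n_inputs  # zero start for reproducibility (assumption)
    bias = 0.0
    for _ in range(epochs):          # "Repeat": further passes over the data
        for x, y in patterns:        # present one pattern at a time
            for i in range(n_inputs):
                # Strengthen w_i when x_i and y agree in sign, weaken it otherwise.
                weights[i] += lr * x[i] * y
            bias += lr * y           # bias behaves like a weight on a constant input of 1
    return weights, bias


def hebb_predict(x, weights, bias):
    """Bipolar threshold output: +1 if the net input is non-negative, else -1."""
    net = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if net >= 0 else -1


# Logical AND with bipolar inputs and targets.
AND_BIPOLAR = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
w, b = train_hebb(AND_BIPOLAR, n_inputs=2)
print(w, b)                                              # [2.0, 2.0] -2.0
print([hebb_predict(x, w, b) for x, _ in AND_BIPOLAR])   # [1, -1, -1, -1]
```

Bipolar encoding matters here: with 0/1 values, a pattern whose target is 0 changes nothing (since $x_i \cdot 0 = 0$), so the net never learns the "off" cases of AND.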

Applications of HebbNet
1. Pattern Association:
o Learning to associate one pattern with another, such as mapping an
input vector to a target vector.
2. Unsupervised Learning:
o Used in scenarios where labeled data is unavailable, and the network
learns correlations in the input data.
3. Feature Extraction:
o Can identify dominant patterns in data, similar to Principal Component
Analysis (PCA).
4. Biological Modeling:
o Helps simulate brain-like learning for understanding neural processes.

What is a Perceptron?

A Perceptron is one of the simplest types of artificial neural networks. It is a supervised learning algorithm for binary classification tasks. The perceptron is designed to learn a linear decision boundary that separates data points into two classes.
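To show how a perceptron finds a linear decision boundary, here is a minimal Python sketch using the classic perceptron learning rule $w_i \leftarrow w_i + \eta\,(y - \hat{y})\,x_i$ (with the bias updated the same way), which these notes do not spell out. It assumes 0/1 labels, a step activation that outputs 1 only when the net input is strictly positive, zero initial weights, and the illustrative names `train_perceptron` and `perceptron_predict`; it stops after the first epoch with no mistakes.

```python
def step(z):
    """Threshold activation: 1 if the net input is positive, else 0."""
    return 1 if z > 0 else 0


def train_perceptron(data, n_inputs, lr=1.0, max_epochs=100):
    w = [0.0] * n_inputs
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, y in data:
            y_hat = step(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = y - y_hat  # 0 if correct, +1 or -1 if misclassified
            if err != 0:
                errors += 1
                for i in range(n_inputs):
                    w[i] += lr * err * x[i]  # move the boundary toward the correct side
                b += lr * err
        if errors == 0:  # a full pass without mistakes: converged
            break
    return w, b


def perceptron_predict(x, w, b):
    return step(sum(wi * xi for wi, xi in zip(w, x)) + b)


AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(AND_DATA, n_inputs=2)
print([perceptron_predict(x, w, b) for x, _ in AND_DATA])  # [0, 0, 0, 1]
```

Because AND is linearly separable, training stops with every pattern classified correctly. On XOR, which is not linearly separable, no epoch is ever error-free, so the loop simply runs until `max_epochs`; this limitation is what motivates multilayer networks. With zero initial weights the learning rate only rescales the weights, so `lr=1.0` is used to keep the arithmetic exact.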

What is ADALINE?
• ADALINE (Adaptive Linear Neuron) is a supervised learning algorithm used for binary classification and regression tasks. It uses a linear (identity) activation during training.
• The model computes a weighted sum of inputs and bias, producing a net input (z), which is used to calculate the output.
• Unlike the perceptron, which updates its weights from thresholded outputs, ADALINE minimizes the Mean Squared Error (MSE) between the linear output z and the actual targets during training.
• The weights are updated using gradient descent, allowing the model to iteratively reduce error and improve accuracy.
• As a linear model, ADALINE can only separate classes that are linearly separable; it forms the basis for more advanced neural network models like multi-layer perceptrons.
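The ADALINE training loop can be sketched in Python as batch gradient descent on the MSE of the linear output $z = w \cdot x + b$ (the delta, or Widrow-Hoff/LMS, rule); thresholding is applied only when predicting a class. The bipolar AND data, learning rate, epoch count, zero initialization, and the names `train_adaline` and `adaline_predict` are illustrative assumptions.

```python
def train_adaline(data, n_inputs, lr=0.05, epochs=200):
    """Batch gradient descent on the MSE of the linear output z = w.x + b."""
    w = [0.0] * n_inputs
    b = 0.0
    n = len(data)
    history = []  # MSE recorded before each update
    for _ in range(epochs):
        # Linear activation during training: the output is the net input z itself.
        errors = [y - (sum(wi * xi for wi, xi in zip(w, x)) + b) for x, y in data]
        history.append(sum(e * e for e in errors) / n)
        # dMSE/dw_i = -(2/n) * sum(e * x_i) and dMSE/db = -(2/n) * sum(e),
        # so stepping against the gradient adds these terms.
        for i in range(n_inputs):
            w[i] += lr * (2.0 / n) * sum(e * x[i] for e, (x, _) in zip(errors, data))
        b += lr * (2.0 / n) * sum(errors)
    return w, b, history


def adaline_predict(x, w, b):
    """Threshold the net input only at prediction time."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if z >= 0 else -1


AND_BIPOLAR = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
w, b, history = train_adaline(AND_BIPOLAR, n_inputs=2)
print([round(v, 3) for v in w], round(b, 3), round(history[-1], 3))
print([adaline_predict(x, w, b) for x, _ in AND_BIPOLAR])  # [1, -1, -1, -1]
```

Because the error is measured on the continuous output $z$, the loss keeps shrinking even after all four patterns are classified correctly. For this data the MSE settles near 0.25 rather than zero (a linear fit cannot hit ±1 exactly on all four points), with the weights approaching $w_1 = w_2 = 0.5$, $b = -0.5$.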

Multilayer Neural Network (MLN) in Machine Learning


A Multilayer Neural Network (MLN) is an advanced type of artificial neural
network that consists of multiple layers of neurons, enabling it to model complex,
non-linear relationships in data. It is also referred to as a Multilayer Perceptron
(MLP) when used for supervised learning tasks.

Key Characteristics of Multilayer Neural Networks


1. Layers:
o Input Layer: Accepts the raw input features.
o Hidden Layers: Intermediate layers where computation and feature
extraction happen.
o Output Layer: Produces the final prediction or classification result.
2. Neurons:
o Each neuron in a layer is connected to neurons in the next layer,
forming a network of weights.
3. Activation Functions:
o Non-linear activation functions (e.g., ReLU, Sigmoid, Tanh) allow the
network to model non-linear relationships.
4. Learning Rule:
o Weights are updated using backpropagation and optimization
algorithms like Gradient Descent or Adam.
5. Supervised Learning:
o MLNs are primarily trained on labeled data for tasks such as
classification, regression, and forecasting.
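To illustrate why hidden layers with non-linear activations matter, the Python sketch below runs a forward pass through a 2-2-1 network that computes XOR, a function no single-layer perceptron can represent. The weights are hand-picked (a well-known construction with a ReLU hidden layer and a linear output), not learned; in practice they would be found by backpropagation. The helper names `dense` and `mlp_forward` are hypothetical.

```python
def relu(z):
    return max(0.0, z)


def identity(z):
    return z


def dense(inputs, weights, biases, activation):
    """One fully connected layer: row j of `weights` holds neuron j's input weights."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


# Hand-picked (not learned) parameters for a 2-2-1 network computing XOR.
W_HIDDEN = [[1.0, 1.0], [1.0, 1.0]]
B_HIDDEN = [0.0, -1.0]
W_OUT = [[1.0, -2.0]]
B_OUT = [0.0]


def mlp_forward(x):
    hidden = dense(x, W_HIDDEN, B_HIDDEN, relu)      # non-linear hidden layer
    output = dense(hidden, W_OUT, B_OUT, identity)   # linear output layer
    return output[0]


for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, mlp_forward(x))  # outputs 0.0, 1.0, 1.0, 0.0
```

The second hidden unit only "switches on" for (1, 1), and the output layer subtracts it twice, which carves out the non-linear XOR region from two linear pieces.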
Activation Functions
Activation functions are mathematical functions that determine whether a neuron
should be activated or not based on its input. They introduce non-linearity into a
neural network, enabling it to learn complex patterns beyond linear relationships.
The choice of activation function impacts:
• Model's learning capability: Different functions help learn specific patterns.
• Convergence speed: Non-linearities affect gradient flow during
backpropagation.
• Stability of gradients: Functions like ReLU mitigate vanishing gradient
problems.
Types of Activation Functions
1. Linear Activation Function
o Theory: The output is directly proportional to the input ($f(x) = cx$), providing no non-linearity. While simple, it limits the network to learning only linear mappings.
o Use: Rarely used in hidden layers but applicable in output layers for
regression.
2. Sigmoid
o Theory: Maps any real-valued number to a range between 0 and 1. It is
suitable for probabilistic interpretation in binary classification but
suffers from the vanishing gradient problem.
o Use: Binary classification or as a gating mechanism in LSTMs.
3. Tanh (Hyperbolic Tangent)
o Theory: Scales input to a range between -1 and 1. The output is zero-
centered, aiding convergence, but it can still suffer from the vanishing
gradient problem.
o Use: Hidden layers of feedforward networks.
4. ReLU (Rectified Linear Unit)
o Theory: Outputs the input directly if positive; otherwise, it outputs
zero. It is computationally efficient and mitigates vanishing gradients
but can suffer from the dying ReLU problem.
o Use: Most common in deep networks.
5. Softmax
o Theory: Converts logits into probabilities, ensuring that their sum
equals 1. Suitable for multi-class classification tasks.
o Use: Output layer of multi-class classifiers.
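The activation functions above can be written out directly. This plain-Python sketch (no numerical libraries assumed) implements each one, with $\sigma(x) = 1/(1+e^{-x})$, $\tanh(x)$, $\mathrm{ReLU}(x) = \max(0, x)$ and $\mathrm{softmax}(z)_i = e^{z_i}/\sum_j e^{z_j}$, plus the derivatives that backpropagation needs. The max-subtraction in `softmax` and the two-branch `sigmoid` are common numerical-stability tricks, included here as an assumption about good practice.

```python
import math


def linear(x, c=1.0):
    return c * x


def sigmoid(x):
    # Two branches so math.exp never receives a large positive argument (no overflow).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def tanh(x):
    return math.tanh(x)


def relu(x):
    return x if x > 0 else 0.0


def softmax(logits):
    # Subtracting the largest logit avoids overflow and leaves the result unchanged.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Derivatives used during backpropagation.
def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)            # at most 0.25, tiny when |x| is large


def tanh_grad(x):
    return 1.0 - math.tanh(x) ** 2  # at most 1, also tiny when |x| is large


def relu_grad(x):
    return 1.0 if x > 0 else 0.0    # exactly 0 for x <= 0 ("dying ReLU")


print([round(p, 3) for p in softmax([1.0, 2.0, 3.0])])
```

The derivatives make the listed trade-offs visible: `sigmoid_grad` never exceeds 0.25 and is nearly zero for large |x| (vanishing gradients), while `relu_grad` is 1 for positive inputs but exactly 0 for negative ones, which is how a ReLU unit can "die" if its input stays negative.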

Theory and Types of Loss Functions


Theory of Loss Functions
Loss functions quantify the discrepancy between the predicted outputs and actual
target values. They serve as the optimization objective, providing a scalar value
that guides the learning process. A well-chosen loss function can improve
generalization and speed up convergence.
Types of Loss Functions
1. Regression Loss Functions
o Mean Squared Error (MSE): Measures the average squared difference
between predictions and actual values. Sensitive to outliers.
o Mean Absolute Error (MAE): Computes the average absolute
difference. Robust to outliers but less smooth for gradient descent.
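o Huber Loss: Combines MSE and MAE: it is quadratic (like MSE) for errors smaller than a threshold δ and linear (like MAE) for larger errors, balancing smooth gradients with robustness to outliers.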
2. Classification Loss Functions
o Binary Cross-Entropy: Measures the divergence between predicted
probabilities and actual binary labels.
o Categorical Cross-Entropy: Extends binary cross-entropy for multi-class
classification, comparing predicted probabilities to true class
distributions.
o Hinge Loss: Encourages a margin between the decision boundary and
correct classifications, commonly used with SVMs.
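For reference, here is a plain-Python sketch of each loss listed above, averaged over the examples. The Huber threshold `delta=1.0`, the clipping constant `eps`, and the label conventions (0/1 for binary cross-entropy, one-hot rows for categorical cross-entropy, -1/+1 for hinge loss) are assumptions; libraries may differ in details such as scaling factors.

```python
import math


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def huber(y_true, y_pred, delta=1.0):
    total = 0.0
    for t, p in zip(y_true, y_pred):
        r = abs(t - p)
        # Quadratic (MSE-like) for small residuals, linear (MAE-like) beyond delta.
        total += 0.5 * r * r if r <= delta else delta * (r - 0.5 * delta)
    return total / len(y_true)


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip so log(0) never happens
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


def categorical_cross_entropy(y_true_onehot, p_pred, eps=1e-12):
    total = 0.0
    for t_row, p_row in zip(y_true_onehot, p_pred):
        total += -sum(t * math.log(max(p, eps)) for t, p in zip(t_row, p_row))
    return total / len(y_true_onehot)


def hinge(y_true_pm1, scores):
    # Labels are -1/+1; a correct score with margin >= 1 costs nothing.
    return sum(max(0.0, 1.0 - t * s) for t, s in zip(y_true_pm1, scores)) / len(scores)


y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(mse(y_true, y_pred), mae(y_true, y_pred), huber(y_true, y_pred))  # 0.375 0.5 0.1875
```

On the same residuals, the single error of 1.0 contributes two-thirds of the MSE total but only half of the MAE total, which is the outlier sensitivity described above.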

Hyperparameters in Machine Learning


Hyperparameters are special settings in a machine learning model that you decide
before training it. These settings control how the model learns and affect its
accuracy and speed. Unlike the model's internal parameters (like weights),
hyperparameters are not learned during training — you have to set them yourself.

Why are Hyperparameters Important?


1. Improve Accuracy: Correct hyperparameters can make your model perform
better on your data.
2. Control Training Speed: Some hyperparameters, like the learning rate, can
make the model train faster or slower.
3. Avoid Overfitting or Underfitting: They help balance the model so it doesn’t
memorize the training data (overfitting) or fail to learn enough (underfitting).

Types of Hyperparameters
1. Model Hyperparameters
These decide the structure or shape of the model.
Examples:
o Number of layers: In a neural network, how many layers are there?
o Tree depth: For decision trees, how many levels of splits can the tree have?
2. Training Hyperparameters
These control the learning process during training.
Examples:
o Learning Rate: How big are the steps the model takes while learning?
o Batch Size: How many data samples are used in each training step?
o Number of Epochs: How many times does the model go through the
entire dataset?
3. Regularization Hyperparameters
These prevent the model from overfitting (memorizing the data).
Examples:
o Dropout: Temporarily turns off some parts of the model during training
to make it more general.
o L1/L2 Regularization: Adds a penalty to large model weights, forcing
the model to simplify.
4. Optimization Hyperparameters
These control how optimization algorithms (like gradient descent) work.
Examples:
o Momentum: Adds a fraction of the previous update to the current one, so the model keeps moving in consistently downhill directions and oscillations are damped.
o Learning Rate Decay: Reduces the learning rate as training progresses.

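The two optimization hyperparameters can be seen in a few lines of Python. This sketch uses the classical "heavy-ball" momentum update (velocity = momentum × velocity − learning rate × gradient) and exponential learning-rate decay (multiply the rate by a constant factor each step); both are common choices, but other momentum variants and decay schedules (step, 1/t) exist. The function name `optimize` and the test function $f(w) = (w - 3)^2$ are illustrative.

```python
def optimize(grad_fn, w0, lr0=0.1, momentum=0.0, decay=1.0, steps=100):
    """Minimize a 1-D function given its derivative `grad_fn`.

    momentum: fraction of the previous update (velocity) carried into the next one;
              0.0 gives plain gradient descent.
    decay:    factor applied to the learning rate after every step; 1.0 means no decay.
    """
    w, velocity, lr = w0, 0.0, lr0
    for _ in range(steps):
        velocity = momentum * velocity - lr * grad_fn(w)
        w += velocity
        lr *= decay  # exponential learning-rate decay
    return w


def grad(w):
    return 2.0 * (w - 3.0)  # derivative of f(w) = (w - 3)**2, minimum at w = 3


print(optimize(grad, 0.0, steps=30))                # plain gradient descent
print(optimize(grad, 0.0, momentum=0.5, steps=30))  # with momentum
print(optimize(grad, 0.0, decay=0.5, steps=30))     # with aggressive decay
```

On this quadratic, momentum 0.5 approaches the minimum faster than plain gradient descent with the same learning rate. Decaying the rate too aggressively (here halving it every step) freezes the parameter well before it reaches $w = 3$, which is why the decay factor itself needs tuning.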
Gradient Descent
Gradient Descent is one of the most commonly used iterative optimization algorithms for training machine learning and deep learning models. It finds a local minimum of a differentiable function.
Whether gradient-based steps lead to a local minimum or a local maximum of a function depends on their direction:
o Moving in the direction of the negative gradient (away from the gradient of the function at the current point) leads toward a local minimum of that function.
o Moving in the direction of the positive gradient (towards the gradient of the function at the current point) leads toward a local maximum of that function; this variant is known as gradient ascent.
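The descent/ascent distinction above comes down to the sign of each step. In this minimal Python sketch, `gradient_step_search` (a hypothetical name) repeatedly applies $x \leftarrow x - \eta f'(x)$ for descent or $x \leftarrow x + \eta f'(x)$ for ascent, using hand-written derivatives of two simple test functions.

```python
def gradient_step_search(grad_fn, x0, lr=0.1, steps=200, ascent=False):
    """Step against the gradient (descent) or along it (ascent) `steps` times."""
    sign = 1.0 if ascent else -1.0
    x = x0
    for _ in range(steps):
        x = x + sign * lr * grad_fn(x)
    return x


# f(x) = (x - 2)**2 has its minimum at x = 2; f'(x) = 2 * (x - 2).
x_min = gradient_step_search(lambda x: 2 * (x - 2), x0=10.0)
# g(x) = -(x + 1)**2 has its maximum at x = -1; g'(x) = -2 * (x + 1).
x_max = gradient_step_search(lambda x: -2 * (x + 1), x0=5.0, ascent=True)
print(round(x_min, 6), round(x_max, 6))  # 2.0 -1.0
```

The learning rate $\eta$ matters: on $f(x) = (x-2)^2$ each descent step multiplies the distance to the minimum by $1 - 2\eta$, so $\eta = 0.1$ converges steadily while $\eta = 1.1$ overshoots by a growing amount every step and diverges.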

Depending on how many training examples are used to compute the error for each update, gradient descent can be divided into batch gradient descent, stochastic gradient descent, and mini-batch gradient descent. Let's understand these different types of gradient descent:
1. Batch Gradient Descent:
Batch gradient descent (BGD) computes the error for every point in the training set and updates the model only after evaluating all training examples. One full pass over the training set is called a training epoch, so BGD makes exactly one update per epoch. In simple words, every update requires summing over all examples.
Advantages of Batch gradient descent:
o It produces less noise in comparison to other types of gradient descent.
o It produces stable gradient descent convergence.
o It is computationally efficient per epoch, since all training samples are processed together for a single update.
2. Stochastic gradient descent
Stochastic gradient descent (SGD) is a type of gradient descent that processes one training example per iteration: within each epoch, it updates the model's parameters after every individual training example. As it requires only one training example at a time, it is easy to fit in allocated memory. However, it loses some computational efficiency compared with batch gradient descent, because its frequent single-example updates cannot benefit from processing many examples together. Further, because each update is based on a single example, the gradient estimate is noisy. However, this noise can sometimes help the model escape a local minimum and move toward the global minimum.
Advantages of Stochastic gradient descent:
In Stochastic gradient descent (SGD), learning happens on every example, which gives it a few advantages over other types of gradient descent:
o It needs little memory, since only one example is processed at a time.
o Each update is faster to compute than a batch gradient descent update.
o It is more efficient for large datasets.
3. Mini-Batch Gradient Descent:
Mini-batch gradient descent is a combination of batch gradient descent and stochastic gradient descent. It divides the training dataset into small batches and performs one update per batch. Splitting the training dataset into smaller batches strikes a balance between the computational efficiency of batch gradient descent and the speed of stochastic gradient descent. Hence, mini-batch gradient descent achieves higher computational efficiency than SGD with less noisy updates.
Advantages of Mini Batch gradient descent:
o It is easier to fit in allocated memory.
o It is computationally efficient.
o It produces more stable convergence than stochastic gradient descent.

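The three variants differ only in how many examples feed each update, which a single Python function can show by varying `batch_size`. This sketch fits $y \approx wx + b$ to noise-free synthetic data from $y = 3x + 1$ by gradient descent on MSE, reshuffling the examples every epoch; the data, learning rate, epoch count, and the name `train_linear` are assumptions for illustration.

```python
import random


def train_linear(xs, ys, batch_size, lr=0.2, epochs=500, seed=0):
    """Fit y ~ w*x + b by gradient descent on MSE.

    batch_size == len(xs) -> batch GD (one update per epoch)
    batch_size == 1       -> stochastic GD (one update per example)
    otherwise             -> mini-batch GD
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    updates = 0
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the examples in a new random order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the mean squared error over this batch only.
            grad_w = grad_b = 0.0
            for i in batch:
                err = (w * xs[i] + b) - ys[i]
                grad_w += 2 * err * xs[i]
                grad_b += 2 * err
            w -= lr * grad_w / len(batch)
            b -= lr * grad_b / len(batch)
            updates += 1
    return w, b, updates


# Noise-free data from y = 3x + 1 on x in [0, 1].
xs = [i / 10 for i in range(11)]
ys = [3 * x + 1 for x in xs]
for bs in (len(xs), 1, 4):
    w, b, n_updates = train_linear(xs, ys, batch_size=bs)
    print(bs, round(w, 3), round(b, 3), n_updates)
```

All three settings recover $w \approx 3$ and $b \approx 1$ here, but they get there differently: with 11 examples and 500 epochs, batch gradient descent makes 500 updates, SGD makes 5,500 cheap single-example updates, and mini-batches of 4 make 1,500.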

Backpropagation is a supervised learning algorithm used to train neural networks by optimizing their weights and biases. It minimizes the error between predicted and actual outputs by propagating the error backward through the network to obtain a gradient for every weight and bias.
• Error Gradient Calculation: Systematically computes how each network weight contributes to the overall prediction error, using the chain rule of calculus to determine precise weight adjustments.
• Backward Learning Mechanism: Moves from the output layer to the input layer, distributing error gradients so that each neuron's contribution to the error is known and its weights can be updated to improve future predictions.
• Automated Weight Optimization: Automatically adjusts network weights by computing partial derivatives, allowing the neural network to learn complex patterns and reduce prediction errors across multiple layers.
• Gradient Descent Implementation: Uses an iterative approach to minimize the loss function by incrementally updating weights in the direction that reduces network error.
• Computational Efficiency: Enables efficient learning in deep neural networks by calculating gradients for all weights in a single backward pass, making it scalable for complex machine learning tasks.

Steps in Backpropagation
1. Forward Pass:
o Input data flows through the network.
o Compute the output of each neuron layer by layer.
o Calculate the final output and loss (using a loss function like Mean
Squared Error or Cross-Entropy).
2. Backward Pass (Error Propagation):
o Compute the gradient of the loss function with respect to the output.
o Propagate the error back through the layers using the chain rule of
differentiation.
o Compute the gradients for weights and biases layer by layer.
3. Weight and Bias Updates:
o Update parameters (weights and biases) using the Gradient Descent
formula:
$\text{Weight}_{\text{new}} = \text{Weight}_{\text{current}} - \text{Learning Rate} \times \text{Gradient}$
4. Repeat:
o Perform forward and backward passes iteratively for multiple epochs
until the network converges.
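The four steps above (forward pass, backward pass, update, repeat) can be traced in a small, dependency-free Python sketch of a 2-input network with one sigmoid hidden layer and a sigmoid output, trained with full-batch gradient descent on XOR. The loss is the mean of $\tfrac{1}{2}(y - t)^2$, i.e. half the MSE. The network size, learning rate, initialization range, epoch count, and all names (`init_params`, `forward`, `loss_and_grads`, `train`) are assumptions, not a reference implementation.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_params(n_in, n_hidden, seed=0):
    """Small random weights for a hypothetical n_in -> n_hidden -> 1 network."""
    rng = random.Random(seed)
    return {
        "W1": [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "W2": [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(p, x):
    """Forward pass: sigmoid hidden layer, then a single sigmoid output."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(p["W1"], p["b1"])]
    y = sigmoid(sum(w * hj for w, hj in zip(p["W2"], h)) + p["b2"])
    return h, y


def loss_and_grads(p, data):
    """Mean of 0.5 * (y - t)**2 over `data`, and its gradients via backpropagation."""
    n_hidden, n_in = len(p["W1"]), len(p["W1"][0])
    g = {"W1": [[0.0] * n_in for _ in range(n_hidden)], "b1": [0.0] * n_hidden,
         "W2": [0.0] * n_hidden, "b2": 0.0}
    loss = 0.0
    for x, t in data:
        h, y = forward(p, x)                        # 1. forward pass
        loss += 0.5 * (y - t) ** 2
        delta_out = (y - t) * y * (1.0 - y)         # 2. dL/dz_out (chain rule through sigmoid)
        g["b2"] += delta_out
        for j in range(n_hidden):
            g["W2"][j] += delta_out * h[j]
            # Error reaching hidden unit j through weight W2[j], times sigmoid'(z_j).
            delta_h = delta_out * p["W2"][j] * h[j] * (1.0 - h[j])
            g["b1"][j] += delta_h
            for i in range(n_in):
                g["W1"][j][i] += delta_h * x[i]
    n = len(data)
    g["W1"] = [[v / n for v in row] for row in g["W1"]]
    g["b1"] = [v / n for v in g["b1"]]
    g["W2"] = [v / n for v in g["W2"]]
    g["b2"] /= n
    return loss / n, g


def train(data, n_hidden=3, lr=0.5, epochs=5000, seed=0):
    p = init_params(len(data[0][0]), n_hidden, seed)
    history = []
    for _ in range(epochs):                         # 4. repeat for many epochs
        loss, g = loss_and_grads(p, data)
        history.append(loss)
        # 3. Update: weight_new = weight_current - learning_rate * gradient
        for j in range(n_hidden):
            p["W1"][j] = [w - lr * gw for w, gw in zip(p["W1"][j], g["W1"][j])]
            p["b1"][j] -= lr * g["b1"][j]
            p["W2"][j] -= lr * g["W2"][j]
        p["b2"] -= lr * g["b2"]
    return p, history


XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
params, history = train(XOR)
print(f"loss: {history[0]:.4f} -> {history[-1]:.4f}")
print([round(forward(params, x)[1], 2) for x, _ in XOR])
```

The loss falls steadily, but whether the outputs end up close to [0, 1, 1, 0] depends on the random initialization (`seed`) and the number of epochs; small sigmoid networks can stall on XOR. A useful check on any hand-written backward pass is to compare one analytic gradient entry with a finite-difference estimate $(L(w+\epsilon) - L(w-\epsilon))/2\epsilon$; the two should agree to several decimal places.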

Variants of Backpropagation
1. Standard Backpropagation:
o The traditional method where weights and biases are updated after
computing gradients.
2. Stochastic Backpropagation:
o Updates parameters after every individual data point (like Stochastic
Gradient Descent).
3. Mini-Batch Backpropagation:
o Divides the data into mini-batches and updates weights after processing
each mini-batch.
4. Online Backpropagation:
o Performs immediate updates to weights after processing each data
point.
5. Backpropagation Through Time (BPTT):
o Used in Recurrent Neural Networks (RNNs) to propagate errors through
time steps for sequential data.

Applications:
• Training neural networks for tasks like image recognition, language
translation, and recommendation systems.
• Widely used in deep learning for optimizing complex architectures like CNNs
and RNNs.

Avoiding Overfitting Through Regularization
Overfitting occurs when a model learns noise and specific details from the training
data, causing poor performance on unseen data. Regularization techniques help
prevent overfitting by simplifying the model or introducing constraints.
Common Regularization Techniques:
1. L1 Regularization (Lasso):
o Adds the absolute value of weights ($|w|$) as a penalty to the loss
function.
o Encourages sparsity, making some weights zero and simplifying the
model.
2. L2 Regularization (Ridge):
o Adds the squared value of weights ($w^2$) as a penalty to the loss
function.
o Helps reduce the magnitude of weights, leading to a smoother model.
3. Dropout:
o Randomly drops neurons (along with their connections) during training.
o Prevents over-reliance on specific neurons and improves generalization.
4. Early Stopping:
o Monitors the validation loss during training.
o Stops training when the validation loss stops decreasing, preventing
overfitting to the training data.
5. Data Augmentation:
o Expands the training dataset by applying transformations (e.g.,
rotations, flips, noise).
o Helps the model generalize better by learning from varied examples.
6. Batch Normalization:
o Normalizes the inputs of each layer over each mini-batch to zero mean and unit variance, then applies a learnable scale and shift.
o Reduces internal covariate shift and acts as a form of regularization.
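Several of these techniques reduce to a few lines each. The Python sketch below shows an L1 and an L2 penalty term (to be added to the training loss, scaled by a strength `lam`), "inverted" dropout (a common implementation choice in which surviving activations are scaled by 1/(1 − rate) during training so nothing changes at inference), a patience-based early-stopping rule, and batch normalization of one feature over a mini-batch. The function names, the `patience` parameter, and the use of plain Python lists are illustrative assumptions; deep learning frameworks provide their own versions of these operations.

```python
import random


def l1_penalty(weights, lam):
    return lam * sum(abs(w) for w in weights)  # pushes small weights to exactly zero


def l2_penalty(weights, lam):
    return lam * sum(w * w for w in weights)   # shrinks all weights toward zero


def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate`, rescale the survivors."""
    if not training or rate == 0.0:
        return list(activations)  # no dropout at inference time
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


def early_stopping_epoch(val_losses, patience=3):
    """Index of the best validation loss, stopping once it has not improved
    for `patience` consecutive epochs."""
    best_epoch, best = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best_epoch, best = epoch, loss
        elif epoch - best_epoch >= patience:
            break  # validation loss stopped decreasing: stop training here
    return best_epoch


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a mini-batch, then apply scale gamma and shift beta."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return [gamma * (v - mean) / (var + eps) ** 0.5 + beta for v in values]


weights = [0.5, -1.0, 0.0, 2.0]
print(l1_penalty(weights, 0.01), l2_penalty(weights, 0.01))
print(dropout([1.0, 2.0, 3.0, 4.0], 0.5, random.Random(0)))
print(early_stopping_epoch([1.0, 0.8, 0.7, 0.72, 0.75, 0.74, 0.9]))  # 2
print(batch_norm([1.0, 2.0, 3.0, 4.0]))
```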

Applications of Neural Networks


1. Image and Object Recognition: Neural networks excel in computer vision
tasks, enabling advanced image classification, facial recognition, and object
detection in fields like medical imaging, security systems, and autonomous
vehicles.
2. Natural Language Processing: Powerful neural network architectures like
transformers drive language translation, sentiment analysis, chatbots, speech
recognition, and advanced text generation technologies.
3. Financial Prediction and Analysis: Used in stock market forecasting, fraud
detection, credit scoring, and algorithmic trading by identifying complex
patterns in financial data and making predictive assessments.
4. Healthcare Diagnostics: Neural networks analyze medical images, predict
disease progression, assist in early diagnosis of conditions like cancer, and
support personalized treatment recommendations by processing complex
medical data.
5. Robotics and Autonomous Systems: Enable intelligent decision-making in
robots, self-driving cars, and automated systems by processing sensory
inputs, learning from environmental data, and making real-time adaptive
decisions.
6. Recommendation Systems: Power personalized recommendation engines in
platforms like Netflix, Amazon, and Spotify by analyzing user behavior,
predicting preferences, and suggesting tailored content or products based on
intricate pattern recognition.
