
SPC 2409 ARTIFICIAL NEURAL NETWORKS
Lesson 3: Basics of Neural Networks: Perceptron
Perceptron Overview
 The Perceptron is one of the earliest artificial neural networks. It was introduced by Frank Rosenblatt in 1958, inspired by earlier work from Warren McCulloch and Walter Pitts.
 While today we use other models of artificial neurons, they follow the general principles set by the perceptron.
 It is a simple, single-layer, feedforward neural network that can solve linearly separable problems, i.e., it can separate data into two classes with a single linear decision boundary (a simple binary classification problem).
Architecture
Model of an artificial neuron
 The figure depicts a neuron connected to n other neurons, from which it receives n inputs (x1, x2, …, xn).
Structure of a Perceptron
 Input Layer: Receives the input data, which can consist of
multiple features.
 Weights: Each input feature is assigned a weight that adjusts its
importance.
 Summation Function: The weighted sum of inputs is
calculated.
 Activation Function: Applies a threshold function or step
function to the weighted sum to produce an output (0 or 1).
 Output Layer: Produces the final decision, often as a binary
classification (e.g., class 0 or 1).
Mathematical Representation
 The mathematical representation of a perceptron can be broken down as follows:
 1. Input and Weights
The perceptron receives an input vector (x1, x2, …, xn), a corresponding weight vector (w1, w2, …, wn), and a bias term b.
 2. Weighted Sum (Linear Combination)
The perceptron computes a weighted sum of the inputs:
z = w1·x1 + w2·x2 + … + wn·xn + b
 3. Activation Function
 The perceptron applies an activation function f(z) to the weighted sum to produce the output y:
y = f(z)
 In the original perceptron model, the activation function is a step function:
f(z) = 1 if z ≥ 0, and f(z) = 0 otherwise
 This function outputs a binary value (0 or 1), depending on whether z crosses a threshold (typically 0).
Complete Representation
 The perceptron can be represented mathematically as:
y = f(w1·x1 + w2·x2 + … + wn·xn + b)
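This formula can be turned into a short code sketch. The following is a minimal illustration in plain Python; the function and variable names are assumptions made for this example, not taken from the slides.

# Sketch of a perceptron forward pass: y = f(w1*x1 + ... + wn*xn + b).
def step(z, threshold=0.0):
    # Step activation: outputs 1 when z reaches the threshold, otherwise 0.
    return 1 if z >= threshold else 0

def perceptron_output(x, w, b):
    # Weighted sum of the inputs plus the bias, passed through the step function.
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return step(z)

# Illustrative weights, bias, and inputs.
print(perceptron_output(x=[1, 0], w=[0.6, 0.4], b=-0.5))  # prints 1, since z = 0.1 >= 0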
Functionality of a Perceptron
 Binary Classification:
The perceptron divides the input space into two classes based on the decision boundary it learns during training.
 Learning Process:
The perceptron learns by adjusting its weights through supervised learning: the output is compared with the expected result, and the weights are updated iteratively using an error-driven update known as the perceptron learning rule (closely related to gradient descent). A minimal training sketch follows below.
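As a hedged illustration of this learning process, the sketch below implements the classic perceptron learning rule in plain Python. The hyperparameters and the AND example are illustrative assumptions, not taken from the slides.

# Perceptron learning rule: w <- w + lr * (target - prediction) * x, b <- b + lr * (target - prediction).
def train_perceptron(samples, n_features, lr=0.1, epochs=20):
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        for x, target in samples:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            prediction = 1 if z >= 0 else 0
            error = target - prediction  # zero when the sample is already classified correctly
            w = [wi + lr * error * xi for wi, xi in zip(w, x)]
            b += lr * error
    return w, b

# The AND function is linearly separable, so the rule converges to a correct boundary.
and_samples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train_perceptron(and_samples, n_features=2)

Because AND is linearly separable, the updates stop changing the weights once every sample is classified correctly.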
Example
 For a perceptron with concrete weights, bias, and inputs, the output is obtained by computing the weighted sum and applying the step function.
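As a hypothetical illustration (the values below are chosen for this example, not taken from the original slide), take weights w1 = 0.6 and w2 = 0.4, bias b = -0.5, and inputs x1 = 1, x2 = 0:

z = (0.6)(1) + (0.4)(0) + (-0.5) = 0.1

Since z ≥ 0, the step function outputs y = 1. With inputs x1 = 0 and x2 = 1 instead, z = 0.4 - 0.5 = -0.1 < 0, so the output is y = 0.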
Limitations of Perceptron
 Despite its simplicity, the perceptron has several limitations that
restrict its ability to handle complex problems. Below are the main
limitations:
Limitation 1: Linear Separability
 Problem: The perceptron can only solve problems that are linearly
separable.
 In other words, if the data is not linearly separable (e.g., XOR problem),
the perceptron will fail to classify the data correctly.
 Example:
 XOR Problem: A simple XOR function is not linearly separable, meaning
there is no straight line or hyperplane that can completely separate the
two classes.
 Perceptrons struggle with this because they rely on a single linear decision boundary; the sketch below demonstrates the limit.
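A small sketch in plain Python makes this concrete (the candidate weights searched here are illustrative assumptions): no single linear boundary classifies all four XOR points correctly.

# XOR truth table: output is 1 exactly when the two inputs differ.
xor_samples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

def accuracy(w, b, samples):
    # Fraction of samples that a perceptron with weights w and bias b classifies correctly.
    correct = 0
    for x, target in samples:
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        correct += (1 if z >= 0 else 0) == target
    return correct / len(samples)

# A brute-force scan over many candidate linear boundaries never exceeds 3 out of 4.
best = max(
    accuracy([w1, w2], b, xor_samples)
    for w1 in (-1, -0.5, 0, 0.5, 1)
    for w2 in (-1, -0.5, 0, 0.5, 1)
    for b in (-1.5, -1, -0.5, 0, 0.5, 1, 1.5)
)
print(best)  # 0.75 -- at least one XOR point is always misclassified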
Limitation 2: Cannot Handle Complex Non-Linear Relationships
 Problem: Perceptrons cannot handle problems that
exhibit non-linear relationships between inputs and
outputs.
 The single-layer architecture limits the complexity of decision
boundaries that can be drawn.
 Solution:
 Complex non-linear problems require multi-layer
networks (e.g., Multi-Layer Perceptrons or Deep
Learning models) that include hidden layers for
capturing non-linear features.
Limitation 3: No Capability for Feature Interaction
 Problem: Perceptrons do not model complex feature
interactions effectively because they rely on a simple
linear combination of inputs.
 For example, in a dataset with multiple interacting features, perceptrons fail
to capture interactions between features, leading to suboptimal results.

Limitation 4: Limited Generalization
 Problem: A perceptron may overfit to the training
data, meaning it performs well on training samples
but fails to generalize to unseen data.
 This is because it does not account for complex relationships, and if the
training data is noisy, it can overfit by learning noise instead of patterns.
Limitation 6: Limited Learning Capacity
 Problem: The single-layer nature of the perceptron limits its
capacity to learn complex representations. More complex
architectures are required for handling large datasets and tasks
with rich feature spaces.

Limitation 7: Training Convergence Issues
 Problem: In some cases, especially when data is not linearly
separable or when the training data is not sufficient, perceptrons
may fail to converge to a solution.
 This happens because the perceptron's simple weight-update rule cannot learn a complex (non-linear) decision boundary.
Limitation 8: Computational Limitations
 Problem: Perceptrons are computationally limited to
a small number of features.
 As the number of features increases, the complexity
of the network grows rapidly, making it inefficient for
high-dimensional datasets.
Advancements to Overcome Perceptron Limitations
 Multi-Layer Perceptrons (MLPs):
Overcome the limitations by introducing hidden layers and non-linear activation functions (e.g., ReLU, Sigmoid) that allow modeling complex relationships.
 Deep Learning Architectures:
Enable handling of non-linear data and complex
feature interactions through deeper, more expressive
networks.
Multilayer Perceptrons (MLPs)
 Multilayer Perceptrons (MLPs) are a class of
feedforward artificial neural networks
composed of multiple layers, including an
input layer, one or more hidden layers, and an
output layer.
 Unlike single-layer perceptrons, MLPs can
model complex, non-linear relationships
between input and output by introducing
additional layers that help capture intricate
patterns in the data.
Structure of Multilayer Perceptrons
An MLP consists of:
 Input Layer: Receives input features (e.g., features
of a dataset).
 Hidden Layers: One or more layers that process data
through weighted connections and non-linear
activation functions.
 Output Layer: Produces the final prediction or result
based on the processed data from hidden layers.
Key Components of MLP
 Weights: The learnable parameters that
adjust how inputs influence outputs.
 Bias: An additional parameter that helps in
shifting the output.
 Activation Functions: Introduce non-linearity into the network, allowing it to capture complex patterns (e.g., ReLU, Sigmoid, Tanh); see the sketch below.
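For reference, the three activation functions named above can be sketched as follows (a minimal illustration in plain Python):

import math

def relu(z):
    # ReLU: passes positive values through unchanged and clips negative values to 0.
    return max(0.0, z)

def sigmoid(z):
    # Sigmoid: squashes any real value into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    # Tanh: squashes any real value into the range (-1, 1).
    return math.tanh(z)

print(relu(-2.0), sigmoid(0.0), tanh(1.0))  # 0.0 0.5 0.7615...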
How MLPs Work
 Forward Propagation:
 Data flows through each layer.
 Each neuron in a layer applies weights to its inputs, sums them, adds a bias, and applies the activation function.
 Loss Function:
 After obtaining the final output, a loss function (e.g., Mean
Squared Error or Cross-Entropy) compares the predicted
output with the true labels or values.
 Backpropagation:
 Errors are propagated backward through the network to adjust the weights using gradient descent, as illustrated in the sketch below.
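Putting forward propagation, the loss, and backpropagation together, here is a minimal sketch of a small MLP learning XOR. It assumes NumPy is available; the 2-4-1 architecture, random seed, learning rate, and iteration count are illustrative choices, not taken from the slides.

import numpy as np

# A 2-4-1 MLP with sigmoid activations, trained on XOR with a mean-squared-error loss.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))  # input layer -> hidden layer
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))  # hidden layer -> output layer

lr = 1.0
for _ in range(5000):
    # Forward propagation: weighted sum + bias + activation at each layer.
    h = sigmoid(X @ W1 + b1)
    y_hat = sigmoid(h @ W2 + b2)

    # Backpropagation: error signals (proportional to the loss gradient) flow backwards.
    delta_out = (y_hat - y) * y_hat * (1 - y_hat)
    delta_hidden = (delta_out @ W2.T) * h * (1 - h)

    # Gradient-descent updates of weights and biases.
    W2 -= lr * h.T @ delta_out / len(X)
    b2 -= lr * delta_out.mean(axis=0)
    W1 -= lr * X.T @ delta_hidden / len(X)
    b1 -= lr * delta_hidden.mean(axis=0)

print(np.round(y_hat, 2).ravel())  # typically close to [0, 1, 1, 0]

With the hidden layer removed, the same loop reduces to a single linear boundary and cannot fit XOR, which is the perceptron limitation discussed earlier.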
Advantages of MLPs
 Flexibility: MLPs can solve a wide variety of
problems (classification, regression, time
series, etc.).
 Non-linearity: Ability to capture complex
patterns through the use of multiple hidden
layers and non-linear activation functions.
 Universal Approximation: MLPs with at least one hidden layer can approximate any continuous function to arbitrary accuracy, given enough hidden units and training data (the universal approximation theorem).
Limitations of MLPs
 Overfitting: With too many layers or neurons, MLPs
can easily overfit the training data, leading to poor
generalization to new data.
 Computational Complexity: As the number of
layers and neurons increases, the computational cost
of training becomes significantly high.
 Data Requirements: Needs a large amount of high-
quality data to perform well, which may not always be
feasible.
 Vanishing/Exploding Gradients: In deeper
networks, gradients can either vanish or explode,
causing training instability.
Applications of MLPs
 Image Recognition: Used in tasks like image
classification and object detection using Convolutional
Neural Networks (CNNs), which are extensions of MLPs
for image data.
 Natural Language Processing (NLP): For text
classification, sentiment analysis, machine translation,
and more.
 Predictive Modeling: In healthcare for diagnosing
diseases, predicting patient outcomes, etc.
 Financial Analysis: For fraud detection, stock market
prediction, and credit scoring.
Example Workflow
 Input: Feature vectors (e.g., sensor readings,
historical data).
 Processing: Pass through input, hidden, and output
layers.
 Prediction: Final prediction based on activation
functions and learned weights.
 Loss Function: Evaluate how close the prediction is
to the ground truth.
 Backpropagation: Adjust weights using gradient descent based on the loss; the sketch below walks through this workflow with a library implementation.
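The same workflow can be sketched end to end with a library implementation. The example below assumes scikit-learn is available; the synthetic dataset and the hyperparameters are illustrative choices.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Input: synthetic feature vectors standing in for sensor readings or historical data.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Processing, loss, and backpropagation: MLPClassifier runs these internally during fit().
model = MLPClassifier(hidden_layer_sizes=(16,), activation="relu", max_iter=1000, random_state=0)
model.fit(X_train, y_train)

# Prediction, evaluated against the ground-truth labels of the held-out test split.
print("test accuracy:", model.score(X_test, y_test))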
Comparison with Other Architectures
 MLPs vs Convolutional Neural Networks
(CNNs): MLPs are used for tabular data or
tasks requiring dense representation, while
CNNs are more suited for image-related tasks.
 MLPs vs Recurrent Neural Networks
(RNNs): MLPs are feedforward and handle
static input, whereas RNNs are designed for
sequential data and time series.
Conclusion
 Multilayer Perceptrons are versatile neural
networks capable of solving complex tasks,
but they require proper regularization,
sufficient data, and computational resources
to perform optimally.
 Advanced variants and architectures like Deep
MLPs, Residual Networks (ResNets), and
Regularized MLPs have been developed to
address limitations and improve performance.
