
Unit - IV

Neural Networks (NN), Support Vector Machines (SVM), and K-Nearest Neighbors: Fitting neural networks, Perceptron learning algorithm, Backpropagation, Issues in training NN, SVM for classification, Reproducing Kernels, SVM for regression, K-nearest neighbour classifiers (Image Scene Classification).

Fitting neural networks:

Neural networks refer to a broad category of models and techniques used in machine learning. Originally inspired by the human brain, they are now best understood as mathematical models for regression and classification tasks. The most basic type is the "vanilla" neural network, also known as the single hidden layer back-propagation network or single-layer perceptron. This model consists of one hidden layer and is trained using backpropagation. Fundamentally, neural networks are just mathematical models that transform input data; in essence, they are nonlinear statistical models, similar to Projection Pursuit Regression (PPR), which also uses nonlinear functions to identify patterns in data.
A neural network is a two-stage regression or classification model, typically represented by a network diagram as in Figure 11.2.
 Neural networks process data in two main stages:
1. Feature extraction – Creating new features from input data.
2. Prediction – Using transformed features to make a final decision.
 Figure 11.2 illustrates how data moves from input to output through different layers.
Neural networks can be used for regression (predicting continuous values) and classification
(categorizing data into classes).
 In regression, we predict a single numeric output (e.g., predicting house prices).
Since there is only one output value, K = 1, meaning a single output neuron (Y1).
Neural networks can also predict multiple continuous values (for example, a
network could predict both house price and rental income from the same input data).
 In classification, we predict one of K classes. The output layer has K neurons, where
each neuron predicts the probability of class k. For classification, we represent each
class using one-hot encoding: if there are K classes, the target variable Yk is a vector
with a 1 for the correct class and 0 for all others.
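To make the one-hot target representation concrete, here is a minimal NumPy sketch (NumPy being an assumed tooling choice, not mentioned in these notes):

```python
import numpy as np

def one_hot(labels, num_classes):
    """Encode integer class labels as one-hot target vectors Yk:
    a 1 in the column of the correct class and 0 everywhere else."""
    encoded = np.zeros((len(labels), num_classes))
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

# Example: 4 samples, K = 3 classes
print(one_hot([0, 2, 1, 2], num_classes=3))
```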
Mathematical Representation of a Neural Network:
Derived features (Zm) are created as weighted sums (linear combinations) of the input features; these features are then transformed and used to predict the target Yk.
Computing Hidden Layer Neurons:
$Z_m = \sigma(\alpha_{0m} + \alpha_m^T X), \quad m = 1, \dots, M$, where $\sigma(v) = 1/(1 + e^{-v})$ is the sigmoid activation function.

Computing Output Neurons:
$T_k = \beta_{0k} + \beta_k^T Z, \quad k = 1, \dots, K$, where $Z = (Z_1, Z_2, \dots, Z_M)$.

Applying Final Transformation:
$f_k(X) = g_k(T), \quad k = 1, \dots, K$, where $T = (T_1, T_2, \dots, T_K)$. For regression, $g_k$ is typically the identity, $g_k(T) = T_k$.

Bias Term in Neural Networks:
A bias unit (a constant input of 1) feeds into every unit in the hidden and output layers; its weights are the intercepts α0m and β0k above. This allows the model to shift activation thresholds, improving flexibility.

Softmax Function for Classification:
$g_k(T) = \dfrac{e^{T_k}}{\sum_{l=1}^{K} e^{T_l}}$, which produces positive outputs that sum to 1 and can therefore be interpreted as class probabilities.
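Putting the pieces together, here is a minimal NumPy sketch of the two-stage forward computation with a softmax output; the weight names follow the notation above, while the specific sizes and random values are illustrative assumptions.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def softmax(t):
    t = t - t.max()                       # subtract max for numerical stability
    e = np.exp(t)
    return e / e.sum()

def forward(x, alpha0, alpha, beta0, beta):
    """Single-hidden-layer forward pass: X -> Z -> T -> g(T)."""
    z = sigmoid(alpha0 + alpha @ x)       # derived features Z_m
    t = beta0 + beta @ z                  # output scores T_k
    return softmax(t)                     # class probabilities g_k(T)

# Tiny example: p = 4 inputs, M = 3 hidden units, K = 2 classes
rng = np.random.default_rng(0)
x = rng.normal(size=4)
probs = forward(x,
                alpha0=rng.normal(size=3), alpha=rng.normal(size=(3, 4)),
                beta0=rng.normal(size=2),  beta=rng.normal(size=(2, 3)))
print(probs, probs.sum())                 # the probabilities sum to 1
```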
Fitting Neural Networks:


A neural network has parameters (weights) that need to be learned from the training data.
These parameters decide how the network behaves. The goal is to find the best weights so the
network can predict correctly. We use a loss function to measure how wrong the network's
predictions are. Training the network means adjusting the weights to minimize this loss.
Let θ (theta) represent all the weights in the network. There are two sets of weights:
 Between input layer and hidden layer: {α0m, αm}
 Between hidden layer and output layer: {β0k, βk}
Total number of weights depends on the number of input features, hidden units, and output
classes.
Regularization (Preventing Overfitting)
 If we just minimize the loss, the network might memorize the training data
(overfitting).
 To avoid this, we add regularization (penalties for too large weights) or stop training
early (early stopping).
Training with Gradient Descent (Backpropagation)
 The common way to train the network is by gradient descent.
 The process of calculating gradients and updating the weights is called
backpropagation.
Forward and Backward Pass
 In backpropagation, we:
o Forward pass: Compute predictions from inputs using the current weights.

o Backward pass: Compute how much each weight contributed to the error
(gradients).
o Update weights using these gradients.
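Here is a minimal NumPy sketch of one such forward/backward step for the single-hidden-layer network, using squared-error loss and a fixed learning rate (both illustrative assumptions rather than choices made in these notes):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def backprop_step(x, y, alpha0, alpha, beta0, beta, lr=0.1):
    """One gradient-descent update for a network with sigmoid hidden units,
    linear outputs, and squared-error loss."""
    # Forward pass: compute predictions with the current weights
    z = sigmoid(alpha0 + alpha @ x)          # hidden features Z_m
    y_hat = beta0 + beta @ z                 # predictions f_k(x)

    # Backward pass: gradients of the loss with respect to each weight
    d_t = y_hat - y                          # error at the output layer
    d_beta, d_beta0 = np.outer(d_t, z), d_t
    d_z = beta.T @ d_t                       # error propagated to the hidden layer
    d_s = d_z * z * (1.0 - z)                # through the sigmoid derivative
    d_alpha, d_alpha0 = np.outer(d_s, x), d_s

    # Update each weight in the direction that reduces the loss
    return (alpha0 - lr * d_alpha0, alpha - lr * d_alpha,
            beta0 - lr * d_beta0, beta - lr * d_beta)
```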
Some Issues in Training Neural Networks:
11.5.1 Starting Values:
 When training a neural network, we need to initialize the weights (the connections
between neurons).
Key Points
1. Start with small random values near zero.
o This helps the network start off behaving almost like a linear model (simple
straight-line model).
o This makes training smoother and more stable.

2. Avoid starting with exactly zero weights.


o All neurons would behave exactly the same ("perfect symmetry").

o The network won’t learn properly because all weight updates would be the
same (zero derivatives = no movement).
3. Avoid starting with very large weights.
o Large weights make the network highly nonlinear and chaotic right from the
start.
o This often makes training unstable and can lead to poor solutions.

Overfitting:
 Overfitting happens when a neural network learns the training data too well,
including noise and unnecessary details.
 This causes the network to perform poorly on new (unseen) data.

🔹 Why Neural Networks Overfit

 Neural networks often have a lot of weights (parameters).


 More weights = more flexibility, so the network can memorize the training data
instead of learning general patterns.
 Overfitting typically happens when training for too long.

🔹 Two Ways to Prevent Overfitting


1. Early Stopping: Stop training before the network reaches the global minimum (the lowest possible error).
 Why it works:
o At the start, the weights are small, and the model behaves like a simple linear
model.
o Early stopping prevents the model from becoming too complex, keeping it
closer to that simple starting point.
 How to do it:
o Use a validation dataset.

o Stop training when the validation error starts increasing, even if the training error keeps decreasing (see the sketch below).
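Here is a minimal, framework-agnostic sketch of early stopping; the step_fn, val_error_fn, and patience parameters are illustrative assumptions rather than anything specified in these notes.

```python
import numpy as np

def train_with_early_stopping(params, step_fn, val_error_fn,
                              max_epochs=500, patience=10):
    """Run training epochs but stop once the validation error stops improving.

    step_fn(params)      -> updated params after one training epoch
    val_error_fn(params) -> error on a held-out validation dataset
    patience             -> epochs without improvement tolerated before stopping
    """
    best_params, best_err, waited = params, np.inf, 0
    for _ in range(max_epochs):
        params = step_fn(params)
        err = val_error_fn(params)
        if err < best_err:              # validation error still improving
            best_params, best_err, waited = params, err, 0
        else:                           # validation error went up
            waited += 1
            if waited >= patience:
                break                   # stop early, keep the best weights seen
    return best_params
```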

2. Weight Decay (Regularization)

 What it is: Add a penalty term to the loss function that discourages large weights.
 This is similar to ridge regression for linear models.
 The new loss function looks like:
$R(\theta) + \lambda J(\theta)$
Where $R(\theta)$ is the original training error, $\lambda \ge 0$ is the tuning parameter, and the standard weight-decay penalty is
$J(\theta) = \sum_{km} \beta_{km}^2 + \sum_{ml} \alpha_{ml}^2$
Other forms for the penalty include weight elimination:
$J(\theta) = \sum_{km} \frac{\beta_{km}^2}{1 + \beta_{km}^2} + \sum_{ml} \frac{\alpha_{ml}^2}{1 + \alpha_{ml}^2}$
 Effect: Shrinks the weights towards zero, making the network simpler and less likely
to overfit.
 How to choose λ: Use cross-validation to find the best value.
🔹 Two Types of Weight Penalties

 Standard Weight Decay: the penalty is the sum of squared weights (like ridge regression).
 Weight Elimination: shrinks small weights more than large weights, encouraging simpler networks.
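A minimal sketch of the standard weight-decay penalty added to a squared-error loss, together with its contribution to the gradient; the value of lam is an illustrative assumption and would normally be chosen by cross-validation as noted above.

```python
import numpy as np

def penalized_loss(y_hat, y, weights, lam=0.01):
    """Squared-error loss R(theta) plus lam * J(theta), where J(theta) is the
    sum of squared weights (biases are typically left unpenalized)."""
    error = 0.5 * np.sum((y_hat - y) ** 2)           # R(theta)
    penalty = sum(np.sum(w ** 2) for w in weights)   # J(theta)
    return error + lam * penalty

def decay_gradient(weight, lam=0.01):
    """Gradient of the penalty term for a single weight matrix:
    d/dw [lam * w^2] = 2 * lam * w, which shrinks weights toward zero."""
    return 2.0 * lam * weight
```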

🔹 Visual Example (Figures 11.4 & 11.5)

 Without weight decay:


o Predictions are disordered and overfit the data.

 With weight decay:


o Predictions are smoother and more general (better at predicting new data).

 Hinton diagrams (grayscale heat maps):


o Show how much each weight contributes.

o With weight decay, the weights are more balanced across all hidden units.

11.5.4 Number of Hidden Units and Layers:


The number of hidden units and layers in a neural network affects its ability to learn patterns
from data. Too few hidden units may cause underfitting, where the network is too simple to
capture complex relationships. Too many hidden units can lead to overfitting, but this can
be controlled using regularization techniques like weight decay, which shrinks unnecessary
weights toward zero. Typically, networks use between 5 and 100 hidden units, with more
units for larger datasets. The number of hidden layers controls how many levels of features
the network can learn. Each layer extracts features, and deeper networks can capture more
complex, hierarchical patterns. In practice, starting with enough hidden units and using
regularization often works better than trying to fine-tune the exact number of units with
cross-validation.
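As a rough illustration of this advice (use plenty of hidden units and rely on regularization), here is a scikit-learn sketch; the library, the synthetic dataset, and the alpha (weight-decay) value are assumptions made purely for demonstration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# A synthetic classification problem stands in for real data.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Compare a small and a generously sized hidden layer; alpha is the
# L2 (weight-decay) strength that keeps the larger network in check.
for hidden in [(5,), (100,)]:
    net = MLPClassifier(hidden_layer_sizes=hidden, alpha=0.01,
                        max_iter=2000, random_state=0)
    net.fit(X_tr, y_tr)
    print(hidden, round(net.score(X_te, y_te), 3))
```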
11.5.5 Multiple Minima: Neural networks often have many local minima because their error function is nonconvex, meaning there are many possible low-error solutions. The final result depends heavily on the initial weights chosen at the start of training. Three common ways to handle this are:
1. Try Multiple Starting Points
 Train the network several times, each time starting with a different random
initialization of weights.
 Pick the model that has the lowest (penalized) error.
2. Averaging Predictions (Ensembling)

 A better approach is to average the predictions from all the networks you trained.
 This works better than averaging the actual weights, because neural networks are
nonlinear: a set of averaged weights generally does not correspond to a sensible network.
3. Bagging (Bootstrap Aggregating)

 Another option is bagging: train several networks on bootstrap samples (randomly
perturbed versions) of the training data and average the predictions of all those networks.
 Bagging helps improve stability and reduces overfitting.
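A minimal sketch of the multiple-starting-points idea combined with prediction averaging, again using scikit-learn as an assumed tool; the dataset, architecture, and number of restarts are illustrative.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.neural_network import MLPRegressor

X, y = make_regression(n_samples=300, n_features=10, noise=5.0, random_state=0)

# Train the same architecture from several random starting points;
# different initial weights generally land in different local minima.
nets = [MLPRegressor(hidden_layer_sizes=(20,), max_iter=3000,
                     random_state=seed).fit(X, y) for seed in range(5)]

# Average the predictions of the trained networks, not their weights.
averaged = np.mean([net.predict(X) for net in nets], axis=0)
print(averaged[:3])
```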

Support Vector Machines:


In many real-world cases, classes overlap, making perfect separation impossible. Support
Vector Machines (SVMs) address this by transforming data into a higher-dimensional space,
where a linear boundary can separate classes more effectively, allowing for nonlinear
decision boundaries.
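As a small illustration of this idea, here is a scikit-learn sketch comparing a linear kernel with an RBF kernel on data that a straight line cannot separate; the library and dataset are assumptions for demonstration only.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two interleaving half-moons: the classes are not linearly separable in 2-D.
X, y = make_moons(n_samples=400, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# The RBF kernel implicitly maps the data into a higher-dimensional space
# where a linear boundary separates the classes far better.
for kernel in ["linear", "rbf"]:
    clf = SVC(kernel=kernel, C=1.0).fit(X_tr, y_tr)
    print(kernel, round(clf.score(X_te, y_te), 3))
```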
Another set of techniques builds on Fisher’s Linear Discriminant Analysis (LDA), which is
used to find the best way to separate different groups of data. Some important generalizations
of LDA include:
 Flexible Discriminant Analysis (FDA): Similar to SVMs, this method allows for
nonlinear decision boundaries, making it more adaptable to complex data.
 Penalized Discriminant Analysis (PDA): Useful in cases like image or signal
classification, where there are many features that are highly correlated, helping to
avoid overfitting.
 Mixture Discriminant Analysis (MDA): Helps classify irregularly shaped data
distributions, which don’t fit neatly into simple geometric boundaries.
12.2 The Support Vector Classifier: