
Unit - IV

Neural Networks (NN), Support Vector Machines (SVM), and K-Nearest Neighbors: Fitting neural networks, Perceptron learning algorithm, Backpropagation, Issues in training NN, SVM for classification, Reproducing Kernels, SVM for regression, K-nearest neighbour classifiers (Image Scene Classification).

Fitting neural networks:

Neural networks refer to a broad category of models and techniques used in machine learning. Originally inspired by the human brain, they are now best understood as mathematical models for regression and classification tasks. The most basic type is the "vanilla" neural network, also known as the single hidden layer back-propagation network or single-layer perceptron. This model consists of one hidden layer and is trained using backpropagation. Fundamentally, neural networks are just mathematical models that transform input data; in essence, they are nonlinear statistical models, similar to Projection Pursuit Regression (PPR), which also uses nonlinear functions to identify patterns in data.
A neural network is a two-stage regression or classification model, typically represented by a network diagram as in Figure 11.2.
 Neural networks process data in two main stages:
1. Feature extraction – Creating new features from input data.
2. Prediction – Using transformed features to make a final decision.
 Figure 11.2 illustrates how data moves from input to output through different layers.
Neural networks can be used for regression (predicting continuous values) and classification
(categorizing data into classes).
 In regression, we predict a single numeric output (e.g., predicting house prices).
Since there is only one output value, K = 1, meaning a single output neuron (Y1).
Neural networks can also predict multiple continuous values (for example, a
network could predict both house price and rental income from the same input data).
 In classification, we predict one of K classes. The output layer has K neurons, where
each neuron predicts the probability of class k. For classification, we represent each
class using one-hot encoding: if there are K classes, the target variable Yk is a vector
with a 1 for the correct class and 0 for all others.
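To make the one-hot target representation concrete, here is a minimal NumPy sketch (NumPy being an assumed tooling choice, not mentioned in these notes):

```python
import numpy as np

def one_hot(labels, num_classes):
    """Encode integer class labels as one-hot target vectors Yk:
    a 1 in the column of the correct class and 0 everywhere else."""
    encoded = np.zeros((len(labels), num_classes))
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

# Example: 4 samples, K = 3 classes
print(one_hot([0, 2, 1, 2], num_classes=3))
```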
Mathematical Representation of a Neural Network:
Derived features (Zm) are created as weighted sums (linear combinations) of the input features; these features are then transformed and used to predict the target Yk.
Computing Hidden Layer Neurons:
$Z_m = \sigma(\alpha_{0m} + \alpha_m^T X), \quad m = 1, \dots, M$, where $\sigma(v) = 1/(1 + e^{-v})$ is the sigmoid activation function.

Computing Output Neurons:
$T_k = \beta_{0k} + \beta_k^T Z, \quad k = 1, \dots, K$, where $Z = (Z_1, Z_2, \dots, Z_M)$.

Applying Final Transformation:
$f_k(X) = g_k(T), \quad k = 1, \dots, K$, where $T = (T_1, T_2, \dots, T_K)$. For regression, $g_k$ is typically the identity, $g_k(T) = T_k$.

Bias Term in Neural Networks:
A bias unit (a constant input of 1) feeds into every unit in the hidden and output layers; its weights are the intercepts α0m and β0k above. This allows the model to shift activation thresholds, improving flexibility.

Softmax Function for Classification:
$g_k(T) = \dfrac{e^{T_k}}{\sum_{l=1}^{K} e^{T_l}}$, which produces positive outputs that sum to 1 and can therefore be interpreted as class probabilities.
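Putting the pieces together, here is a minimal NumPy sketch of the two-stage forward computation with a softmax output; the weight names follow the notation above, while the specific sizes and random values are illustrative assumptions.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def softmax(t):
    t = t - t.max()                       # subtract max for numerical stability
    e = np.exp(t)
    return e / e.sum()

def forward(x, alpha0, alpha, beta0, beta):
    """Single-hidden-layer forward pass: X -> Z -> T -> g(T)."""
    z = sigmoid(alpha0 + alpha @ x)       # derived features Z_m
    t = beta0 + beta @ z                  # output scores T_k
    return softmax(t)                     # class probabilities g_k(T)

# Tiny example: p = 4 inputs, M = 3 hidden units, K = 2 classes
rng = np.random.default_rng(0)
x = rng.normal(size=4)
probs = forward(x,
                alpha0=rng.normal(size=3), alpha=rng.normal(size=(3, 4)),
                beta0=rng.normal(size=2),  beta=rng.normal(size=(2, 3)))
print(probs, probs.sum())                 # the probabilities sum to 1
```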
Fitting Neural Networks:


A neural network has parameters (weights) that need to be learned from the training data.
These parameters decide how the network behaves. The goal is to find the best weights so the
network can predict correctly. We use a loss function to measure how wrong the network's
predictions are. Training the network means adjusting the weights to minimize this loss.
Let θ (theta) represent all the weights in the network. There are two sets of weights:
 Between input layer and hidden layer: {α0m, αm}
 Between hidden layer and output layer: {β0k, βk}
Total number of weights depends on the number of input features, hidden units, and output
classes.
Regularization (Preventing Overfitting)
 If we just minimize the loss, the network might memorize the training data
(overfitting).
 To avoid this, we add regularization (penalties for too large weights) or stop training
early (early stopping).
Training with Gradient Descent (Backpropagation)
 The common way to train the network is by gradient descent.
 The process of calculating gradients and updating the weights is called
backpropagation.
Forward and Backward Pass
 In backpropagation, we:
o Forward pass: Compute predictions from inputs using the current weights.

o Backward pass: Compute how much each weight contributed to the error
(gradients).
o Update weights using these gradients.
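Here is a minimal NumPy sketch of one such forward/backward step for the single-hidden-layer network, using squared-error loss and a fixed learning rate (both illustrative assumptions rather than choices made in these notes):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def backprop_step(x, y, alpha0, alpha, beta0, beta, lr=0.1):
    """One gradient-descent update for a network with sigmoid hidden units,
    linear outputs, and squared-error loss."""
    # Forward pass: compute predictions with the current weights
    z = sigmoid(alpha0 + alpha @ x)          # hidden features Z_m
    y_hat = beta0 + beta @ z                 # predictions f_k(x)

    # Backward pass: gradients of the loss with respect to each weight
    d_t = y_hat - y                          # error at the output layer
    d_beta, d_beta0 = np.outer(d_t, z), d_t
    d_z = beta.T @ d_t                       # error propagated to the hidden layer
    d_s = d_z * z * (1.0 - z)                # through the sigmoid derivative
    d_alpha, d_alpha0 = np.outer(d_s, x), d_s

    # Update each weight in the direction that reduces the loss
    return (alpha0 - lr * d_alpha0, alpha - lr * d_alpha,
            beta0 - lr * d_beta0, beta - lr * d_beta)
```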
Some Issues in Training Neural Networks:
11.5.1 Starting Values:
 When training a neural network, we need to initialize the weights (the connections
between neurons).
Key Points
1. Start with small random values near zero.
o This helps the network start off behaving almost like a linear model (simple
straight-line model).
o This makes training smoother and more stable.

2. Avoid starting with exactly zero weights.


o All neurons would behave exactly the same ("perfect symmetry").

o The network won’t learn properly because all weight updates would be the
same (zero derivatives = no movement).
3. Avoid starting with very large weights.
o Large weights make the network highly nonlinear and chaotic right from the
start.
o This often makes training unstable and can lead to poor solutions.

Overfitting:
 Overfitting happens when a neural network learns the training data too well,
including noise and unnecessary details.
 This causes the network to perform poorly on new (unseen) data.

🔹 Why Neural Networks Overfit

 Neural networks often have a lot of weights (parameters).


 More weights = more flexibility, so the network can memorize the training data
instead of learning general patterns.
 Overfitting typically happens when training for too long.

🔹 Two Ways to Prevent Overfitting


1. Early Stopping: Stop training before the network reaches the global minimum (the lowest possible error).
 Why it works:
o At the start, the weights are small, and the model behaves like a simple linear
model.
o Early stopping prevents the model from becoming too complex, keeping it
closer to that simple starting point.
 How to do it:
o Use a validation dataset.

o Stop training when the validation error starts increasing, even if the training error keeps decreasing (see the sketch below).
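Here is a minimal, framework-agnostic sketch of early stopping; the step_fn, val_error_fn, and patience parameters are illustrative assumptions rather than anything specified in these notes.

```python
import numpy as np

def train_with_early_stopping(params, step_fn, val_error_fn,
                              max_epochs=500, patience=10):
    """Run training epochs but stop once the validation error stops improving.

    step_fn(params)      -> updated params after one training epoch
    val_error_fn(params) -> error on a held-out validation dataset
    patience             -> epochs without improvement tolerated before stopping
    """
    best_params, best_err, waited = params, np.inf, 0
    for _ in range(max_epochs):
        params = step_fn(params)
        err = val_error_fn(params)
        if err < best_err:              # validation error still improving
            best_params, best_err, waited = params, err, 0
        else:                           # validation error went up
            waited += 1
            if waited >= patience:
                break                   # stop early, keep the best weights seen
    return best_params
```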

2. Weight Decay (Regularization)

 What it is: Add a penalty term to the loss function that discourages large weights.
 This is similar to ridge regression for linear models.
 The new loss function looks like:
$R(\theta) + \lambda J(\theta)$
Where $R(\theta)$ is the original training error, $\lambda \ge 0$ is the tuning parameter, and the standard weight-decay penalty is
$J(\theta) = \sum_{km} \beta_{km}^2 + \sum_{ml} \alpha_{ml}^2$
Other forms for the penalty include weight elimination:
$J(\theta) = \sum_{km} \frac{\beta_{km}^2}{1 + \beta_{km}^2} + \sum_{ml} \frac{\alpha_{ml}^2}{1 + \alpha_{ml}^2}$
 Effect: Shrinks the weights towards zero, making the network simpler and less likely
to overfit.
 How to choose λ: Use cross-validation to find the best value.
🔹 Two Types of Weight Penalties

 Standard Weight Decay: the penalty is the sum of squared weights (like ridge regression).
 Weight Elimination: shrinks small weights more than large weights, encouraging simpler networks.
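A minimal sketch of the standard weight-decay penalty added to a squared-error loss, together with its contribution to the gradient; the value of lam is an illustrative assumption and would normally be chosen by cross-validation as noted above.

```python
import numpy as np

def penalized_loss(y_hat, y, weights, lam=0.01):
    """Squared-error loss R(theta) plus lam * J(theta), where J(theta) is the
    sum of squared weights (biases are typically left unpenalized)."""
    error = 0.5 * np.sum((y_hat - y) ** 2)           # R(theta)
    penalty = sum(np.sum(w ** 2) for w in weights)   # J(theta)
    return error + lam * penalty

def decay_gradient(weight, lam=0.01):
    """Gradient of the penalty term for a single weight matrix:
    d/dw [lam * w^2] = 2 * lam * w, which shrinks weights toward zero."""
    return 2.0 * lam * weight
```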

🔹 Visual Example (Figures 11.4 & 11.5)

 Without weight decay:


o Predictions are disordered and overfit the data.

 With weight decay:


o Predictions are smoother and more general (better at predicting new data).

 Hinton diagrams (grayscale heat maps):


o Show how much each weight contributes.

o With weight decay, the weights are more balanced across all hidden units.

11.5.4 Number of Hidden Units and Layers:


The number of hidden units and layers in a neural network affects its ability to learn patterns
from data. Too few hidden units may cause underfitting, where the network is too simple to
capture complex relationships. Too many hidden units can lead to overfitting, but this can
be controlled using regularization techniques like weight decay, which shrinks unnecessary
weights toward zero. Typically, networks use between 5 and 100 hidden units, with more
units for larger datasets. The number of hidden layers controls how many levels of features
the network can learn. Each layer extracts features, and deeper networks can capture more
complex, hierarchical patterns. In practice, starting with enough hidden units and using
regularization often works better than trying to fine-tune the exact number of units with
cross-validation.
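As a rough illustration of this advice (use plenty of hidden units and rely on regularization), here is a scikit-learn sketch; the library, the synthetic dataset, and the alpha (weight-decay) value are assumptions made purely for demonstration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# A synthetic classification problem stands in for real data.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Compare a small and a generously sized hidden layer; alpha is the
# L2 (weight-decay) strength that keeps the larger network in check.
for hidden in [(5,), (100,)]:
    net = MLPClassifier(hidden_layer_sizes=hidden, alpha=0.01,
                        max_iter=2000, random_state=0)
    net.fit(X_tr, y_tr)
    print(hidden, round(net.score(X_te, y_te), 3))
```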
11.5.5 Multiple Minima: Neural networks often have many local minima because their error function is nonconvex, meaning there are many possible low-error solutions. The final result depends heavily on the initial weights chosen at the start of training. Three common ways to handle this are:
1. Try Multiple Starting Points
 Train the network several times, each time starting with a different random
initialization of weights.
 Pick the model that has the lowest (penalized) error.
2. Averaging Predictions (Ensembling)

 A better approach is to average the predictions from all the networks you trained.
 This works better than averaging the actual weights, because neural networks are
nonlinear: a set of averaged weights generally does not correspond to a sensible network.
3. Bagging (Bootstrap Aggregating)

 Another option is bagging: train several networks on bootstrap samples (randomly
perturbed versions) of the training data and average the predictions of all those networks.
 Bagging helps improve stability and reduces overfitting.
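A minimal sketch of the multiple-starting-points idea combined with prediction averaging, again using scikit-learn as an assumed tool; the dataset, architecture, and number of restarts are illustrative.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.neural_network import MLPRegressor

X, y = make_regression(n_samples=300, n_features=10, noise=5.0, random_state=0)

# Train the same architecture from several random starting points;
# different initial weights generally land in different local minima.
nets = [MLPRegressor(hidden_layer_sizes=(20,), max_iter=3000,
                     random_state=seed).fit(X, y) for seed in range(5)]

# Average the predictions of the trained networks, not their weights.
averaged = np.mean([net.predict(X) for net in nets], axis=0)
print(averaged[:3])
```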

Support Vector Machines:


In many real-world cases, classes overlap, making perfect separation impossible. Support
Vector Machines (SVMs) address this by transforming data into a higher-dimensional space,
where a linear boundary can separate classes more effectively, allowing for nonlinear
decision boundaries.
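As a small illustration of this idea, here is a scikit-learn sketch comparing a linear kernel with an RBF kernel on data that a straight line cannot separate; the library and dataset are assumptions for demonstration only.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two interleaving half-moons: the classes are not linearly separable in 2-D.
X, y = make_moons(n_samples=400, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# The RBF kernel implicitly maps the data into a higher-dimensional space
# where a linear boundary separates the classes far better.
for kernel in ["linear", "rbf"]:
    clf = SVC(kernel=kernel, C=1.0).fit(X_tr, y_tr)
    print(kernel, round(clf.score(X_te, y_te), 3))
```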
Another set of techniques builds on Fisher’s Linear Discriminant Analysis (LDA), which is
used to find the best way to separate different groups of data. Some important generalizations
of LDA include:
 Flexible Discriminant Analysis (FDA): Similar to SVMs, this method allows for
nonlinear decision boundaries, making it more adaptable to complex data.
 Penalized Discriminant Analysis (PDA): Useful in cases like image or signal
classification, where there are many features that are highly correlated, helping to
avoid overfitting.
 Mixture Discriminant Analysis (MDA): Helps classify irregularly shaped data
distributions, which don’t fit neatly into simple geometric boundaries.
12.2 The Support Vector Classifier: