Unit 4
Neural networks refer to a broad category of models and techniques used in machine
learning. Originally inspired by the human brain, they are now widely recognized as
mathematical models for regression and classification tasks. The most basic type of neural
network is the "vanilla" neural network, also known as the single hidden layer back-
propagation network or single-layer perceptron. This model consists of one hidden layer and
is trained using backpropagation. Fundamentally, neural networks are just mathematical
models that transform input data. In essence, they are nonlinear statistical models,
similar to Projection Pursuit Regression (PPR), which also uses nonlinear functions to
identify patterns in data.
A neural network is a two-stage regression or classification model, typically represented by a
network diagram as in Figure 11.2.
Neural networks process data in two main stages:
1. Feature extraction – Creating new features from input data.
2. Prediction – Using transformed features to make a final decision.
Figure 11.2 illustrates how data moves from input to output through different layers.
Neural networks can be used for regression (predicting continuous values) and classification
(categorizing data into classes).
In regression, we predict a single numeric output (e.g., predicting house prices).
Since there is only one output value, K = 1, meaning a single output neuron (Y1).
Neural networks can also predict multiple continuous values; for example, a
network could predict both house price and rental income from the same input data.
In classification, we predict one of K classes. The output layer has K neurons, where
each neuron predicts the probability of class k. We represent each class using
one-hot encoding: if there are K classes, the target variable Yk is a vector with 1 for
the correct class and 0 for all others.
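As a concrete illustration, here is a minimal one-hot encoding sketch in Python (NumPy and the example labels are assumptions for illustration, not part of the original notes):

```python
import numpy as np

def one_hot(labels, K):
    """Encode integer class labels 0..K-1 as K-dimensional indicator vectors."""
    Y = np.zeros((len(labels), K))
    Y[np.arange(len(labels)), labels] = 1.0
    return Y

# Example: four samples, K = 3 classes
print(one_hot([0, 2, 1, 2], K=3))
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]
#  [0. 0. 1.]]
```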
Mathematical Representation of a Neural Network:
Derived features Zm are created using weighted sums (linear combinations) of the input
features, passed through a nonlinear activation σ: Zm = σ(α0m + αmᵀX), for m = 1, …, M.
These derived features are then combined linearly, Tk = β0k + βkᵀZ, and transformed by an
output function gk to predict the target Yk: fk(X) = gk(T), for k = 1, …, K.
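A minimal NumPy sketch of this two-stage computation, assuming a sigmoid activation for the hidden layer and a softmax output for classification (all weights below are random placeholders, not trained values):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

rng = np.random.default_rng(0)
p, M, K = 4, 5, 3                # number of inputs, hidden units, classes
X = rng.normal(size=p)           # one input vector

alpha0 = rng.normal(size=M)      # hidden-layer biases
alpha = rng.normal(size=(M, p))  # hidden-layer weights
beta0 = rng.normal(size=K)       # output-layer biases
beta = rng.normal(size=(K, M))   # output-layer weights

# Stage 1 (feature extraction): Zm = sigmoid(alpha0m + alpham . X)
Z = sigmoid(alpha0 + alpha @ X)

# Stage 2 (prediction): Tk = beta0k + betak . Z, then softmax output gk
T = beta0 + beta @ Z
probs = np.exp(T) / np.exp(T).sum()
print(probs)                     # K class probabilities summing to 1
```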
Training with Backpropagation:
o Forward pass: Compute the hidden-layer neurons and the network output from
the current weights.
o Backward pass: Compute how much each weight contributed to the error
(gradients).
o Update weights using these gradients (a toy example follows below).
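A hedged sketch of one such gradient step for the output-layer weights under a squared-error loss (the values and the 0.1 learning rate are illustrative assumptions):

```python
import numpy as np

Z = np.array([0.2, 0.7, 0.5])       # hidden-layer activations (forward pass)
beta = np.array([0.1, -0.3, 0.4])   # current output weights
y = 1.0                             # target value

f = beta @ Z                        # forward pass: current prediction
grad = (f - y) * Z                  # backward pass: dLoss/dbeta by chain rule
beta = beta - 0.1 * grad            # update weights against the gradient
print(f, beta)
```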
Some Issues in Training Neural Networks:
11.5.1 Starting Values:
When training a neural network, we need to initialize the weights (the connections
between neurons).
Key Points
1. Start with small random values near zero (a short initialization sketch follows
this list).
o This helps the network start off behaving almost like a linear model (simple
straight-line model).
o This makes training smoother and more stable.
2. Avoid starting with all weights at exactly zero.
o The network won’t learn properly because all weight updates would be the
same (zero derivatives = no movement).
3. Avoid starting with very large weights.
o Large weights make the network highly nonlinear and chaotic right from the
start.
o This often makes training unstable and can lead to poor solutions.
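A minimal sketch of such an initialization, assuming a uniform range of ±0.1 (the exact scale is an assumption; any small range near zero behaves similarly):

```python
import numpy as np

rng = np.random.default_rng(42)
p, M = 4, 5                         # inputs, hidden units

# Small random values near zero: the sigmoid is nearly linear around 0,
# so the untrained network starts off behaving almost like a linear model.
alpha = rng.uniform(-0.1, 0.1, size=(M, p))

# What to avoid:
# np.zeros((M, p))                   # all-zero start: identical updates, no learning
# rng.normal(0, 10.0, size=(M, p))   # very large start: highly nonlinear, unstable
```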
Overfitting:
Overfitting happens when a neural network learns the training data too well,
including noise and unnecessary details.
This causes the network to perform poorly on new (unseen) data.
One common remedy is early stopping (sketched below):
o Stop training when the validation error starts increasing, even if the
training error keeps decreasing.
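A hedged sketch of the early-stopping loop; train_one_epoch, validation_error, and model.weights are hypothetical placeholders, not a real API:

```python
def train_with_early_stopping(model, max_epochs=500, patience=10):
    """Stop once validation error stops improving for `patience` epochs."""
    best_err, best_weights, bad_epochs = float("inf"), None, 0
    for epoch in range(max_epochs):
        train_one_epoch(model)           # hypothetical: one pass over training data
        err = validation_error(model)    # hypothetical: error on a held-out set
        if err < best_err:
            best_err, best_weights, bad_epochs = err, model.weights.copy(), 0
        else:
            bad_epochs += 1              # validation error rose (or stalled)
            if bad_epochs >= patience:
                break                    # stop even if training error still falls
    model.weights = best_weights         # restore the best weights observed
    return model
```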
Weight Decay (Regularization):
What it is: Add a penalty term to the loss function that discourages large weights.
This is similar to ridge regression for linear models.
The new loss function looks like: R(θ) + λJ(θ),
where R(θ) is the original training error, J(θ) is the sum of the squared weights, and
λ ≥ 0 controls how strong the penalty is.
Effect: Shrinks the weights towards zero, making the network simpler and less likely
to overfit.
How to choose λ: Use cross-validation to find the best value.
🔹 Two Types of Weight Penalties
1. Standard Weight Decay: Penalty is the sum of squared weights (like ridge
regression).
2. Weight Elimination: Shrinks small weights more than large weights, encouraging
simpler networks.
o With weight decay, the weights are more balanced across all hidden units.
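A minimal sketch computing both penalty types for a single weight vector, assuming the standard formulas (sum of squares for weight decay, w²/(1+w²) for weight elimination); the numbers are illustrative:

```python
import numpy as np

weights = np.array([0.05, -0.8, 1.5, -0.02])  # illustrative weight vector
lam = 0.01                                    # penalty strength lambda

# Standard weight decay: J = sum of squared weights (ridge-style)
decay = lam * np.sum(weights**2)

# Weight elimination: w^2 / (1 + w^2) saturates for large weights, so it
# shrinks small weights relatively more, encouraging simpler networks
elimination = lam * np.sum(weights**2 / (1 + weights**2))

print(decay, elimination)
```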
A better approach than keeping only a single trained network is to average the
predictions from all the networks you trained (sketched below).
This works better than averaging the actual weights, because neural networks are
nonlinear: averaging weights could confuse the whole model.
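A minimal sketch of this prediction averaging; the networks list and its predict() method are hypothetical placeholders:

```python
import numpy as np

def ensemble_predict(networks, X):
    """Average class-probability predictions over several trained networks.

    Averaging predictions is safe because each network's output is
    well-defined; averaging raw weights of nonlinear models is not, since
    the averaged weights need not correspond to any sensible network.
    """
    preds = [net.predict(X) for net in networks]  # hypothetical predict()
    return np.mean(preds, axis=0)
```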
3. Bagging (Bootstrap Aggregating)