
MLP_In_Practice

A Multilayer Perceptron (MLP) is a type of neural network that consists of multiple layers of neurons, including an input layer, one or more hidden layers, and an output layer. It is used for supervised learning tasks, particularly classification and regression. MLPs are fully connected, meaning each neuron in one layer is connected to every neuron in the next layer.

When implementing an MLP in practice, the process generally involves these steps:

1. Data Preprocessing
• Before training an MLP, the data needs to be preprocessed.
• Normalization/Standardization: Features are usually scaled to have a mean of 0 and a standard deviation of 1. This helps the network learn more efficiently.
• In machine learning, especially for models like neural networks,
normalization and standardization are crucial for efficient learning.
These techniques ensure that features are on a similar scale,
preventing some variables from dominating others due to differences
in magnitude.
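As a minimal sketch (assuming scikit-learn and a NumPy feature matrix X), standardization can be done with StandardScaler:

import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix: 5 samples, 2 features on very different scales
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 250.0],
              [4.0, 400.0],
              [5.0, 350.0]])

scaler = StandardScaler()           # learns the per-feature mean and standard deviation
X_scaled = scaler.fit_transform(X)  # each column now has mean 0 and std 1

print(X_scaled.mean(axis=0))  # approximately [0, 0]
print(X_scaled.std(axis=0))   # approximately [1, 1]

In practice the scaler is usually fitted on the training split only and then reused to transform the test split (see the split below).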
• Train-Test Split: You divide the data into a training set and a test set.
The training set is used to train the model, and the test set is used to
evaluate the model's performance.
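A minimal sketch of the split, assuming scikit-learn and arrays X (features) and y (labels):

from sklearn.model_selection import train_test_split

# Hold out 20% of the data for evaluation; fix the random seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)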
• Categorical Encoding (for classification tasks): For categorical
features, you might need to apply techniques like one-hot encoding.
• When working with categorical features in machine learning, we must
convert them into a numerical format because most machine learning
models work with numbers, not text. Categorical encoding is the
process of transforming categorical variables into numerical
representations.
(A) One-Hot Encoding (OHE)
• Converts categories into binary columns (0s and 1s).
• Each unique category becomes a separate column.
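A minimal sketch of one-hot encoding, assuming pandas and a hypothetical 'color' column:

import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# Each unique category becomes its own 0/1 column
encoded = pd.get_dummies(df, columns=["color"], dtype=int)
print(encoded)
#    color_blue  color_green  color_red
# 0           0            0          1
# 1           0            1          0
# 2           1            0          0
# 3           0            1          0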
2. Model Architecture
• MLPs typically have:
• Input Layer: Takes the input features of your data.
• Hidden Layers: One or more layers of neurons. These layers perform
transformations of the data.
• Output Layer: Provides the output predictions (for classification or
regression).
• Each neuron in a layer takes a weighted sum of the inputs, passes it
through an activation function, and then forwards it to the next layer.
• Common activation functions:
I. Sigmoid: f(x) = 1 / (1 + exp(-x))
3. Loss Function
• The loss function measures how well the network's predictions match
the actual outputs. Common choices are:
• Mean Squared Error (MSE): For regression tasks.
• Cross-Entropy Loss: For classification tasks.
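Both losses take only a few lines in NumPy; this sketch assumes y_true and y_pred arrays of the same shape:

import numpy as np

def mse(y_true, y_pred):
    # Mean Squared Error: average squared difference (regression)
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # Cross-entropy for binary classification; eps avoids log(0)
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))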
4. Training the Network
• Training an MLP involves the following:
1) Forward Propagation: The input data is passed through the network,
layer by layer, to produce an output.
2) Backpropagation: The error (difference between predicted and actual
values) is propagated backward through the network to adjust the
weights using gradient descent or other optimization algorithms.
3) Gradient Descent: The model's weights are updated using an optimization algorithm such as stochastic gradient descent (SGD) or RMSprop (Root Mean Square Propagation).
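As an illustration only (the slides do not prescribe a framework), a PyTorch-style training loop ties the three steps together; the data and layer sizes here are hypothetical:

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 16), nn.Sigmoid(), nn.Linear(16, 1))
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

X = torch.randn(100, 4)            # hypothetical inputs
y = torch.randn(100, 1)            # hypothetical regression targets

for epoch in range(10):
    y_hat = model(X)               # 1) forward propagation
    loss = loss_fn(y_hat, y)
    optimizer.zero_grad()
    loss.backward()                # 2) backpropagation (compute gradients)
    optimizer.step()               # 3) gradient descent (update the weights)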
What is SGD?
• SGD is an optimization algorithm that updates the weights using the
gradient of the loss function computed from a single random sample
(or a small batch) at each step.

w = w - η * (dL/dw)

where:
• η (eta) is the learning rate and controls the step size.
• dL/dw is the gradient of the loss with respect to the weight.
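As a worked sketch of that update rule for a single weight (all values hypothetical):

# One SGD step: w = w - eta * dL/dw
eta = 0.1           # learning rate
w = 2.0             # current weight
grad = 0.8          # dL/dw computed from one sample (or a small batch)

w = w - eta * grad  # 2.0 - 0.1 * 0.8 = 1.92
print(w)            # 1.92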
5. Evaluation
• After training, the model is evaluated on the test set using metrics like:
1) Accuracy (for classification)
2) Precision, Recall, F1-score (for classification)
3) Mean Absolute Error (MAE) or Mean Squared Error (MSE) (for
regression)
• Accuracy is a common metric for classification tasks, defined as the ratio of correctly predicted instances to the total instances in the dataset:

Accuracy = (number of correct predictions) / (total number of instances)
• Mean Squared Error (MSE): Measures the average squared difference between actual and predicted values:

MSE = (1/n) * Σ (y_i - ŷ_i)^2
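A minimal evaluation sketch, assuming scikit-learn, a held-out test set, and model predictions y_pred:

from sklearn.metrics import accuracy_score, f1_score, mean_squared_error

# Classification: y_test and y_pred are class labels
print("Accuracy:", accuracy_score(y_test, y_pred))
print("F1-score:", f1_score(y_test, y_pred, average="macro"))

# Regression: y_test and y_pred are continuous values
print("MSE:", mean_squared_error(y_test, y_pred))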
6. Hyperparameter Tuning
• You can improve the MLP's performance by tuning hyperparameters
such as:
1) Number of hidden layers and neurons
2) Learning rate
3) Batch size
4) Optimizer choice (SGD, etc.)
Number of Hidden Layers and Neurons
• What it does?
• Hidden layers and neurons control the model's complexity.
• More layers → Higher ability to capture complex patterns (but risk of
overfitting).
• More neurons per layer → More capacity to learn, but increases
computational cost.

• How to tune?

• Start with one hidden layer and gradually increase.
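As an illustration (assuming scikit-learn's MLPClassifier and existing training/validation splits; the slides do not prescribe a library), a few candidate architectures can be compared directly:

from sklearn.neural_network import MLPClassifier

# Hypothetical candidates: one, two, then three hidden layers
for hidden in [(32,), (64, 32), (128, 64, 32)]:
    model = MLPClassifier(hidden_layer_sizes=hidden, max_iter=500, random_state=0)
    model.fit(X_train, y_train)
    print(hidden, "validation accuracy:", model.score(X_val, y_val))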


Learning Rate (η)
• What it does?
• Controls how much the model adjusts weights during training.
• A high learning rate → Faster learning, but risks overshooting the minimum and unstable training.
• A low learning rate → More stable learning, but may take longer to converge.
• How to tune?

• Common values: 0.01, 0.001, 0.0001.
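A minimal sweep over those values, again assuming scikit-learn's MLPClassifier (learning_rate_init is its name for the initial learning rate) and existing data splits:

from sklearn.neural_network import MLPClassifier

for lr in [0.01, 0.001, 0.0001]:
    model = MLPClassifier(hidden_layer_sizes=(64,), learning_rate_init=lr,
                          max_iter=500, random_state=0)
    model.fit(X_train, y_train)
    print("lr =", lr, "validation accuracy:", model.score(X_val, y_val))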


Batch Size
• What it does?

• Determines how many samples are used to compute gradients before updating
weights.
• Small batch size (e.g., 32, 64) → More noise, but better generalization.
• Large batch size (e.g., 256, 512) → Faster training, but may lead to poor
generalization.
• How to tune?

• Start with 32 or 64, and experiment with larger sizes.


• If training is unstable, reduce batch size.
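A small worked sketch of the trade-off, assuming a hypothetical training set of 10,000 samples: smaller batches mean noisier gradients but more weight updates per epoch.

n_samples = 10_000
for batch_size in [32, 64, 256, 512]:
    updates_per_epoch = n_samples // batch_size  # how often the weights change each epoch
    print(batch_size, "->", updates_per_epoch, "weight updates per epoch")
# 32 -> 312, 64 -> 156, 256 -> 39, 512 -> 19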
Optimizer Choice (SGD, Adam, RMSprop, etc.)
• What it does?
• Optimizers adjust weights based on gradient updates.
• SGD (Stochastic Gradient Descent) → Works well but can be slow.
• Adam (Adaptive Moment Estimation) → Combines momentum with per-parameter adaptive learning rates; works well for most cases.
• RMSprop → Good for recurrent networks; adapts the learning rate using a running average of squared gradients.
• How to tune?
• Start with Adam (default: learning rate = 0.001).
• If training is unstable, try SGD with momentum (0.9).
• If gradients explode or vanish, try RMSprop.
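For illustration, this is how those three choices look in PyTorch (an assumption; any framework with these optimizers is configured similarly). 'model' refers to the network from the earlier training-loop sketch:

import torch

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)                # good default first choice
# optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)  # if training is unstable
# optimizer = torch.optim.RMSprop(model.parameters(), lr=0.001)           # if gradients explode or vanish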
