5. MLP in Practice
1. Data Preprocessing
• Before training an MLP, the data needs to be preprocessed (a short code sketch follows at the end of this section).
• Normalization/Standardization: Features are usually scaled to have a
mean of 0 and a standard deviation of 1. This helps the network learn
more efficiently.
• In machine learning, especially for models like neural networks,
normalization and standardization are crucial for efficient learning.
These techniques ensure that features are on a similar scale,
preventing some variables from dominating others due to differences
in magnitude.
• Train-Test Split: You divide the data into a training set and a test set.
The training set is used to train the model, and the test set is used to
evaluate the model's performance.
• Categorical Encoding (for classification tasks): For categorical
features, you might need to apply techniques like one-hot encoding.
• When working with categorical features in machine learning, we must
convert them into a numerical format because most machine learning
models work with numbers, not text. Categorical encoding is the
process of transforming categorical variables into numerical
representations.
(A) One-Hot Encoding (OHE)
• Converts categories into binary columns (0s and 1s).
• Each unique category becomes a separate column.
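To make these preprocessing steps concrete, here is a minimal sketch using pandas and scikit-learn. The column names (age, income, color) and the values in the DataFrame are hypothetical stand-ins for real data:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical dataset: two numeric features, one categorical, one label
df = pd.DataFrame({
    "age":    [25, 32, 47, 51, 23, 38],
    "income": [40000, 55000, 80000, 62000, 30000, 72000],
    "color":  ["red", "blue", "red", "green", "blue", "green"],
    "label":  [0, 1, 1, 0, 0, 1],
})

# One-hot encoding: each unique category becomes a separate 0/1 column
df = pd.get_dummies(df, columns=["color"])

X = df.drop(columns=["label"]).values
y = df["label"].values

# Train-test split: hold out data the model never sees during training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Standardize features to mean 0, std 1; fit only on the training set
# so that test-set statistics do not leak into training
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```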
2. Model Architecture
• MLPs typically have:
• Input Layer: Takes the input features of your data.
• Hidden Layers: One or more layers of neurons. These layers perform
transformations of the data.
• Output Layer: Provides the output predictions (for classification or
regression).
• Each neuron in a layer takes a weighted sum of the inputs, passes it
through an activation function, and then forwards it to the next layer.
• Common activation functions:
I. Sigmoid: f(x) = 1 / (1 + exp(-x))
II. Tanh: f(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))
III. ReLU: f(x) = max(0, x)
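As a sketch of how one forward step works, the NumPy snippet below computes the weighted sum plus sigmoid activation for each layer. The layer sizes and random weights are illustrative assumptions, not a fixed architecture:

```python
import numpy as np

def sigmoid(x):
    # Sigmoid activation: squashes values into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

# Illustrative sizes: 4 input features, 8 hidden neurons, 1 output
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)   # input -> hidden
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)   # hidden -> output

x = rng.normal(size=(1, 4))        # one input sample

# Each layer: weighted sum of its inputs, then an activation function
h = sigmoid(x @ W1 + b1)           # hidden layer
y_hat = sigmoid(h @ W2 + b2)       # output layer (e.g., class probability)
print(y_hat)
```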
3. Loss Function
• The loss function measures how well the network's predictions match
the actual outputs. Common choices are:
• Mean Squared Error (MSE): For regression tasks.
• Cross-Entropy Loss: For classification tasks.
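A minimal sketch of both losses in NumPy, assuming y_true holds the actual targets and y_pred holds the model's outputs (probabilities in the cross-entropy case):

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean Squared Error: average squared difference (regression)
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # Cross-entropy for binary classification; eps avoids log(0)
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.7])
print(mse(y_true, y_pred))                   # regression-style loss
print(binary_cross_entropy(y_true, y_pred))  # classification-style loss
```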
4. Training the Network
• Training an MLP involves the following:
1) Forward Propagation: The input data is passed through the network,
layer by layer, to produce an output.
2) Backpropagation: The error (difference between predicted and actual
values) is propagated backward through the network to adjust the
weights using gradient descent or other optimization algorithms.
3) Gradient Descent: The model's weights are updated using an
optimization algorithm like stochastic gradient descent (SGD) or
RMSprop (Root Mean Square Propagation).
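The NumPy sketch below puts these three steps together for a one-hidden-layer MLP on the XOR problem. The dataset, layer sizes, learning rate, and epoch count are illustrative assumptions; with this setup it typically learns XOR, though convergence depends on the random initialization:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy dataset: XOR (illustrative)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
eta = 2.0  # learning rate (assumed value)

for epoch in range(5000):
    # 1) Forward propagation: layer by layer to produce an output
    h = sigmoid(X @ W1 + b1)
    y_hat = sigmoid(h @ W2 + b2)

    # 2) Backpropagation: push the MSE error back through the sigmoids
    d_out = 2 * (y_hat - y) / len(X) * y_hat * (1 - y_hat)
    dW2, db2 = h.T @ d_out, d_out.sum(axis=0)
    d_hid = (d_out @ W2.T) * h * (1 - h)
    dW1, db1 = X.T @ d_hid, d_hid.sum(axis=0)

    # 3) Gradient descent: update weights opposite the gradient
    W1 -= eta * dW1; b1 -= eta * db1
    W2 -= eta * dW2; b2 -= eta * db2

print(np.round(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2), 2))
```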
What is SGD?
• SGD is an optimization algorithm that updates the weights using the
gradient of the loss function computed from a single random sample
(or a small batch) at each step:

w = w - η · (dL/dw)

where:
• η (eta) is the learning rate, which controls the step size.
• dL/dw is the gradient of the loss with respect to the weight.
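A minimal sketch of this update rule, fitting a one-parameter linear model to hypothetical data, with a single random sample driving each step:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=100)
y = 3.0 * x + rng.normal(scale=0.1, size=100)  # true slope = 3 (assumed)

w, eta = 0.0, 0.1  # initial weight and learning rate (assumed values)
for step in range(1000):
    i = rng.integers(len(x))             # pick one random sample
    grad = 2 * (w * x[i] - y[i]) * x[i]  # dL/dw for squared error on it
    w -= eta * grad                      # SGD update: w = w - eta * dL/dw
print(w)  # should end up close to 3
```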
5. Evaluation
• After training, the model is evaluated on the test set using metrics like:
1) Accuracy (for classification)
2) Precision, Recall, F1-score (for classification)
3) Mean Absolute Error (MAE) or Mean Squared Error (MSE) (for
regression)
• Accuracy is a common metric for classification tasks, defined as the
ratio of correctly predicted instances to the total instances in the
dataset:

Accuracy = (number of correct predictions) / (total number of predictions)
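A quick sketch of computing these metrics with scikit-learn; the y_test and y_pred values below are hypothetical outputs from a trained classifier:

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score,
                             mean_absolute_error, mean_squared_error)

# Hypothetical test labels and classifier predictions
y_test = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]

print("Accuracy: ", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall:   ", recall_score(y_test, y_pred))
print("F1-score: ", f1_score(y_test, y_pred))

# For regression, compare continuous targets and predictions instead
y_true_reg, y_pred_reg = [2.5, 0.0, 2.1], [3.0, -0.1, 2.0]
print("MAE:", mean_absolute_error(y_true_reg, y_pred_reg))
print("MSE:", mean_squared_error(y_true_reg, y_pred_reg))
```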
• Batch Size: Determines how many samples are used to compute gradients
before updating the weights (see the mini-batch sketch below).
• How to tune?
• Small batch size (e.g., 32, 64) → more gradient noise, but often better generalization.
• Large batch size (e.g., 256, 512) → faster training, but may lead to poor
generalization.
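A minimal sketch of how batch size enters the training loop: the data is shuffled each epoch and sliced into mini-batches, and the weights would be updated once per batch. The data, batch size of 32, and epoch count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))    # hypothetical features
y = rng.integers(0, 2, size=1000)  # hypothetical labels
batch_size = 32                    # small batch: noisier gradients, often better generalization

for epoch in range(3):
    order = rng.permutation(len(X))  # reshuffle the samples each epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        X_batch, y_batch = X[idx], y[idx]
        # gradient computation and weight update on this mini-batch
        # would go here (see the training-loop sketch in Section 4)
```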