Lecture 12 - Machine Learning
Md Rifat Ahmmad Rashid, Associate Professor, Dept. of CSE, East West University, Bangladesh

Types of Machine Learning
• Supervised Learning: The algorithm learns from labeled data, meaning the input data is
associated with known outcomes or labels.
• Unsupervised Learning: The algorithm learns from unlabeled data and tries to uncover
hidden patterns without any explicit output labels.
• Reinforcement Learning: The algorithm learns through trial and error by interacting with an
environment and receiving feedback through rewards or penalties.
The Machine Learning Workflow
1. Problem Definition: Understanding the problem at hand and deciding whether machine learning is the right approach.
2. Data Collection: Gathering relevant data that represents the problem space.
3. Data Preprocessing: Cleaning, transforming, and normalizing data to ensure it is suitable
for the machine learning model.
4. Feature Engineering: Selecting or extracting relevant features (variables) that will be used
by the model.
5. Model Selection: Choosing the appropriate machine learning algorithm(s) for the task.
6. Model Training: Feeding the model with training data and allowing it to learn patterns.
7. Model Evaluation: Assessing the model's performance using test data and appropriate
metrics.
8. Model Tuning: Optimizing model parameters (hyperparameters) to improve performance.
9. Deployment: Implementing the model in a real-world environment to make predictions.
10. Monitoring and Maintenance: Continuously monitoring model performance and updating
it as needed based on new data.
Fine-Tuning a Model
1. Adjust Hyperparameters:
• Learning Rate: Decrease the learning rate if the model's loss is fluctuating wildly.
Increase it slightly if the model is learning too slowly.
• Batch Size: A larger batch size can stabilize gradient updates but requires more memory.
A smaller batch size can lead to noisier updates but often helps in generalization.
2. Regularization:
• L2 Regularization (Ridge): Helps to reduce overfitting by adding a penalty to large
weights.
• Dropout (for deep learning): Randomly "drops" neurons during training to prevent the
network from becoming too reliant on any one node, thus reducing overfitting.
3. Early Stopping:
• Stop training when the validation loss stops improving. This can prevent overfitting, as
the model will stop when it starts performing poorly on the validation set, even if the
training loss is decreasing.
4. Optimizer Adjustment:
• For deep learning models, using optimizers like Adam or RMSprop often improves
performance over traditional SGD (Stochastic Gradient Descent). Try different
optimizers or adjust the learning rate schedule (e.g., step decay, exponential decay).
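The following is a minimal Keras sketch (assuming TensorFlow is installed) that ties these knobs together; the layer sizes, rates, and patience value are illustrative, not prescriptive:

import tensorflow as tf
from tensorflow.keras import layers, regularizers

model = tf.keras.Sequential([
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(0.01)),  # L2 (ridge) penalty
    layers.Dropout(0.5),    # randomly drop half the units during training
    layers.Dense(1),
])

# Adam with an explicit learning rate; lower it if the loss fluctuates wildly.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="mse")

# Early stopping: halt training when the validation loss stops improving.
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                              restore_best_weights=True)

# X_train, y_train, X_val, y_val stand in for your own data:
# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           batch_size=32, epochs=100, callbacks=[early_stop])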
Bias-Variance Tradeoff
The bias-variance tradeoff is a fundamental concept in machine learning that describes the
balance between two sources of errors:
Bias: Refers to the error introduced by approximating the real-world problem too simply. High bias occurs when the model is too simple and underfits the data.
Example: A straight-line model fitted to data with a clearly nonlinear trend.
Variance: Refers to the error introduced by the model’s sensitivity to small fluctuations in the
training data. High variance occurs when the model is too complex and overfits the data.
Example: A deep neural network with many layers memorizing training data.
Goal: Find the "sweet spot" where the model complexity is just right, minimizing both bias and variance.
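A small scikit-learn sketch (on synthetic data) that makes the tradeoff concrete: a degree-1 polynomial underfits (high bias), a degree-15 polynomial overfits (high variance), and a moderate degree lands near the sweet spot:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 1, 60))[:, None]            # one input feature
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 60)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")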
Epoch: One epoch means that the model has seen the entire training data once.
Iteration: One iteration means that the model has updated its weights once after processing a batch of data. For example, with 10,000 training samples and a batch size of 100, one epoch consists of 100 iterations.
Monitoring Training:
• Track training and validation loss across epochs.
• Stop training when the validation loss starts to increase, indicating overfitting.
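A minimal sketch of this stopping rule in plain Python; the validation-loss values below are simulated purely for illustration:

# Stop once the validation loss has not improved for `patience` epochs in a row.
val_losses = [0.90, 0.75, 0.62, 0.55, 0.52, 0.53, 0.54, 0.56, 0.57, 0.58]

best_val, patience, wait = float("inf"), 3, 0
for epoch, val_loss in enumerate(val_losses):
    if val_loss < best_val:          # validation loss improved
        best_val, wait = val_loss, 0
    else:                            # no improvement this epoch
        wait += 1
        if wait >= patience:
            print(f"Stopping at epoch {epoch}: validation loss stopped improving")
            break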
Data Splitting
Data splitting involves partitioning the dataset into different subsets to train, validate, and test the
model. The goal is to evaluate the model's performance reliably while maximizing the use of
available data.
1. Training Set:
o The largest portion (typically 60-80%) of the data is used for training the model.
o The model learns patterns, adjusts weights, and minimizes error on this subset.
2. Validation Set:
o 10-20% of the data is reserved for validation.
o It is used to fine-tune the model’s hyperparameters, check for overfitting, and
evaluate the model's performance during training.
3. Test Set:
o The remaining 10-20% of the data is set aside as a test set, which is used to assess
the final performance of the model.
o This set is kept entirely separate from the training and validation processes to
provide an unbiased evaluation of the model's accuracy.
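A brief sketch of such a split using scikit-learn (assumed available); the synthetic data, 70/15/15 ratios, and variable names are illustrative:

import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(200).reshape(100, 2)   # 100 samples, 2 features
y = np.arange(100) % 2               # toy binary labels

# Stage 1: hold out 30% for validation + test. `stratify` keeps the class
# balance equal across splits, as recommended for classification data.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y)
# Stage 2: split the held-out 30% evenly into validation and test sets.
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, random_state=42, stratify=y_tmp)

print(len(X_train), len(X_val), len(X_test))  # 70, 15, 15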
• Image Data:
o When dealing with image datasets, it's common to split data into training (70-
80%), validation (10-15%), and test (10-15%) sets. For smaller image datasets,
additional techniques like data augmentation are often applied to increase the
diversity of the training set.
o For object detection and segmentation tasks, maintaining a balanced distribution
of classes and object scales across the splits is crucial.
• NLP Data:
o NLP datasets are often split in the same ratios as typical machine learning
problems (e.g., 70% train, 15% validation, 15% test). However, special attention
is needed to maintain a balanced representation of different classes, languages, or
document types (e.g., sentiment classes or entity types) across all splits.
o Additionally, for sequence data, ensuring that similar sequences are not
overrepresented in the training set compared to validation and test sets is
essential.
Cross-Validation
K-Fold Cross-Validation:
1. Concept: The dataset is divided into K subsets (or folds). The model is trained on K-1
folds and validated on the remaining fold. This process is repeated K times, with each
fold serving as the validation set once. The performance metric is averaged across all K
iterations.
o When to Use:
▪ Ideal for small to medium-sized datasets where it's crucial to maximize the
use of all available data.
▪ Suitable for most types of machine learning problems (e.g., tabular data,
image data, and NLP).
o Benefit: Reduces bias and variance by ensuring that every data point is used for
both training and validation. It provides a more reliable measure of model
performance, particularly for small datasets where every observation is valuable.
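A brief scikit-learn sketch of 5-fold cross-validation; the model choice and synthetic data are illustrative:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.RandomState(0)
X = rng.rand(100, 3)                                   # synthetic features
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(0, 0.1, 100)

kf = KFold(n_splits=5, shuffle=True, random_state=0)   # K = 5 folds
scores = cross_val_score(LinearRegression(), X, y, cv=kf, scoring="r2")
print("R^2 per fold:", scores)
print("Mean R^2 across folds:", scores.mean())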
When cross-validation may not be needed:
• Large Datasets: When working with large datasets (e.g., millions of observations), a
simple train-validation-test split is often sufficient, as the dataset is already large enough
to provide reliable estimates.
• Computationally Expensive Models: For deep learning models or other complex
algorithms that require significant training time, cross-validation may not be practical due
to the need to train multiple models. In such cases, using a separate validation set or
employing alternative validation strategies like early stopping is more efficient.
Evaluation Metrics for Regression
Regression tasks involve predicting continuous numerical values. Evaluation metrics for regression measure the difference between the predicted and actual values.
1. Mean Absolute Error (MAE):
o Definition: The average of the absolute differences between predicted and actual values.
o Formula: MAE = (1/n) Σ |y_i − ŷ_i|
2. Mean Squared Error (MSE):
o Definition: The average of the squared differences between predicted and actual values.
o Formula: MSE = (1/n) Σ (y_i − ŷ_i)²
o Interpretation: Penalizes larger errors more heavily than MAE. Lower values are better.
3. Root Mean Squared Error (RMSE):
o Definition: The square root of the MSE. It represents the standard deviation of the residuals (prediction errors).
o Formula: RMSE = √MSE = √((1/n) Σ (y_i − ŷ_i)²)
4. R-Squared (R²):
o Definition: Measures the proportion of the variance in the dependent variable that is predictable from the independent variables.
o Formula: R² = 1 − (SS_res / SS_tot), where SS_res is the sum of squared residuals and SS_tot is the total sum of squares.
o Interpretation: Values typically range from 0 to 1, with higher values indicating better model performance.
5. Adjusted R-Squared:
o Definition: A version of R-Squared adjusted for the number of predictors; it penalizes adding variables that do not improve the model.
o Formula: Adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of observations and p is the number of predictors.
o Range: (−∞, 1], where 1 indicates perfect prediction and lower values indicate worse performance.
6. Mean Absolute Percentage Error (MAPE):
o Definition: The average of the absolute percentage errors between predicted and actual values.
o Formula: MAPE = (100%/n) Σ |(y_i − ŷ_i) / y_i|
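A short sketch computing these regression metrics with scikit-learn and NumPy; the true and predicted values are made up for illustration:

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([3.0, -0.5, 2.0, 7.0])   # actual values
y_pred = np.array([2.5, 0.0, 2.0, 8.0])    # model predictions

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)                                        # square root of MSE
r2 = r2_score(y_true, y_pred)
mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100   # in percent

print(f"MAE={mae:.3f}  MSE={mse:.3f}  RMSE={rmse:.3f}  R2={r2:.3f}  MAPE={mape:.1f}%")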
Evaluation Metrics for Classification
Classification tasks involve predicting categorical outcomes. Metrics for classification are designed to evaluate the accuracy and robustness of such predictions.
1. Accuracy:
o Definition: The proportion of correct predictions among all predictions.
o Formula: Accuracy = (TP + TN) / (TP + TN + FP + FN), where TP = true positives, TN = true negatives, FP = false positives, FN = false negatives.
o Interpretation: Suitable when the dataset is balanced, but less informative for imbalanced datasets.
2. Precision:
o Definition: The ratio of true positive predictions to the total positive predictions.
o Formula: Precision = TP / (TP + FP)
o Interpretation: Indicates how many positive predictions are actually correct; useful
in cases where false positives are costly.
3. Recall (Sensitivity):
o Definition: The ratio of true positive predictions to the total actual positives.
o Formula: Recall = TP / (TP + FN)
o Interpretation: Indicates how many actual positives the model captures; crucial in
scenarios where false negatives are costly.
4. F1 Score:
o Definition: The harmonic mean of precision and recall.
o Formula: F1 = 2 × (Precision × Recall) / (Precision + Recall)
5. ROC-AUC (Area Under the ROC Curve):
o Definition: The area under the curve that plots the true positive rate against the false positive rate at different classification thresholds.
o Range: [0, 1], where 1 indicates perfect separation and 0.5 indicates no separation.
6. Precision-Recall Curve:
o Definition: A plot that shows the trade-off between precision and recall for different thresholds.
o Interpretation: The curve is traced by computing precision and recall as the decision threshold varies; a curve that stays near the top-right indicates better performance.
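A short sketch computing these classification metrics with scikit-learn; the labels and scores are made up for illustration:

from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]                      # actual classes
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]                      # hard predictions
y_score = [0.9, 0.2, 0.8, 0.4, 0.3, 0.7, 0.6, 0.1]     # predicted probabilities

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1 score :", f1_score(y_true, y_pred))
print("ROC-AUC  :", roc_auc_score(y_true, y_score))    # uses scores, not labels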
Supervised Learning
In supervised learning, the model learns a mapping from labeled inputs to outputs:
• Input (Features): Variables or attributes of the data that are used as input for the model
(e.g., age, height, income).
• Output (Label/Target): The variable that the model aims to predict (e.g., house price,
classification category).
Supervised learning problems fall into two main categories:
1. Regression:
o Examples:
▪ Predicting house prices based on features like square footage, location, and
number of bedrooms.
o Common Algorithms:
▪ Linear Regression
Image source: https://www.geeksforgeeks.org/ml-linear-regression/
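A minimal scikit-learn sketch of fitting a linear regression model; the square-footage and price values are made up for illustration:

import numpy as np
from sklearn.linear_model import LinearRegression

sqft = np.array([[800], [1200], [1500], [2000], [2500]])    # single feature
price = np.array([150000, 210000, 260000, 330000, 400000])  # target values

reg = LinearRegression().fit(sqft, price)
print("Predicted price for 1800 sqft:", reg.predict([[1800]])[0])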
2. Classification:
o Examples:
▪ Classifying emails as spam or not spam.
o Common Algorithms:
▪ Logistic Regression
▪ Decision Trees
▪ Naive Bayes
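A minimal scikit-learn sketch of one classifier from this list, logistic regression, on a toy binary problem (the data is illustrative):

from sklearn.linear_model import LogisticRegression

X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]   # single feature
y = [0, 0, 0, 1, 1, 1]                           # binary labels

clf = LogisticRegression().fit(X, y)
print(clf.predict([[2.5], [4.5]]))               # predicted classes
print(clf.predict_proba([[2.5], [4.5]]))         # class probabilities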
Steps in Supervised Learning
1. Data Collection:
o Gather a dataset that includes both inputs (features) and corresponding outputs
(labels). The dataset must be labeled accurately for the model to learn effectively.
2. Data Preprocessing:
o Encoding: Convert categorical features into numerical format using techniques like
one-hot encoding or label encoding.
o Splitting Data: Divide the dataset into training, validation, and test sets to evaluate
model performance accurately.
3. Model Selection:
o Choose an algorithm suited to the task and the data (e.g., linear regression for continuous targets; logistic regression, decision trees, or Naive Bayes for classification).
4. Model Training:
o The model is trained on the training set using a chosen algorithm. The model
iteratively updates its parameters to minimize the error (loss function) between the
predicted and actual outputs.
5. Model Evaluation:
o Evaluate the model’s performance using the validation set. Performance metrics
vary based on the type of task:
▪ For regression: Mean Absolute Error (MAE), Mean Squared Error (MSE), R-squared.
▪ For classification: Accuracy, Precision, Recall, F1 score.
6. Hyperparameter Tuning:
o Adjust the model’s hyperparameters (e.g., learning rate, depth of a decision tree) to
optimize performance. This is often done using techniques like grid search or
random search with cross-validation.
7. Model Testing:
o The final model is tested on the test set, which is unseen during training and validation, to assess its generalization ability. (A compact end-to-end sketch of these steps follows below.)
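A compact end-to-end sketch of these steps using scikit-learn and pandas (both assumed available); the column names, toy data, and parameter grid are illustrative:

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Toy labeled dataset with one categorical and one numerical feature.
df = pd.DataFrame({
    "city": ["dhaka", "chattogram", "dhaka", "sylhet"] * 10,
    "income": [40, 55, 62, 30, 48, 75, 52, 38, 60, 45] * 4,
    "label": [0, 1, 1, 0, 0, 1, 1, 0, 1, 0] * 4,
})
X, y = df[["city", "income"]], df["label"]

# Data preprocessing: one-hot encode the categorical feature.
pre = ColumnTransformer([("onehot", OneHotEncoder(), ["city"])],
                        remainder="passthrough")

# Splitting data: hold out a test set for the final, unbiased check.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)

# Model selection, training, and hyperparameter tuning via cross-validated
# grid search over the tree depth.
pipe = Pipeline([("pre", pre), ("tree", DecisionTreeClassifier(random_state=0))])
grid = GridSearchCV(pipe, {"tree__max_depth": [2, 3, 5, None]}, cv=5)
grid.fit(X_train, y_train)

# Model testing on the held-out test set.
print("Best parameters:", grid.best_params_)
print("Test accuracy:", accuracy_score(y_test, grid.predict(X_test)))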
Key Concepts in Supervised Learning
1. Data Splitting:
o Training Set: Used to train the model. It represents the majority of the data.
o Validation Set: Used to tune hyperparameters and monitor for overfitting during training.
o Test Set: Used to evaluate the final model’s performance. It provides an unbiased estimate of how the model will perform on new data.
2. Loss Function:
o A function that measures the error between the predicted output and the actual
output. The objective of training is to minimize this error.
▪ For regression: Mean Squared Error (MSE), Mean Absolute Error (MAE).
▪ For classification: Cross-entropy (log) loss.
3. Optimization Algorithms:
o Methods used to minimize the loss function and update the model’s parameters.
These algorithms determine how the model learns from the data.
o Examples:
▪ Gradient Descent and its variants, such as Stochastic Gradient Descent (SGD) and Adam (see the sketch after this list).
4. Bias-Variance Tradeoff:
o Bias: The error from overly simple assumptions. High bias leads to underfitting.
o Variance: The model’s sensitivity to small fluctuations in the training data. High variance leads to overfitting.
o The goal is to find a balance between bias and variance to achieve optimal model
performance.
5. Cross-Validation:
o Repeatedly trains and validates the model on different subsets (folds) of the data to obtain a more reliable estimate of performance.
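A minimal NumPy sketch of gradient descent minimizing an MSE loss for simple linear regression; the synthetic data and learning rate are illustrative:

import numpy as np

rng = np.random.RandomState(0)
X = rng.rand(100)                       # one feature
y = 3.0 * X + 1.0 + rng.normal(0, 0.1, 100)

w, b, lr = 0.0, 0.0, 0.1                # initial parameters and learning rate
for step in range(500):
    error = (w * X + b) - y             # prediction error
    grad_w = 2 * np.mean(error * X)     # gradient of MSE w.r.t. w
    grad_b = 2 * np.mean(error)         # gradient of MSE w.r.t. b
    w -= lr * grad_w                    # step against the gradient
    b -= lr * grad_b

mse = np.mean(((w * X + b) - y) ** 2)
print(f"learned w={w:.2f}, b={b:.2f}, MSE={mse:.4f}")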
Common Supervised Learning Algorithms
1. Linear Regression:
o Use Case: Predicting continuous numerical values, such as house prices.
2. Logistic Regression:
o Use Case: Binary and multi-class classification, such as spam detection.
3. Decision Trees:
o Use Case: Interpretable classification and regression using rule-based splits.
5. K-Nearest Neighbors (KNN):
o Use Case: Simple and intuitive classification and regression tasks based on similarity measures.
6. Naive Bayes:
o Use Case: Fast probabilistic classification, commonly used for text (e.g., spam filtering).
7. Ensemble Methods:
o Random Forest: Combines multiple decision trees for improved accuracy and
stability.
o Gradient Boosting (e.g., XGBoost): Sequentially trains models to correct the errors
of previous ones.
o Use Case: Effective in both regression and classification, particularly for complex
datasets.
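A brief scikit-learn sketch of both ensemble methods, using a built-in toy dataset for illustration (scikit-learn's GradientBoostingClassifier stands in for libraries like XGBoost):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
gb = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

print("Random Forest test accuracy    :", rf.score(X_test, y_test))
print("Gradient Boosting test accuracy:", gb.score(X_test, y_test))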
Advantages:
1. Accuracy: Supervised learning models can achieve high accuracy if there is sufficient
labeled data and the model is well-tuned.
2. Interpretability: Some models (e.g., linear regression, decision trees) are easy to interpret
and understand.
Disadvantages:
1. Dependence on Labeled Data: Requires large amounts of accurately labeled data, which can be expensive and time-consuming to obtain.
2. Overfitting: Models may perform well on the training data but poorly on new data if they overfit. Regularization techniques and cross-validation are used to mitigate this risk.
3. Scalability: For large datasets or complex tasks (e.g., deep learning models), training can
be computationally intensive and time-consuming.