Unit 2
Introduction to Modelling and Evaluation
• Modeling and evaluation are two fundamental aspects of the machine learning
process, working hand in hand to create effective predictive models and assess
their performance.
• Modeling:
1. Data Splitting: Dividing the available data into separate training, validation, and testing sets.
3. Training: Fitting the selected model to the training data by adjusting its parameters to minimize the error or loss function. This is done through optimization techniques such as gradient descent (see the sketch after this list).
• Evaluation:
• Common evaluation metrics depend on the nature of the problem and may include accuracy, precision, recall, F1-score, mean squared error, or area under the receiver operating characteristic (ROC) curve.
2. Performance Measurement: Applying the trained model to the testing dataset
and computing relevant evaluation metrics to quantify its performance.
• This step helps assess the model's accuracy, robustness, and generalization
ability.
3. Comparison: Comparing the performance of different models or variations of
the same model using the same evaluation metrics.
This allows practitioners to select the best-performing model for the given
task and dataset.
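• To make the training step concrete, below is a minimal gradient-descent sketch that fits a simple linear model by minimizing mean squared error; the toy data, learning rate, and iteration count are illustrative assumptions, not values from these notes.

import numpy as np

# Toy data roughly following y = 2x + 1 (illustrative assumption)
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.1, 4.9, 7.2, 8.8])

w, b = 0.0, 0.0   # model parameters, initialized to zero
lr = 0.01         # learning rate (assumed)

for _ in range(5000):
    y_pred = w * X + b
    error = y_pred - y
    # Gradients of the mean squared error with respect to w and b
    grad_w = 2 * np.mean(error * X)
    grad_b = 2 * np.mean(error)
    # Step against the gradient to reduce the loss
    w -= lr * grad_w
    b -= lr * grad_b

print(round(w, 2), round(b, 2))  # approaches roughly 2 and 1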
• Applications:
1. Predictive Analytics: Models are used for predicting customer behavior, stock
prices, disease outbreaks, and more.
2. Recommendation Systems: ML models power personalized recommendations for products, movies, music, and content.
3. Image and Speech Recognition: Models classify images, transcribe speech, and
enable facial recognition in various applications.
4. Natural Language Processing: Models analyze and generate human language, facilitating translation, sentiment analysis, chatbots, and more.
5. Healthcare: Models assist in diagnosis, prediction, treatment
recommendation, drug discovery, and personalized medicine.
6. Finance: Models predict market trends, assess credit risk, detect fraud, and
optimize investment strategies.
7. Autonomous Vehicles: Models enable real-time decision-making for autonomous vehicles, including obstacle detection, path planning, and collision avoidance.
Selecting a Model in Machine Learning
• Here's a detailed guide on how to select a model in machine learning, along with precautions and reasons for each step:
• Clearly define the problem: A well-defined problem helps you choose models
suited for that specific task.
• Research models commonly used for your chosen task type (e.g., decision
trees, logistic regression for classification).
• Consider factors like interpretability (how easy it is to understand the
model's predictions) and computational cost (training time and resources).
• Example: For customer churn prediction, you might consider decision trees
for their interpretability or random forests for better accuracy.
• Analyze your data for characteristics relevant to model selection (e.g., data
size, feature types, presence of missing values).
• Preprocess your data to ensure it's compatible with the chosen model (e.g.,
scaling numerical features, handling categorical features).
• Example: If you have a lot of missing values, a model like k-Nearest Neighbors
might not be ideal as it relies heavily on complete data points.
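• As a hedged illustration of the preprocessing step, the sketch below imputes missing values and scales numerical features with scikit-learn; the small feature matrix is made up for the example.

import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Made-up feature matrix with a missing value (np.nan)
X = np.array([[1.0, 200.0],
              [2.0, np.nan],
              [3.0, 240.0]])

# Replace missing values with the column mean
X_imputed = SimpleImputer(strategy="mean").fit_transform(X)

# Scale each feature to zero mean and unit variance
X_scaled = StandardScaler().fit_transform(X_imputed)
print(X_scaled)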
• Based on evaluation results, choose the model with the best performance on
unseen data.
• Some of the popular classification models include k-Nearest Neighbor (kNN), Naïve
Bayes, and Decision Tree.
• Predictive models may also be used to predict numerical values of the target feature based on the predictor features.
• Example: prediction of potential flu patients and the demand for flu shots next winter.
• The models which are used for prediction of the numerical value of the target
feature of a data instance are known as regression models.
• 2. Descriptive models: models that derive insight from or summarize the underlying data, for example by grouping similar data instances together, rather than predicting a target feature.
• Reason: Splitting the data into training and testing sets allows for
evaluating the model's performance on unseen data and helps prevent
overfitting.
• Steps:
1. Split the dataset into training and testing sets, typically using a ratio
such as 70-30 or 80-20.
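• A minimal sketch of such a split using scikit-learn's train_test_split; the 80-20 ratio and the iris dataset are chosen here just for illustration.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold back 20% of the data for testing (80-20 split)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)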
• Steps:
1. Feed the training data into the model and obtain predictions.
2. Calculate the loss between the predicted and actual values using a
suitable loss function.
• Reason: Evaluating the trained model on the testing set provides insights into its ability to generalize to unseen data.
• Steps:
1. Experiment with different values for hyperparameters such as the learning rate.
• Steps:
1. Save the trained model parameters and architecture to disk for future use (a sketch follows below).
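• One common way to persist a trained scikit-learn model is joblib; the model, dataset, and file name in this sketch are illustrative assumptions.

import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200).fit(X, y)

# Save the trained model to disk, then load it back later
joblib.dump(model, "model.joblib")
restored = joblib.load("model.joblib")
print(restored.score(X, y))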
Training a Model in Machine Learning
• In case of supervised learning, a model is trained using the labelled input data. However, how can we understand the performance of the model?
• Hence, a part of the input data is held back for evaluation of the model.
• 1. Holdout method:
• This subset of the input data is used as the test data for evaluating the performance of the trained model.
• The method of partitioning the input data into two parts, training and test data, by holding back a part of the input data for validating the trained model, is known as the holdout method.
• The validation data is used in place of test data for measuring the model performance; it is used iteratively, to refine the model in each iteration.
• The test data is used only once, after the model is refined and finalized.
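• A hedged sketch of carving out both a validation set and a final test set with two successive splits; the 60/20/20 proportions are assumptions for illustration.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# First hold back 20% as the final test set (used only once)
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Then split the remainder into training and validation data;
# the validation set is reused in each refinement iteration
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0)  # 0.25 * 0.8 = 0.2

print(len(X_train), len(X_val), len(X_test))  # 90 30 30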
2. K-fold Cross-validation method
• In k-fold cross-validation, the input data is divided into 'k' folds; in each iteration one fold is held back as test data while the remaining (k - 1) folds are used for training, so every fold serves as test data exactly once.
• The figure below depicts the detailed approach of selecting the 'k' folds in k-fold cross-validation.
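• A minimal k-fold sketch using scikit-learn; k = 5 and the decision-tree estimator are assumed values for illustration.

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Each of the 5 folds is used once as test data while the
# remaining 4 folds are used to train the model
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
print(scores.mean())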
• 2.b Leave-one-out cross-validation (LOOCV): an extreme case of k-fold cross-validation that uses one record or data instance at a time as the test data.
• This is done to maximize the count of data used to train the model.
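• The same idea with scikit-learn's LeaveOneOut, where each single instance serves as the test set in turn; the dataset and classifier are illustrative.

from sklearn.datasets import load_iris
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# One data instance is held out as test data in each of the 150 rounds
scores = cross_val_score(KNeighborsClassifier(), X, y, cv=LeaveOneOut())
print(scores.mean())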
• 3. Bootstrap sampling:
• Bootstrapping randomly picks data instances from the input data set, with the possibility of the same data instance being picked multiple times.
• From an input data set having 'n' data instances, bootstrapping can create one or more training data sets, each having 'n' data instances.
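• A small sketch of drawing a bootstrap training set of size n with replacement using NumPy; the data is purely illustrative.

import numpy as np

rng = np.random.default_rng(0)
data = np.arange(10)          # stand-in for n = 10 data instances

# Sample n instances WITH replacement; duplicates are expected
bootstrap_sample = rng.choice(data, size=len(data), replace=True)
print(bootstrap_sample)       # some instances appear multiple times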
• 4. Lazy vs. Eager learner
• An eager learner generalizes the training data during the training phase itself. It builds a model as soon as it receives the training data, and this model is used to make predictions.
• When the test data comes in for classification, the eager learner is ready with
the model and doesn’t need to refer back to the training data.
• Eager learners take more time in the learning phase than lazy learners. Some of the algorithms which adopt the eager learning approach include Decision Tree, Support Vector Machine, Neural Network, etc.
• A lazy learner is a learning model that defers the process of generalization until
a query is made.
• In other words, it stores the training data and waits until it receives a query (i.e.,
an input) before performing any computation.
• This means that lazy learners don't build models based on training data until
they are needed for prediction.
• Lazy learners take very little time in training because not much training actually happens.
• One of the most popular algorithms for lazy learning is k-nearest neighbor (see the sketch below).
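• A hedged sketch of k-nearest neighbor as a lazy learner: fit() essentially just stores the training data, and the real computation is deferred to prediction time. The dataset and k = 3 are assumptions.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# "Training" is cheap: the model memorizes the training data
knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

# Computation happens only when queries arrive
print(knn.score(X_test, y_test))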
Model Representation and Interpretability in Machine Learning
• Model representation and interpretability in machine learning refer to how a trained model captures and represents patterns and relationships in the data, and how understandable and explainable the model's decisions are to humans.
• Some issues faced during model representation are:
• 2. Overfitting
• Any specific deviation in the training data, like noise or outliers, gets
embedded in the model.
• It adversely impacts the performance of the model on the test data.
• Overfitting results in good performance with training data set, but poor
performance with test data set.
• 3. Pruning: remove the nodes which have little or no predictive power for the given machine learning problem.
• The representation of underfitting and overfitting with a sample data set is shown in the figure below.
• The target function, in these cases, tries to make sure all training data points are correctly partitioned by the decision boundary.
• The figure shows examples in both regression (top row) and classification (bottom row).
• 1. Underfitting: the model is too simple to capture the underlying patterns in the data, so it performs poorly on both training and test data.
• In machine learning, the goal is to achieve a balanced fit, which provides good performance on both training and unseen test data.
• 3. Bias-variance trade-off
• The error in learning can be of two types: errors due to 'bias' and errors due to 'variance'.
• Errors arise when (a) the model is too simple and hence fails to interpret the data grossly (high bias), or (b) the model is extremely complex and magnifies even small differences in the training data (high variance).
• A model with high variance tends to overfit the training data, capturing noise and random fluctuations.
• Increasing the bias will decrease the variance, and increasing the variance will decrease the bias.
• The best solution is to have a model with low bias as well as low variance, which may not be possible, as shown in the figure below.
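• To illustrate the trade-off, the sketch below fits polynomials of degree 1 (high bias, underfits) and degree 15 (high variance, overfits) to noisy data; the synthetic data and the two degrees are assumptions chosen for demonstration.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, 40).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 40)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 15):   # degree 1 underfits, degree 15 overfits
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(degree,
          round(model.score(X_train, y_train), 2),  # training R^2
          round(model.score(X_test, y_test), 2))    # test R^2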
Evaluating Performance of a Model in Machine Learning
• Evaluating the performance of a machine learning model is essential to assess
its effectiveness in making predictions or classifications. Here's a detailed step-
by-step procedure with real-time examples:
• Step 1: Define Evaluation Metrics
1. Choose metrics relevant to your specific problem type. Don't rely solely on
overall accuracy!
4. Precision: Ratio of true positives to all predicted positives (of the instances the model flags as positive, how many are truly positive?).
5. Recall: Ratio of true positives to all actual positives (how good is the model at
finding all the positives?).
Example: Regression Tasks (e.g., stock price prediction, house price prediction)
6. Mean Squared Error (MSE): Average squared difference between predicted and
actual values (lower MSE indicates better performance).
7. R-Squared: Proportion of variance in the target variable explained by the model (higher R-squared indicates better fit).
• Example: Spam filter evaluation:
1. Accuracy: Tells you the overall percentage of emails correctly classified as spam or
not-spam.
2. Precision: Measures how good the filter is at identifying actual spam emails
(avoiding false positives).
3. Recall: Measures how good the filter is at catching all spam emails (avoiding
false negatives).
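• A hedged sketch computing these metrics with scikit-learn on made-up spam-filter labels (1 = spam, 0 = not spam); the label arrays are invented for the example.

from sklearn.metrics import accuracy_score, precision_score, recall_score

# Made-up ground truth and model predictions
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print(accuracy_score(y_true, y_pred))   # overall fraction correct
print(precision_score(y_true, y_pred))  # of predicted spam, how much is spam
print(recall_score(y_true, y_pred))     # of actual spam, how much was caught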
Example: if the filter classifies 95 out of 100 emails correctly, Accuracy = 95 / 100 = 95%.
1. Training Set (Majority): Used to train the model (typically 60-80% of the data).
2. Validation Set: Used to fine-tune the model during development (typically 10-20% of the data).
3. Test Set (Minority): Used for final evaluation of the trained model's performance on unseen data (typically 10-20% of the data).
• Reasoning: The training set teaches the model, the validation set helps fine-tune
it, and the test set provides an unbiased assessment of its generalizability.
• Use the training data to train your chosen machine learning algorithm.
• Step 4: Evaluate the Model on the Test Set
2. Calculate the chosen evaluation metrics based on the model's predictions and
the actual target values in the test set.
• Step 5: Analyze the Results
1. Overfitting: The model performs well on training data but poorly on the test set.
2. Underfitting: The model is too simple and doesn't capture the underlying patterns in the data.
• Step 6: (Optional) Retrain and Re-evaluate
Improving performance of a Model in Machine Learning
• Step 1: Define Performance Metrics:
• Procedure: Define the performance metrics that are most relevant to the problem at hand (e.g., accuracy, precision, recall, F1-score, ROC-AUC for classification; mean squared error, R-squared for regression).
• Explanation: Identifying appropriate performance metrics provides a clear
objective for model improvement.
• Step 2: Analyze Model Errors:
• Procedure: Analyze the errors made by the model on the validation/testing set.
• Procedure: Consider alternative algorithms or models that may better suit the
problem domain or data characteristics.
• Explanation: Different algorithms have different strengths and weaknesses,
and switching to a more suitable algorithm can lead to improved performance.
• Step 6: Ensemble Methods:
• Step 7: Cross-Validation:
Feature subset selection in Machine Learning
• Procedure:
1. Data Preparation: Clean and pre-process your data before feature selection.
• Example: Using a filter method like the chi-squared test to identify features that have a strong association with the target variable in a classification task.
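• A minimal filter-method sketch with SelectKBest and the chi-squared score; k = 2 and the dataset are assumed choices for illustration.

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)

# Keep the 2 features with the strongest chi-squared association
selector = SelectKBest(chi2, k=2)
X_selected = selector.fit_transform(X, y)
print(selector.get_support())  # mask of the chosen features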
• 3.2 Wrapper Methods: These methods evaluate feature subsets based on their
impact on the performance of a machine learning model. They train the model
with different feature subsets and choose the one that yields the best
performance.
• Example: Using a wrapper method like recursive feature elimination (RFE) to
iteratively remove the least informative feature and retrain the model until a
desired performance level is reached.
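• A hedged wrapper-method sketch using RFE around a logistic regression; the estimator and the number of features to keep are assumptions.

from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Repeatedly drop the least informative feature until 2 remain
rfe = RFE(LogisticRegression(max_iter=200), n_features_to_select=2)
rfe.fit(X, y)
print(rfe.support_, rfe.ranking_)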
• 3.3 Embedded Methods: These methods integrate feature selection as part of the
model training process.
• Some algorithms, like decision trees, inherently perform feature selection during
training.
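• As an embedded-method sketch, a decision tree exposes the importance it assigned to each feature while it was being built; the dataset is illustrative.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Feature selection happens implicitly while the tree is trained
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
print(tree.feature_importances_)  # higher = more predictive power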
4. Evaluation and Refinement: Evaluate the performance of the model using the
selected features with your chosen evaluation metric.
• Disadvantages:
1. Feature selection methods might not always find the optimal subset.
• Example: In medical diagnosis, feature subset selection can help identify the
most relevant patient characteristics.
3. Finance: e.g., selecting the most informative indicators for credit risk assessment or fraud detection.
Principal Component Analysis (PCA) in Machine Learning:
• Procedure:
1. Data Standardization: Prepare your data by standardizing it. This ensures all features contribute equally to the analysis.
2. Covariance Matrix Calculation: Calculate the covariance matrix, which
captures the linear relationships between all features in your data.
3. Eigenvalue Decomposition: Decompose the covariance matrix to find its
eigenvectors (principal components) and eigenvalues.
• Eigenvectors represent the directions of greatest variance in the data, and
eigenvalues represent the amount of variance explained by each
component.
4. Dimensionality Reduction: Select the top 'n' principal components that
explain the majority of the variance in the data (e.g., 90%). This 'n'
represents the new, lower-dimensional space.
5. Data Transformation: Project the original data onto the selected
principal components, effectively creating a lower-dimensional
representation of the data.
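• A compact PCA sketch with scikit-learn, following the standardization-to-projection pipeline described above; keeping 2 components is an assumed choice.

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = load_iris().data

# Step 1: standardize so all features contribute equally
X_std = StandardScaler().fit_transform(X)

# Steps 2-5: PCA computes the eigendecomposition and projects
# the data onto the top 2 principal components
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_std)
print(pca.explained_variance_ratio_)  # variance explained per component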
• Advantages of PCA:
1. Dimensionality Reduction: Reduces the number of features while retaining most of the variance in the data, lowering computational cost.
• Disadvantages of PCA:
1. Information Loss: PCA discards information by eliminating less significant principal components.
Singular Value Decomposition (SVD) in Machine Learning
• SVD gives you the whole nine yards of diagonalizing a matrix into special factor matrices:
• A = U Σ V^T
• where:
• U is an orthogonal matrix whose columns are the left singular vectors of A,
• Σ is a diagonal matrix containing the singular values of A, and
• V is an orthogonal matrix whose columns are the right singular vectors of A.
• Example:
import numpy as np
import pandas as pd
from sklearn.decomposition import TruncatedSVD
from sklearn.datasets import load_iris

# Load the iris feature matrix (150 samples, 4 features)
X = load_iris().data

# Perform SVD, keeping the top 2 components
svd = TruncatedSVD(n_components=2)
X_reduced = svd.fit_transform(X)
print(X_reduced.shape)  # (150, 2)
Factor Analysis in Machine Learning
• 1. Data Preparation: Ensure your data is clean and suitable for factor extraction.
• Assess the model's fit using goodness-of-fit measures and evaluate the interpretability of the extracted factors. You might need to refine the number of factors or the extraction method based on the results.
Linear Discriminant Analysis (LDA)
1. Data Preparation: Ensure your data is clean and pre-processed, with labels indicating the class for each data point.
2. Maximizing Class Separation: LDA finds a linear transformation that
maximizes the separation between the means of different classes in the data.
This helps to create a clear distinction between the classes.
3. Minimizing Within-Class Variance: Simultaneously, LDA aims to minimize the
variance within each class.
This ensures that data points within a class are tightly clustered, further
enhancing the separation between classes.
4. Dimensionality Reduction: LDA can project the data onto a lower-
dimensional space while maintaining the class separability.
This can be beneficial for reducing computational cost and
improving model performance in some cases.
5. Classification: New data points are projected onto the transformed
space and classified based on the learned decision boundary
between classes.
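• A minimal LDA sketch with scikit-learn, projecting the data onto a lower-dimensional space and classifying held-out points; the dataset and split are illustrative assumptions.

from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Fit LDA: maximizes between-class separation, minimizes within-class variance
lda = LinearDiscriminantAnalysis(n_components=2)
X_train_lda = lda.fit_transform(X_train, y_train)   # dimensionality reduction
print(lda.score(X_test, y_test))                    # classification accuracy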
• Advantages of LDA:
1. Effective Classification: LDA performs well for tasks where classes are linearly
separable.
• Disadvantages and Limitations of LDA:
1. LDA assumes linearly separable classes and roughly normally distributed features, so its performance degrades when these assumptions do not hold.
• Applications of LDA:
1. Facial Recognition: Classifying faces based on features like eyes, nose, and mouth.
2. Spam Filtering: Identifying spam emails based on text content and other features.