Machine Learning

UNIT-3

Dr Rashmi Popli
Associate Professor
Department of Computer Engineering

Dr Rashmi Popli,Associate Professor,Department of CE


Evaluating Machine Learning Algorithms
Evaluating machine learning algorithms and selecting the right model are
crucial steps in building effective and accurate predictive models.
1. Define Objectives and Metrics
• Clearly define the problem you are trying to solve and the goals of your
model.
• Choose appropriate evaluation metrics based on the nature of the problem
(e.g., accuracy, precision, recall, or F1-score for classification; mean
squared error or R-squared for regression).
2. Data Preprocessing
• Handle missing data and outliers, and normalize or standardize the data
as needed.
• Encode categorical variables and handle any other data-specific
preprocessing steps.
3. Split Data
• Split your dataset into training and testing sets to evaluate model
performance on unseen data.
• Optionally, use techniques like cross-validation for more robust
evaluation.
4. Choose Algorithms
• Select a set of candidate algorithms based on the nature of the problem
(e.g., decision trees, support vector machines, neural networks).
• Consider the characteristics of your data (size, complexity) when choosing
algorithms.
5. Train Models
• Train each selected algorithm on the training data.
• Use hyperparameter tuning to find the best settings for each
algorithm, using a validation set or cross-validation rather than the test set.
6. Evaluate Performance
• Use the testing set to evaluate the performance of each model.
• Compare performance metrics and choose the model(s) with the best
results.
7. Consider Cross-Validation
• Perform cross-validation to assess how well the models generalize to
different subsets of the data.
• This helps in identifying models that are less prone to overfitting.
8. Model Interpretability
• Consider the interpretability of the models, especially if the application
requires understanding the reasoning behind predictions.
9. Ensemble Methods
• Explore ensemble methods (e.g., random forests, gradient boosting) to
combine the strengths of multiple models for improved performance.
10. Deployment Considerations
• Consider practical aspects of deploying the model, such as computational
resources, real-time requirements, and interpretability.
11. Iterative Process
• Model evaluation and selection are iterative processes. If initial results are
not satisfactory, revisit previous steps, adjust hyperparameters, or try
different algorithms.
12. Documentation
• Document your choices, decisions, and results to facilitate communication
with stakeholders and for future reference.
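The following is a minimal, dependency-free sketch of the metrics named in step 1. It assumes binary labels coded 0/1, with 1 as the positive class. The labels are made up for illustration. Libraries such as scikit-learn provide equivalent and more complete functions.

```python
# Classification metrics computed from scratch (binary labels, 1 = positive).
def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when no positives are predicted/present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions (regression)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical test-set labels and model predictions.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]
print(classification_metrics(y_true, y_pred))
```

Here the model has 3 true positives, 1 false positive, 1 false negative and 1 true negative. Accuracy is therefore 4/6, while precision, recall and F1 are all 0.75.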



Hyperparameter Tuning
• Hyperparameter tuning is the process of selecting the best set of
hyperparameters for a machine learning algorithm.
• Hyper parameters are configuration settings that are external to the
model itself and cannot be directly learned from the training data.
• Examples of hyperparameters include the depth of a decision
tree, the number of hidden layers in a neural network, and the
regularization parameter in regression.
• Hyperparameter tuning aims to find the optimal combination of
hyperparameters that maximizes the performance of the model on a
validation dataset or through cross-validation.
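The slides do not show a tuning procedure. The sketch below illustrates one common strategy, grid search on a held-out validation set, under several assumptions. The model is a toy 1-D k-nearest-neighbours classifier (not mentioned in the slides) whose hyperparameter is k, and the data are made up. Each candidate k is tried and scored on the validation data, and the best one is kept.

```python
from collections import Counter


def knn_predict(train, x, k):
    """Predict x's label by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def validation_accuracy(train, val, k):
    correct = sum(knn_predict(train, x, k) == y for x, y in val)
    return correct / len(val)


def grid_search(train, val, candidate_ks):
    """Try every candidate k; keep the one with the best validation accuracy."""
    best_k, best_score = None, -1.0
    scores = {}
    for k in candidate_ks:
        scores[k] = validation_accuracy(train, val, k)
        if scores[k] > best_score:  # strict '>' keeps the first k on ties
            best_k, best_score = k, scores[k]
    return best_k, best_score, scores


# Hypothetical 1-D data: (feature value, class label). The point (1.55, 1)
# is a deliberately noisy label sitting among class-0 points.
train = [(1.0, 0), (1.2, 0), (1.4, 0), (1.55, 1), (3.0, 1), (3.2, 1), (3.4, 1)]
val = [(1.5, 0), (1.7, 0), (3.1, 1)]
best_k, best_score, scores = grid_search(train, val, [1, 3, 5])
print(scores)
print(best_k, best_score)
```

On this data, k = 1 follows the noisy training point at 1.55 and misclassifies the two nearby validation points, scoring 1/3. Both k = 3 and k = 5 reach full validation accuracy, and the strict `>` keeps k = 3.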



Scenario: Tuning a Recommendation System for an E-commerce Website
• Imagine you are working for an e-commerce company like Amazon, and
your task is to improve the product recommendation system that suggests
items to customers based on their browsing history.
Choosing the Model
• You decide to use a Random Forest Classifier to predict whether a user will
click on a recommended product. However, the performance of the model
depends on its hyperparameters like:
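• Number of trees (n_estimators): More trees usually improve accuracy but slow
down training.
• Depth of trees (max_depth): Too deep → overfitting; too shallow →
underfitting.
• Minimum samples per leaf (min_samples_leaf): The minimum number of training
samples allowed in a leaf; larger values produce simpler trees that are less
likely to overfit.
• After tuning these hyperparameters (for example, with grid search and
cross-validation), you deploy the optimized model. Customers now receive better
product recommendations, increasing click-through rates and sales.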
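The scenario names three hyperparameters but skips the search itself. This hedged sketch only enumerates a hypothetical grid of candidate values with `itertools.product`. It shows how quickly the number of configurations, and therefore the number of training runs, grows. Each configuration would then be scored, for example with cross-validation. In scikit-learn, `GridSearchCV` automates that step, but it is not shown here.

```python
from itertools import product

# Hypothetical candidate values for the three Random Forest hyperparameters
# named above; the values are illustrative, not recommendations.
param_grid = {
    "n_estimators": [100, 200, 500],
    "max_depth": [5, 10, None],        # None = no depth limit
    "min_samples_leaf": [1, 5, 10],
}

names = list(param_grid)
# Every combination of one value per hyperparameter is a candidate configuration.
configs = [dict(zip(names, values)) for values in product(*param_grid.values())]

n_folds = 5
print(len(configs), "configurations")                        # 27 configurations
print(len(configs) * n_folds, "model fits with 5-fold CV")    # 135 model fits
print(configs[0])
```

The cost grows multiplicatively: adding a fourth hyperparameter with three values would triple the number of fits. For this reason, random search is often used when the grid is large.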
Cross-validation
• Cross-validation is a technique used in machine learning to assess
how well a model will generalize to an independent dataset.
• It's particularly useful when the dataset is limited, as it allows
maximizing the amount of data used for both training and testing,
thus providing a more reliable estimate of the model's performance.
• Cross-validation helps to provide a more accurate estimate of a
model's performance compared to a single train-test split because it
uses multiple splits of the data.
• This helps to reduce the variability in performance estimates that may
arise from using a single train-test split, especially when the dataset is
small or the data distribution is heterogeneous.



Scenario: Loan Approval System in a Bank
• A bank wants to develop a machine learning model to predict
whether a customer will default on a loan based on factors like
income, credit score, loan amount, and repayment history.
Problem with a Simple Train-Test Split
• If we simply split the data into 70% training and 30% testing, there’s a
risk that:
• The model might perform well on this particular test set but fail on new,
unseen data.
• The performance estimate might be biased if the test set doesn't represent
real-world customers well.



Solution: k-Fold Cross-Validation
• Instead of relying on a single train-test split, we use k-fold cross-
validation (e.g., 5-Fold CV):
• Split the dataset into 5 equal parts (folds).
• Train the model on 4 folds and test on the 5th fold.
• Repeat this process 5 times, using a different fold as the test set
each time.
• Compute the average accuracy across all 5 folds.
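The 5-fold procedure above can be sketched in plain Python. The "model" is a deliberately trivial one-feature threshold rule, an assumption for illustration rather than a real credit-scoring method. The loan data are hypothetical. `k_fold_splits` shuffles the indices once and deals them into k folds, so every record is used for testing exactly once.

```python
import random


def k_fold_splits(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of (nearly) equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def fit_threshold(xs, ys):
    """Toy model: split at the midpoint between the two class means.
    Assumes both classes appear in the training data."""
    means = {label: sum(x for x, y in zip(xs, ys) if y == label) / ys.count(label)
             for label in set(ys)}
    low, high = sorted(means, key=means.get)
    return (means[low] + means[high]) / 2, low, high


def predict_threshold(model, x):
    threshold, low, high = model
    return low if x < threshold else high


def cross_val_accuracy(xs, ys, k=5):
    scores = []
    for train_idx, test_idx in k_fold_splits(len(xs), k):
        model = fit_threshold([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        correct = sum(predict_threshold(model, xs[i]) == ys[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / k, scores


# Hypothetical loan data: credit score -> 1 (defaulted) or 0 (repaid).
credit_scores = [520, 540, 560, 580, 600, 700, 720, 740, 760, 780]
defaulted = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
mean_acc, fold_scores = cross_val_accuracy(credit_scores, defaulted, k=5)
print(fold_scores, mean_acc)
```

Because the toy data are perfectly separable, every fold scores 1.0. On real data, the spread of `fold_scores` is itself informative: a large spread signals that a single train-test split could have been misleading.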

Statistical Learning Theory
• Statistics is the mathematical study of data.
• Statistical learning theory is a field of study within machine learning that
deals with the theoretical foundations of learning algorithms.
• It is a framework that draws from the fields of statistics and functional
analysis.
• Using statistics, an interpretable statistical model is created to
describe the data, and this model can then be used to infer something
about the data or even to predict values that are not present in the
sample data used to create the model.
• Objective: The main goal of statistical learning is understanding and
prediction. It aims to build models that can draw meaningful
conclusions from data and make accurate predictions.
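The "understanding and prediction" goal is usually formalised as follows. This is stated as general background from the standard formulation, not taken from these slides. A model f is judged by its expected loss L over the unknown data distribution P. That expected loss can only be estimated by the average loss on the n training examples:

$$
R(f) = \mathbb{E}_{(X,Y)\sim P}\big[L(Y, f(X))\big],
\qquad
\hat{R}_n(f) = \frac{1}{n}\sum_{i=1}^{n} L\big(y_i, f(x_i)\big)
$$

Learning algorithms typically choose the f that minimises $\hat{R}_n(f)$ (empirical risk minimisation). The theory then studies how far $R(f)$ can be from $\hat{R}_n(f)$. This generalisation gap is exactly what a held-out test set or cross-validation tries to measure. With squared-error loss, $L(y, \hat{y}) = (y - \hat{y})^2$, and $\hat{R}_n$ is the mean squared error.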



Categories of Learning
• Supervised Learning: In this category, we have labeled data pairs
(input-output pairs), and the goal is to learn a mapping from inputs to
outputs.
• Unsupervised Learning: Here, we work with unlabeled data and aim to
discover underlying patterns or structures.
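• Semi-supervised Learning: Combines aspects of both supervised and
unsupervised learning, typically using a small amount of labeled data
together with a larger pool of unlabeled data.
• Online Learning: Learning incrementally from a stream of data, updating
the model in real time as new examples arrive.
• Reinforcement Learning: Learning through interaction with an
environment, guided by rewards or penalties for the actions taken.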



Components of Statistical Learning
• Features (Predictors): The variables used to make predictions.
• Response (Outcome): The variable to be predicted.
• Training Data: The dataset used to train the model.
• Model: The algorithm or mathematical function that learns from the
training data.



Applications
• Computer Vision: Developing algorithms for image recognition, object
detection, and scene understanding.

• Speech Recognition: Creating models that can transcribe spoken language
into text.

• Bioinformatics: Analyzing biological data, such as DNA sequences and protein
structures.



Example: Predicting House Prices
1. Problem Statement: Imagine you are a real estate agent, and you want to predict the selling prices of houses
based on certain features like the number of bedrooms, square footage, and neighborhood.

2. Dataset: Collect a dataset that includes information about houses, such as:
• Number of bedrooms
• Square footage
• Neighborhood
• Distance to public amenities
• Previous sale prices (target variable)

3. Features and Target Variable:
• Features (X): Number of bedrooms, square footage, neighborhood, distance to public amenities.
• Target Variable (Y): Sale prices of houses.

4. Data Splitting: Divide the dataset into two parts:
• Training set: Used to train the model.
• Testing set: Used to evaluate the model's performance.
5. Model Selection: Choose a statistical learning algorithm. For this example, let's use a simple
linear regression model. The model will learn a linear relationship between the features and the
target variable.
6. Training the Model: Feed the training data into the linear regression algorithm. The algorithm
adjusts its parameters to minimize the difference between the predicted house prices and the
actual prices in the training set.
7. Model Evaluation: Use the testing set to assess the model's performance. Common metrics
include mean squared error or R-squared, which measure how well the model predicts house prices
on new, unseen data.
8. Prediction: Once the model is trained and evaluated, use it to predict house prices for new
houses or existing ones not included in the training set.
9. Model Improvement: If the model performance is not satisfactory, consider trying more complex
models, adding more features, or fine-tuning hyperparameters.
10. Deployment: Deploy the trained model in a real-world scenario to make predictions on new
data.
• Statistical learning helps us automate the process of predicting house prices based on relevant
features, providing a valuable tool for real estate decision-making.
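Steps 4 to 7 can be sketched with a single feature (square footage), so that ordinary least squares has a closed form. The numbers are hypothetical. Using all four features from step 3 would require multiple linear regression (for example, via a library) and an encoding of the categorical neighborhood feature.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one feature)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return intercept, slope


def mse(ys, preds):
    """Mean squared error between actual and predicted prices."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


def r_squared(ys, preds):
    """Fraction of the variance in ys explained by the predictions."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data: square footage -> sale price, already split (step 4).
train_sqft = [1000, 1500, 1800, 2400, 3000]
train_price = [200_000, 260_000, 300_000, 370_000, 450_000]
test_sqft = [1200, 2100]
test_price = [225_000, 330_000]

b0, b1 = fit_simple_linear_regression(train_sqft, train_price)   # step 6
test_pred = [b0 + b1 * x for x in test_sqft]                     # step 8
print(f"price ~ {b0:.0f} + {b1:.1f} * sqft")
print("test MSE:", mse(test_price, test_pred))                   # step 7
print("test R^2:", r_squared(test_price, test_pred))
```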



Ensemble learning
• Ensemble learning combines multiple machine learning models into a
single model.
• The aim is to increase the performance of the model.
• The main principle behind the ensemble model is that a group of
weak learners come together to form a strong learner.
• Bagging aims to decrease variance.
• Boosting aims to decrease bias.
• Stacking aims to improve prediction accuracy by learning how best to combine
the predictions of different models.
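A worked calculation shows why weak learners can combine into a strong one, under the strong assumption that the models' errors are independent. Suppose each of 3 classifiers is correct 70% of the time. The majority vote is correct when at least 2 of them are right: 3(0.7²)(0.3) + 0.7³ = 0.441 + 0.343 = 0.784. The snippet generalises this to n voters.

```python
from math import comb


def majority_vote_accuracy(p, n):
    """P(the majority of n independent classifiers is correct), each correct w.p. p.

    Assumes n is odd and that the classifiers' errors are independent."""
    need = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


for n in (1, 3, 5, 11):
    print(n, round(majority_vote_accuracy(0.7, n), 3))
```

In practice, base models' errors are correlated, so real gains are smaller. Bagging and random feature selection are ways of making the models more independent. The effect also reverses if each model is worse than chance (p < 0.5).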

Difference between Bagging and Boosting
• Bagging: Training a set of individual models in parallel. Each
model is trained on a random subset of the data.

• Boosting: Training a set of individual models sequentially.
Each individual model learns from the mistakes made by the previous
model.

Bagging
• In bagging, multiple models (like decision trees) are trained independently on
different random subsets of data.
• Their predictions are then averaged (for regression) or voted on (for classification).
Classroom Example:
• Imagine a class where students are struggling with math.
• The teacher divides the class into several small study groups, each with a different
tutor.
• Each tutor gives randomly selected questions from the syllabus to their group.
• After practice, each tutor gives their own prediction of how well a student will
perform in the exam.
• The final grade is decided by taking the average (for scores) or majority voting (for
pass/fail).
Advantage
• Bagging reduces variance and helps prevent overfitting by combining multiple
independent models.
Bagging (Bootstrap Aggregating)
• Bagging is a machine learning ensemble algorithm designed to
improve the accuracy of machine learning algorithms used in
classification and regression.
• It also reduces variance and helps to avoid overfitting.
• Subsets (bags) are chosen randomly from the training data with
replacement, and a model is trained on each bag.
• The final prediction is the average (or majority vote) of the
predictions of the models trained on each bag.



Working
• Bootstrap Sampling: Random subsets of the training data are created
by sampling with replacement. This means that some data points may
be selected multiple times while others may not be selected at all.
• Model Training: A base learning algorithm (e.g., decision trees) is
trained on each bootstrap sample independently, resulting in multiple
base models.
• Prediction Aggregation: When making predictions, each base model
produces its own output. In regression problems, the final prediction
is often the average of these outputs, while in classification problems,
it may be the majority vote among the predictions.
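The three steps above can be sketched for classification as follows. For brevity, the base learner is a one-split decision rule (a "decision stump") on a single numeric feature. The data, the number of models (25) and the function names are illustrative assumptions. For regression, `bagging_predict` would average the base predictions instead of voting.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) points uniformly with replacement (some repeat, some are left out)."""
    return [rng.choice(data) for _ in data]


def fit_stump(sample):
    """Base learner: the 1-D threshold rule with the fewest training errors.
    Predicts `high` if x > threshold, else `low`."""
    best = None
    for threshold in sorted({x for x, _ in sample}):
        for low, high in ((0, 1), (1, 0)):
            errors = sum((high if x > threshold else low) != y for x, y in sample)
            if best is None or errors < best[0]:
                best = (errors, threshold, low, high)
    return best[1:]


def stump_predict(stump, x):
    threshold, low, high = stump
    return high if x > threshold else low


def bagging_fit(data, n_models=25, seed=0):
    # Step 1 + 2: one bootstrap sample per model, each trained independently.
    rng = random.Random(seed)
    return [fit_stump(bootstrap_sample(data, rng)) for _ in range(n_models)]


def bagging_predict(stumps, x):
    # Step 3 (classification): majority vote over the base models' predictions.
    votes = Counter(stump_predict(s, x) for s in stumps)
    return votes.most_common(1)[0][0]


# Hypothetical 1-D training data: (feature, class label).
data = [(1, 0), (2, 0), (3, 0), (4, 1), (5, 0), (6, 1), (7, 1), (8, 1)]
stumps = bagging_fit(data)
print([bagging_predict(stumps, x) for x in range(1, 9)])
```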

Two subsets (bags) are chosen, and a model is trained on each.



Boosting – "Weak Students, Focused Tutoring"
• Boosting builds models sequentially, where each new model corrects the mistakes of
the previous ones.
• Models are weighted based on their accuracy.
Classroom Example:
• A teacher notices that some students are weak in algebra.
• She first gives a general test to the class.
• She then focuses on the students who performed poorly and gives them extra lessons.
• Another test is taken, but now the teacher focuses more on the mistakes from the
previous test.
• This process continues, refining the knowledge of weaker students.
• In the end, all students take a final test, and their combined performance determines
the class's success.

Boosting improves accuracy by sequentially training models and giving more importance
to misclassified data.
Boosting (AdaBoost)
• To start boosting, we build our first bag by selecting data randomly from
the training data and train a model on it in the usual way.
• Next, we use all of our training data to test this model.
• We will discover that some of the points are not well predicted
(they have significant error).
• For the second bag, we again choose data randomly, but each instance is
weighted according to this error, so poorly predicted points are more
likely to be chosen.
• Now we test the whole system together, combining the outputs of both
models, and again measure the error across all of the training data.
• We then build the next bag in the same way, and so on.



Working
• Iterative Training: Boosting starts by training a base learner on the original
data. Then, it sequentially trains additional base learners, each one
focusing more on the examples that the previous learners misclassified or
struggled with.
• Weighted Training Data: At each iteration, the training data is reweighted
so that examples that were misclassified by previous models receive more
weight, making them more likely to be correctly classified in subsequent
iterations.
• Combining Predictions: The final prediction is usually made by combining
the predictions of all base learners, often through a weighted sum, where
each learner's contribution is weighted based on its performance.
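The following compact AdaBoost sketch follows the "Working" description above. Training points are reweighted rather than resampled into bags as on the earlier slide; both are common variants. Each round fits a decision stump on the weighted data. Each stump's vote is then weighted by alpha = 0.5 * ln((1 - err) / err). The stump learner, the 1-D data and the number of rounds are assumptions for illustration.

```python
import math


def fit_weighted_stump(xs, ys, w):
    """Best threshold rule under sample weights w; labels are +1 / -1.
    Predicts `sign` if x > threshold, else -sign. Returns (err, threshold, sign)."""
    best = None
    for threshold in sorted(set(xs)):
        for sign in (1, -1):
            err = sum(wi for x, y, wi in zip(xs, ys, w)
                      if (sign if x > threshold else -sign) != y)
            if best is None or err < best[0]:
                best = (err, threshold, sign)
    return best


def adaboost_fit(xs, ys, n_rounds):
    n = len(xs)
    w = [1.0 / n] * n                            # start with uniform weights
    ensemble = []                                # list of (alpha, threshold, sign)
    for _ in range(n_rounds):
        err, threshold, sign = fit_weighted_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)    # avoid log(0) / division by zero
        alpha = 0.5 * math.log((1 - err) / err)  # more accurate stumps get a bigger say
        ensemble.append((alpha, threshold, sign))
        # Reweight: misclassified points (y * h(x) = -1) gain weight, others lose it.
        w = [wi * math.exp(-alpha * y * (sign if x > threshold else -sign))
             for x, y, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, x):
    # Weighted vote: sign of the alpha-weighted sum of stump outputs.
    score = sum(alpha * (sign if x > threshold else -sign)
                for alpha, threshold, sign in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical data that no single threshold can separate (+ + - - + +).
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
model = adaboost_fit(xs, ys, n_rounds=3)
print([adaboost_predict(model, x) for x in xs])   # → [1, 1, -1, -1, 1, 1]
```

On this data, the best single stump misclassifies 2 of the 6 points. After three rounds, the weighted vote of three stumps classifies all six correctly.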



Stacking – "Multiple Subject Experts Working Together"
• Stacking combines different types of models (like decision trees, logistic regression, and SVM).
• A meta-model learns how to combine their outputs for the best final prediction.
Classroom Example:
• A student wants to prepare for a math competition.
• He learns from three different teachers, each specializing in a different approach:
• One focuses on problem-solving techniques.
• Another emphasizes logical reasoning.
• The third teaches speed and accuracy tricks.
• The student listens to all three and then goes to a final coach, who decides how to use each
teacher’s advice best.
• The final coach combines the knowledge from all the teachers and helps the student develop
a winning strategy.
• Stacking leverages multiple different models and combines their outputs for a better final
prediction.
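The following is a deliberately tiny stacking sketch. In practice, the meta-model is usually a learned model such as logistic regression. It is trained on base-model predictions for data the base models did not see during training (held-out or out-of-fold predictions). Here, as a simplifying assumption, the meta-model is just a lookup table. It maps each pattern of base-model outputs to the label most often seen with that pattern. The base-model outputs and labels are hypothetical.

```python
from collections import Counter, defaultdict


def fit_meta_model(base_predictions, labels):
    """Meta-model: for each pattern of base-model outputs, remember the label
    seen most often with that pattern on held-out data."""
    seen = defaultdict(Counter)
    for pattern, label in zip(base_predictions, labels):
        seen[tuple(pattern)][label] += 1
    return {pattern: counts.most_common(1)[0][0] for pattern, counts in seen.items()}


def stack_predict(table, pattern):
    pattern = tuple(pattern)
    if pattern in table:
        return table[pattern]
    # Fallback for patterns never seen during meta-training: plain majority vote.
    return Counter(pattern).most_common(1)[0][0]


# Hypothetical held-out outputs of three different base models (0/1 each),
# with the true labels. Base model 0 is reliable; models 1 and 2 tend to be
# wrong together.
held_out_preds = [(1, 0, 0), (1, 0, 0), (0, 1, 1), (0, 1, 1), (1, 1, 1), (0, 0, 0)]
held_out_labels = [1, 1, 0, 0, 1, 0]

meta = fit_meta_model(held_out_preds, held_out_labels)
print(stack_predict(meta, (0, 1, 1)))   # learned: trust base model 0 here
print(stack_predict(meta, (1, 0, 1)))   # unseen pattern -> majority vote
```

For the pattern (0, 1, 1), a plain majority vote would answer 1. The meta-model has instead learned from held-out data that model 0 is right in that situation, which is the "final coach" role described above.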



Random Forest – "A Group of Independent Teachers Voting"
• Random Forest is an extension of Bagging, where multiple Decision Trees
are trained on random subsets of the data and features.
• The final prediction is made using majority voting (classification) or
averaging (regression).
• Classroom Example: "Multiple Teachers Giving Independent Opinions"
• Scenario: Imagine a class where students are preparing for a final exam.
Instead of relying on a single teacher, they consult multiple teachers, each
of whom:
• Gets a different random subset of students (sampling with replacement).
• Teaches a different subset of the syllabus (random feature selection).
• Gives an independent opinion on each student's expected grade.
• Finally, the students' predicted performance is decided based on majority
voting (classification) or average scores (regression).



Why Does Random Forest Work Well?
• Since each teacher (decision tree) is trained on a random selection of questions
and students, the group is less likely to overfit to any single student's performance.
• The final decision is made by considering the wisdom of all teachers,
making it more robust and accurate than relying on a single teacher.



Random Forests
• Random Forests is a specific type of ensemble learning method that
combines Bagging with decision trees (classification or regression trees).
• It builds multiple decision trees during training and merges their
predictions through averaging (for regression) or voting (for
classification).



Working
• Random Feature Selection: At each split in the decision tree, only a
random subset of features is considered for splitting. This introduces
randomness and helps to decorrelate the trees.
• Bootstrap Sampling: Like Bagging, Random Forests use bootstrap
sampling to create multiple subsets of the training data for training
each decision tree.
• Voting or Averaging: In classification tasks, the final prediction is
determined by a majority vote among the predictions of all decision
trees. In regression tasks, it's often the average of the predictions.
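The following minimal sketch combines the two sources of randomness described above. Each tree sees a bootstrap sample of the rows. It also sees only a random subset of about sqrt(d) of the d features, a common rule of thumb for classification. To stay short, every "tree" here is a single split (a decision stump), so the feature subset is drawn once per tree. A real random forest grows deeper trees and redraws the subset at every split. The data, sizes and names are hypothetical.

```python
import math
import random
from collections import Counter


def fit_stump(rows, labels, feature_ids):
    """Best single-feature threshold rule, searching only the given features.
    Returns (feature, threshold, low, high): predict high if row[feature] > threshold."""
    best = None
    for f in feature_ids:
        for threshold in sorted({row[f] for row in rows}):
            for low, high in ((0, 1), (1, 0)):
                errors = sum((high if row[f] > threshold else low) != y
                             for row, y in zip(rows, labels))
                if best is None or errors < best[0]:
                    best = (errors, f, threshold, low, high)
    return best[1:]


def random_forest_fit(rows, labels, n_trees=15, seed=0):
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    m = max(1, round(math.sqrt(d)))                  # features considered per split
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample of rows
        features = rng.sample(range(d), m)           # random feature subset
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], features))
    return forest


def random_forest_predict(forest, row):
    # Classification: majority vote of all trees.
    votes = Counter(high if row[f] > t else low for f, t, low, high in forest)
    return votes.most_common(1)[0][0]


# Hypothetical data: 6 rows, 4 numeric features, binary labels.
rows = [(0.2, 1.0, 5.0, 0.1), (0.4, 1.5, 3.0, 0.9), (0.3, 2.0, 4.0, 0.5),
        (0.8, 6.0, 1.0, 0.4), (0.9, 7.0, 2.0, 0.8), (0.7, 8.0, 6.0, 0.2)]
labels = [0, 0, 0, 1, 1, 1]
forest = random_forest_fit(rows, labels)
print([random_forest_predict(forest, r) for r in rows])
```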



Advantages
• Improved Accuracy: Ensemble methods often outperform individual base learners by
combining multiple models' predictions. This is because they can capture different
aspects of the data and learn from different perspectives, leading to more accurate
overall predictions.
• Reduced Overfitting: By combining multiple models trained on different subsets of the
data or with different algorithms, ensemble methods can reduce overfitting. Overfitting
occurs when a model learns noise or patterns specific to the training data but doesn't
generalize well to unseen data. Ensemble methods help mitigate this by averaging out
individual models' errors and focusing on the common patterns in the data.
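• Robustness: Ensemble methods are often more robust to noisy data or outliers than
individual models. Since they rely on multiple models' predictions, they can handle
anomalies better and provide more stable predictions. (Boosting methods such as
AdaBoost are an exception: because they keep increasing the weight of misclassified
points, noisy labels and outliers can receive excessive attention.)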
• Handle Non-linear Relationships: Ensemble methods, especially those based on decision
trees like Random Forests and Gradient Boosting Machines (GBM), are capable of
capturing complex non-linear relationships in the data. They do this by combining
multiple decision trees or weak learners, allowing them to model intricate patterns
effectively.

• Feature Importance: Ensemble methods can provide insights into feature
importance or variable importance, which can be valuable for feature
selection and understanding the data's underlying structure. For example,
Random Forests can measure how much each feature contributes to
reducing impurity in the decision trees, providing a ranking of feature
importance.
• Scalability: Many ensemble methods, such as Random Forests and Bagging,
are highly scalable and can efficiently handle large datasets with high
dimensionality. They can be parallelized and distributed, making them
suitable for big data applications.
• Versatility: Ensemble methods are versatile and can be applied to various
machine learning tasks, including classification, regression, and anomaly
detection. They can also be combined with other machine learning
techniques, such as deep learning, to further improve performance.
• Easy Implementation: Implementing ensemble methods is relatively
straightforward, especially with popular libraries like scikit-learn in Python.
Many ensemble algorithms are readily available, making it easy for
practitioners to experiment with different techniques and adapt them to
their specific problems.
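The "reducing impurity" idea in the Feature Importance point can be made concrete with Gini impurity, a common splitting criterion for classification trees. The sketch computes the impurity decrease of one split. Summing such decreases over all splits that use a feature, across all trees, gives an impurity-based importance score. scikit-learn's `feature_importances_` for tree ensembles is computed along these lines and normalised to sum to 1. The labels are hypothetical.

```python
def gini(labels):
    """Gini impurity: chance that two labels drawn at random (with replacement) differ."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def impurity_decrease(parent, left, right):
    """Drop in impurity from splitting `parent` into `left` and `right`,
    weighting each child's impurity by its share of the samples."""
    n = len(parent)
    weighted_children = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted_children


parent = [0, 0, 0, 1, 1, 1]
print(impurity_decrease(parent, [0, 0, 0], [1, 1, 1]))   # perfect split: 0.5
print(impurity_decrease(parent, [0, 0, 1], [0, 1, 1]))   # weaker split: about 0.056
```

A feature that repeatedly produces large decreases, like the perfect split here, ranks high. A feature whose splits barely change the impurity ranks low.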