Unit 3
Dr Rashmi Popli
Associate Professor
Department of Computer Engineering
Evaluating Machine Learning Algorithms
Evaluating machine learning algorithms and selecting the right model are crucial
steps in building effective and accurate predictive models.
1. Define Objectives and Metrics
• Clearly define the problem you are trying to solve and the goals of your model.
• Choose appropriate evaluation metrics based on the nature of the problem
(e.g., accuracy, precision, recall, F1-score, mean squared error, R-squared for
regression).
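As a sketch of how these metrics are computed in practice (using scikit-learn on made-up labels and predictions; the numbers here are purely illustrative):

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, mean_squared_error, r2_score)

# Hypothetical true labels and model predictions for a binary classification task
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

accuracy = accuracy_score(y_true, y_pred)    # fraction of correct predictions
precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall

# Regression metrics on hypothetical continuous targets
y_true_reg = [3.0, 2.5, 4.0, 5.0]
y_pred_reg = [2.8, 2.7, 3.9, 5.2]
mse = mean_squared_error(y_true_reg, y_pred_reg)
r2 = r2_score(y_true_reg, y_pred_reg)
```

Which metric matters depends on the problem: for imbalanced classes, precision, recall, and F1 are usually more informative than raw accuracy.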
2. Data Preprocessing
• Handle missing data, outliers, and perform data normalization or
standardization as needed.
• Encode categorical variables and handle any other data-specific preprocessing
steps.
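A minimal preprocessing sketch, assuming pandas and scikit-learn and a small made-up table with one missing value and one categorical column:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: one numeric feature with a missing value,
# one categorical feature
df = pd.DataFrame({
    "sqft": [1200.0, np.nan, 1500.0, 900.0],
    "city": ["A", "B", "A", "C"],
})

# Fill missing numeric values with the column mean
df["sqft"] = SimpleImputer(strategy="mean").fit_transform(df[["sqft"]]).ravel()

# Standardize the numeric feature (zero mean, unit variance)
df["sqft_scaled"] = StandardScaler().fit_transform(df[["sqft"]]).ravel()

# One-hot encode the categorical column
encoded = pd.get_dummies(df["city"], prefix="city")
```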
3. Split Data
• Split your dataset into training and testing sets to evaluate model
performance on unseen data.
• Optionally, use techniques like cross-validation for more robust evaluation.
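A train/test split can be done in one call; the sketch below uses scikit-learn's built-in Iris dataset as a stand-in for your own data:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Small built-in dataset standing in for your own data
X, y = load_iris(return_X_y=True)

# Hold out 20% of the samples as an unseen test set;
# a fixed random_state makes the split reproducible,
# and stratify=y keeps class proportions equal in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
```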
4. Choose Algorithms
• Select a set of candidate algorithms based on the nature of the problem
(e.g., decision trees, support vector machines, neural networks).
• Consider the characteristics of your data (size, complexity) when choosing
algorithms.
5. Train Models
• Train each selected algorithm on the training data.
• Use hyperparameter tuning to find the optimal settings for each algorithm.
6. Evaluate Performance
• Use the testing set to evaluate the performance of each model.
• Compare performance metrics and choose the model(s) with the best
results.
7. Consider Cross-Validation
• Perform cross-validation to assess how well the models generalize to
different subsets of the data.
• This helps in identifying models that are less prone to overfitting.
8. Model Interpretability
• Consider the interpretability of the models, especially if the application
requires understanding the reasoning behind predictions.
9. Ensemble Methods
• Explore ensemble methods (e.g., random forests, gradient boosting) to
combine the strengths of multiple models for improved performance.
10. Deployment Considerations
• Consider practical aspects of deploying the model, such as computational
resources, real-time requirements, and interpretability.
11. Iterative Process
• Model evaluation and selection are iterative processes. If initial results
are not satisfactory, revisit previous steps, adjust hyperparameters, or try
different algorithms.
12. Documentation
• Document your choices, decisions, and results to facilitate
communication with stakeholders and for future reference.
Hyperparameter Tuning
• Hyperparameter tuning is the process of selecting the best set of
hyperparameters for a machine learning algorithm.
• Hyperparameters are configuration settings that are external to the
model itself and cannot be directly learned from the training data.
• Examples of hyperparameters include the depth of a decision tree,
the number of hidden layers in a neural network, and the
regularization parameter in regression.
• Hyperparameter tuning aims to find the optimal combination of
hyperparameters that maximizes the performance of the model on a
validation dataset or through cross-validation.
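One common way to do this is a grid search: try every candidate value of a hyperparameter with cross-validation and keep the best. A sketch using scikit-learn, tuning the depth of a decision tree on the built-in Iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Candidate values for one hyperparameter: the tree depth.
# It is fixed before training, not learned from the data.
param_grid = {"max_depth": [2, 3, 4, 5]}

# Exhaustively try each value with 5-fold cross-validation
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)

best_depth = search.best_params_["max_depth"]
```

For larger grids, randomized search or Bayesian optimization is often used instead of an exhaustive sweep.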
Cross-validation
• Cross-validation is a technique used in machine learning to assess how
well a model will generalize to an independent dataset.
• It's particularly useful when the dataset is limited, as it allows
maximizing the amount of data used for both training and testing, thus
providing a more reliable estimate of the model's performance.
• Cross-validation helps to provide a more accurate estimate of a model's
performance compared to a single train-test split because it uses
multiple splits of the data.
• This helps to reduce the variability in performance estimates that may
arise from using a single train-test split, especially when the dataset is
small or the data distribution is heterogeneous.
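A k-fold cross-validation sketch with scikit-learn (here k = 5, on the built-in Iris dataset): the data is split into 5 folds, and the model is trained on 4 folds and scored on the held-out fold, 5 times over.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# One accuracy score per fold; their mean and spread summarize
# how well the model generalizes across subsets of the data
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)

mean_accuracy = scores.mean()
```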
Statistical Learning Theory
• Statistics is the mathematical study of data.
• Statistical learning theory is a field of study within machine learning
that deals with the theoretical foundations of learning algorithms.
• It is a framework that draws from the fields of statistics and
functional analysis.
• Using statistics, an interpretable statistical model is created to describe
the data, and this model can then be used to infer something about
the data or even to predict values that are not present in the sample
data used to create the model.
• Objective: The main goal of statistical learning is understanding and
prediction. It aims to build models that can draw meaningful
conclusions from data and make accurate predictions.
Categories of Learning
• Supervised Learning: In this category, we have labeled data pairs
(input-output pairs), and the goal is to learn a mapping from inputs to
outputs.
• Unsupervised Learning: Here, we work with unlabeled data and aim to
discover underlying patterns or structures.
• Semi-supervised Learning: Combines aspects of both supervised and
unsupervised learning.
• Online Learning: Learning from data streams in real-time.
• Reinforcement Learning: Learning through interaction with an
environment.
Components of Statistical Learning
• Features (Predictors): The variables used to make predictions.
• Response (Outcome): The variable to be predicted.
• Training Data: The dataset used to train the model.
• Model: The algorithm or mathematical function that learns from the
training data.
Applications
• Computer Vision: Developing algorithms for image recognition, object
detection, and scene understanding.
Example: Predicting House Prices
1. Problem Statement: Imagine you are a real estate agent, and you want to predict the selling prices of
houses based on certain features like the number of bedrooms, square footage, and neighborhood.
2. Dataset: Collect a dataset that includes information about houses, such as:
• Number of bedrooms
• Square footage
• Neighborhood
• Distance to public amenities
• Previous sale prices (target variable)
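A worked sketch of this example on synthetic data (the features and coefficients below are invented for illustration; a real dataset would replace them):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Synthetic stand-in for the real-estate dataset described above:
# price depends on bedrooms and square footage, plus noise
n = 200
bedrooms = rng.integers(1, 6, size=n)
sqft = rng.uniform(500, 3000, size=n)
price = 50_000 * bedrooms + 120 * sqft + rng.normal(0, 10_000, size=n)

# Features (predictors) and response (outcome), split into train/test
X = np.column_stack([bedrooms, sqft])
X_train, X_test, y_train, y_test = train_test_split(X, price, random_state=0)

# Fit a linear model and check fit quality on the held-out test set
model = LinearRegression().fit(X_train, y_train)
r2 = r2_score(y_test, model.predict(X_test))
```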
Ensemble methods
• They combine multiple models to improve predictive performance.
Here are three common ensemble methods:
• Bagging (Bootstrap Aggregating): Uses multiple base models (often
decision trees) trained on different subsets of the data and averages
their predictions.
• Random Forests: An extension of bagging that builds multiple
decision trees and combines their predictions.
• Boosting: Iteratively builds weak models (e.g., decision trees) and
combines them to create a strong model.
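All three can be compared side by side with cross-validation; a sketch using scikit-learn's ensemble classes on the built-in breast-cancer dataset:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (BaggingClassifier, RandomForestClassifier,
                              GradientBoostingClassifier)
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Bagging: parallel base models, each on a bootstrap sample of the data
bagging = BaggingClassifier(n_estimators=50, random_state=0)
# Random forest: bagged trees with extra per-split feature randomness
forest = RandomForestClassifier(n_estimators=50, random_state=0)
# Boosting: trees built sequentially, each correcting its predecessors
boosting = GradientBoostingClassifier(n_estimators=50, random_state=0)

# Mean 5-fold cross-validation accuracy for each ensemble
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in [("bagging", bagging), ("forest", forest),
                        ("boosting", boosting)]
}
```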
Difference between Bagging and Boosting
• Bagging: Trains a set of individual models in parallel, each on a
random bootstrap subset of the data, and averages their predictions.
• Boosting: Trains models sequentially, with each new model built to
correct the errors of the ones before it.