
Module 1

Classification
Contents
Supervised Learning (Classification):
• Support Vector Machine (SVC and SVR),
• Loss function in SVM,
• Kernel Methods,
• Random Forest, and
• Ensemble classification methods (Bagging and Boosting Techniques).
Introduction
We are given a set of data points that have to be classified into classes.

[Figure: a 2-dimensional synthetic dataset with 2 classes, shown in a linearly separable case and a non-linearly separable case]


• When the data is linearly separable, there exist multiple separating hyperplanes.
• Logistic regression can produce any one of these hyperplanes.
• Among these hyperplanes, which is the most effective one?
The answer is given by the Support Vector Machine (SVM).
• Support Vector Machine (SVM) is one of the most popular supervised
learning algorithms, introduced in the 1990s, and it can be used for both
classification and regression problems. However, it is primarily used for
classification problems in machine learning.
• The goal of the SVM algorithm is to create the best line or decision
boundary that can segregate n-dimensional space into classes so that
we can easily put the new data point in the correct category in the
future. This best decision boundary is called a hyperplane.
How SVM works

The SVM algorithm finds the best line or decision boundary; this best boundary or
region is called a hyperplane. The algorithm identifies the points from both classes that lie
closest to this hyperplane; these points are called support vectors. The distance between the
support vectors and the hyperplane is called the margin, and the goal of SVM is to maximize this
margin. The hyperplane with the maximum margin is called the optimal hyperplane.
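A minimal sketch of this idea, assuming scikit-learn is available (the dataset here is synthetic and the values are illustrative); after fitting, the support vectors and the separating hyperplane can be inspected:

# Minimal sketch: fit a linear SVM and inspect its support vectors.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated classes in 2-D (linearly separable)
X, y = make_blobs(n_samples=100, centers=2, random_state=42)

# Linear kernel: find the maximum-margin hyperplane
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# The closest points from both classes that define the margin
print("Support vectors:\n", clf.support_vectors_)

# The separating hyperplane is w.x + b = 0
print("w =", clf.coef_[0], "b =", clf.intercept_[0])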
Hard margin SVM
• Each data point must lie on the correct side of the margin and there
should be no misclassification. Hard margin works well only if our data is
linearly separable.
• Hard margin SVM does not allow any misclassification to happen.
• If the data is not linearly separable, a hard-margin SVM will not return any
hyperplane, because it cannot separate the data.

• Therefore we need soft-margin SVM.


Soft margin SVM
• Soft margin SVM allows some misclassification to happen by relaxing the
hard constraints of Support Vector Machine.
• Soft margin SVM is implemented with the help of the Regularization
parameter (C).
• Regularization parameter (C): It tells us how much misclassification
should be avoided.
– If the C value is large, the SVM tries hard to reduce the number of misclassified
points, at the cost of a narrower margin.
– If the C value is small, the SVM tolerates more misclassified points in exchange
for a wider margin.
• Gamma parameter: gamma has no role when the data is handled with a linear
kernel; it is used only with non-linear kernels (e.g. RBF), where it controls
how far the influence of a single training example reaches.

Gamma value can be small or large


The gamma value is small (0.1) The gamma value is large (1)
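A small sketch (illustrative values only, assuming scikit-learn) of how C and gamma are passed to an RBF-kernel SVC; the gamma values 0.1 and 1 mirror the figure above:

# Sketch: effect of the regularization parameter C and the RBF gamma.
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

for C in (0.1, 100):          # small C tolerates misclassification; large C avoids it
    for gamma in (0.1, 1):    # small gamma -> smoother boundary; large gamma -> tighter fit
        clf = SVC(kernel="rbf", C=C, gamma=gamma).fit(X, y)
        print(f"C={C}, gamma={gamma}, training accuracy={clf.score(X, y):.2f}")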
Kernels
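The original slide is figure-based; as a stand-in, here is a hedged sketch comparing the common kernel options of scikit-learn's SVC (linear, polynomial, RBF) on a non-linearly separable dataset:

# Sketch: comparing common SVM kernels on non-linearly separable data.
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_circles(n_samples=300, factor=0.4, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for kernel in ("linear", "poly", "rbf"):
    clf = SVC(kernel=kernel).fit(X_train, y_train)
    print(kernel, "test accuracy:", round(clf.score(X_test, y_test), 2))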
Ensemble method
• The main idea behind ensemble methods is that a group of “weak
learners” can come together to form a “strong learner”.

Ensemble Learning: Bagging and Boosting
Bagging
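As a concrete illustration of bagging, here is a minimal sketch using scikit-learn's BaggingClassifier with decision-tree base learners: the weak learners are trained on bootstrap samples and combined by voting (values are illustrative).

# Sketch: bagging = train base learners on bootstrap samples, combine by voting.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

bag = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # the "weak learner" (scikit-learn >= 1.2; older versions use base_estimator=)
    n_estimators=50,                     # number of bootstrap samples / trees
    bootstrap=True,                      # sample with replacement
    random_state=0,
)
bag.fit(X, y)
print("Training accuracy:", bag.score(X, y))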
Random Forest
• Random Forest is a supervised learning technique for Classification
and Regression problems in ML and is a bagging technique.
• It is based on the concept of ensemble learning, which is a process
of combining multiple classifiers to solve a complex problem and improve
the model’s performance.
• "Random Forest is a classifier that contains several decision
trees on various subsets of the given dataset, based on the
majority votes of predictions, and it predicts the final output.
• The greater number of trees in the forest leads to higher accuracy
and prevents the problem of overfitting.
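A minimal sketch of this idea with scikit-learn's RandomForestClassifier (the iris dataset is used only for illustration); the forest's prediction is the majority vote over its individual trees:

# Sketch: a random forest is a collection of decision trees; the final
# prediction is the majority vote of the individual trees.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)

print("Number of trees:", len(forest.estimators_))
print("Forest prediction for the first sample:", forest.predict(X[:1]))

# Votes of the first few individual trees for the same sample
votes = [forest.classes_[int(tree.predict(X[:1])[0])] for tree in forest.estimators_[:5]]
print("Individual tree votes:", votes)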
Random forest with and without
replacement
• With replacement: When a sampling unit is drawn from a
population and is returned to that population after its
characteristics have been recorded before the next unit is drawn.
• So we might end up selecting and measuring the same unit more
than once.
• Without replacement: When a sampling unit is drawn from a
population and is not returned to that population before the next
unit is drawn.

Sampling without replacement guarantees that no unit is selected more than once;
note, however, that the bootstrap sampling used in bagging draws with replacement.
A small illustration of the difference is sketched below.
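A tiny, purely illustrative sketch of the difference using NumPy:

# Sketch: sampling with vs. without replacement.
import numpy as np

rng = np.random.default_rng(0)
population = np.arange(10)

with_repl = rng.choice(population, size=10, replace=True)      # a unit may repeat
without_repl = rng.choice(population, size=10, replace=False)   # every unit unique

print("With replacement:   ", with_repl)     # duplicates possible (bootstrap sample)
print("Without replacement:", without_repl)  # a permutation of the population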


Out of bag error
For every tree, roughly

    Training ≈ (2/3) × d
    Validation (out-of-bag) ≈ (1/3) × d

where d is the number of samples in the dataset. The error measured on the
out-of-bag samples is the OOB error; this value, expressed as a percentage
(or a fraction), is called the oob_score.
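The 2/3–1/3 split can be checked empirically: a bootstrap sample of size d drawn with replacement leaves, in expectation, a fraction 1 − (1 − 1/d)^d ≈ 36.8% of the points out of bag, which the slide rounds to 1/3. A small sketch using NumPy (illustrative):

# Sketch: fraction of samples left out-of-bag by one bootstrap sample.
import numpy as np

rng = np.random.default_rng(0)
d = 10_000
indices = rng.integers(0, d, size=d)           # bootstrap: draw d indices with replacement
oob_fraction = 1 - len(np.unique(indices)) / d

print(f"In-bag (training) fraction: {1 - oob_fraction:.3f}   # roughly 2/3")
print(f"Out-of-bag fraction:        {oob_fraction:.3f}   # roughly 1/3")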
Important Hyperparameters in
Random Forest
Hyperparameters to Increase the Predictive Power
• n_estimators: Number of trees the algorithm builds before averaging
the predictions.
• max_features: Maximum number of features the random forest considers
when splitting a node.
• min_samples_leaf: The minimum number of samples required to be at a
leaf node.
• criterion: How to split the node in each tree? (Entropy/Gini
impurity/Log Loss)
• max_leaf_nodes: Maximum leaf nodes in each tree
Hyperparameters to Increase the Speed
• n_jobs: it tells the engine how many processors it is allowed to use.
If the value is 1, it can use only one processor, but if the value is -
1, there is no limit.
• random_state: controls randomness of the sample. The model will
always produce the same results if it has a definite value of random
state and has been given the same hyperparameters and training
data.
• oob_score: OOB means out of bag. It is a random forest cross-
validation method in which roughly one-third of the samples are not used to
train each tree; instead, they are used to evaluate its performance. These
samples are called out-of-bag samples.
Hands-on Random forest
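The original hands-on code is not reproduced here; the following is a hedged sketch that ties together the hyperparameters listed above (dataset and parameter values are illustrative, assuming scikit-learn):

# Sketch: random forest with the hyperparameters discussed above.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

forest = RandomForestClassifier(
    n_estimators=200,        # number of trees
    max_features="sqrt",     # features considered at each split
    min_samples_leaf=2,      # minimum samples required at a leaf
    criterion="gini",        # split quality measure
    max_leaf_nodes=None,     # no limit on leaf nodes per tree
    n_jobs=-1,               # use all available processors
    oob_score=True,          # evaluate on out-of-bag samples
    random_state=42,         # reproducibility
)
forest.fit(X_train, y_train)

print("OOB score:", round(forest.oob_score_, 3))
print("Test accuracy:", round(forest.score(X_test, y_test), 3))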
Boosting
A model with high bias has high training and testing errors. Boosting is used
to reduce this bias.
1. Initialize the dataset and assign equal weight to each of the data points.
2. Provide this as input to the model and identify the wrongly classified data points.
3. Increase the weight of the wrongly classified data points.
4. If the required results are obtained, go to step 5; otherwise go to step 2.
5. End

The principle behind boosting algorithms is to first build a model on the training
dataset and then build a second model to rectify the errors present in the first model.
This procedure is continued until the errors are minimized and the dataset is
predicted correctly.
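A hedged sketch of this sequential idea using scikit-learn's GradientBoostingClassifier, in which each new tree is fitted to correct the errors of the ensemble built so far (values are illustrative):

# Sketch: boosting builds models sequentially, each correcting its predecessors.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

boost = GradientBoostingClassifier(
    n_estimators=100,    # number of sequential trees
    learning_rate=0.1,   # contribution of each tree
    max_depth=3,         # each tree is a weak learner
    random_state=0,
)
boost.fit(X_train, y_train)
print("Test accuracy:", round(boost.score(X_test, y_test), 3))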
Types of boosting
• Gradient boosting
• XGBoost
• Adaboost
• Catboost
Adaboost
• The AdaBoost (Adaptive Boosting) algorithm is a boosting
technique used as an ensemble method in machine
learning. It is called adaptive boosting because the weights are re-
assigned to each instance after every round, with higher weights assigned to
incorrectly classified instances.
• At the end, these models are aggregated by taking the
weighted average of the individual models.
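A minimal sketch using scikit-learn's AdaBoostClassifier with decision stumps as the weak learners (values are illustrative):

# Sketch: AdaBoost re-weights misclassified instances and combines weak learners.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

ada = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # decision stump (scikit-learn >= 1.2; older versions use base_estimator=)
    n_estimators=100,
    learning_rate=0.5,
    random_state=1,
)
ada.fit(X_train, y_train)
print("Test accuracy:", round(ada.score(X_test, y_test), 3))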
Numerical example on Adaboost
• Refer to the class notes
• Gradient Boosting
• XGBoost (Extreme Gradient Boosting)
• Catboost (it works with categorical data (the Cat) and it
uses gradient boosting (the Boost)).
• Practical implementation of boosting (a brief sketch follows below).
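A hedged sketch of the practical side, assuming the third-party xgboost and catboost packages are installed (both expose a scikit-learn-style fit/score interface); parameter values are illustrative:

# Sketch: XGBoost and CatBoost via their scikit-learn-compatible classifiers.
# Requires: pip install xgboost catboost
from catboost import CatBoostClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=7)

xgb = XGBClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)
xgb.fit(X_train, y_train)
print("XGBoost test accuracy:", round(xgb.score(X_test, y_test), 3))

cat = CatBoostClassifier(iterations=100, learning_rate=0.1, depth=3, verbose=0)
cat.fit(X_train, y_train)
print("CatBoost test accuracy:", round(cat.score(X_test, y_test), 3))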
Question Bank
• Explain the working principle of SVM and write a Python
program to implement SVM.
• Explain the different types of Kernel methods.
• Problems on Linear SVM and Non-linear SVM (Refer class
notes).
• Explain the working principle of Random Forest and write a
Python program to implement Random Forest.
• Explain the working mechanism of AdaBoost with an example.
• Write a Python program to implement Gradient Boosting and AdaBoost.
• Write a Python program to implement XGBoost and CatBoost.
Question Bank
• Write a Python program to implement LIME and SHAP for
interpreting the results.
• Write a Python program to tune hyperparameters using the manual
method.
• Write a Python program to implement Randomized search CV.
• Write a Python program to implement Grid search CV.
