
Tutorial 7: Machine Learning Algorithms

Instructor: Dr. Ting Sun


Why apply multiple algorithms?
• You never know in advance which algorithm will give you the best prediction.
• Use trial and error to discover good, or even the best, algorithms for your dataset.
• That means you evaluate a diverse set of algorithms on your dataset and choose the one that gives you the best predictive performance.
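To make this concrete, here is a minimal sketch of that evaluate-and-compare workflow, assuming scikit-learn is available; the dataset and model settings are illustrative placeholders, not part of the original tutorial.

```python
# A minimal sketch of the "try several algorithms" workflow.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # stand-in for your own dataset

models = {
    "Logistic regression": LogisticRegression(max_iter=5000),
    "Naive Bayes": GaussianNB(),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
    "CART": DecisionTreeClassifier(random_state=0),
}

# Evaluate each model with 5-fold cross-validation and report mean accuracy.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} (+/- {scores.std():.3f})")
```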
Part 1: Classification algorithms
Note: certain algorithms discussed here (e.g., KNN and SVM) work for both classification and regression problems, so they serve as regression algorithms as well.
Logistic regression (GLM)
For binary classification problems only
Logistic regression
• A technique borrowed by machine learning from the field of statistics
• The go-to method for binary classification problems
• Finds a relationship between the independent variables and the probability of a particular outcome (e.g., spam or not spam)
Logistic regression
• With binary classification, let 'x' be the input value of the independent variable and 'y' be the output of the dependent variable, which can be either 0 or 1.
The probability that the output is 1 given its input can be represented as:

P(y = 1 | x)

• We predict P via the logit function:

$$\log\left(\frac{p(x)}{1 - p(x)}\right) = \beta_0 + \beta_1 x$$

where the left-hand side is called the logit or log-odds function, and p(x)/(1 - p(x)) is called the odds.
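As a small illustration of the logit equation above, this sketch (with made-up coefficient values, assumed purely for demonstration) converts log-odds back into probabilities:

```python
import numpy as np

b0, b1 = -4.0, 0.05                  # hypothetical intercept and slope
x = np.array([40.0, 80.0, 120.0])    # example input values

log_odds = b0 + b1 * x                   # the logit: log(p/(1-p))
p = 1.0 / (1.0 + np.exp(-log_odds))      # invert the logit to get P(y=1|x)
print(p)                                 # probabilities strictly between 0 and 1
```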
Naïve Bayes
Classification problems only
Naïve Bayes
• One of the most popular machine learning classification algorithms
• Based on Bayes' Theorem, with an assumption of independence among the independent variables, for calculating probabilities and conditional probabilities
• In simple terms, a Naive Bayes model assumes that the presence of a
particular feature in a class is unrelated to the presence of any other
feature.
Naïve Bayes

$$P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)}$$

where:
P(c|x) is the probability of the class (c, the dependent variable) given the data of the independent variable (x, the predictor). This is called the posterior probability of c.
P(c) is the probability of the class regardless of the data. This is called the prior probability of c.
P(x|c) is the probability of the data of the independent variable given the class.
P(x) is the prior probability of the data of the independent variable.
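To see how the terms fit together, here is a hand-worked sketch of Bayes' rule with fabricated spam-filter numbers (the probabilities are assumptions for illustration only):

```python
# Fabricated numbers: P(c) prior, P(x|c) likelihood, P(x) evidence.
p_c = 0.3            # P(c): prior probability that an email is spam
p_x_given_c = 0.8    # P(x|c): probability a trigger word appears in spam
p_x = 0.35           # P(x): overall probability the trigger word appears

# Posterior: P(c|x) = P(x|c) * P(c) / P(x)
p_c_given_x = p_x_given_c * p_c / p_x
print(p_c_given_x)   # about 0.686: seeing the word raises the spam probability
```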
KNN
K-Nearest Neighbors Algorithm
For both classification and regression problems
KNN
The KNN algorithm assumes that similar
things exist in close proximity. In other
words, similar things are near to each other.

“Birds of a feather flock together.”


[Image: similar data points typically exist close to each other]
KNN
• KNN makes predictions using the training dataset directly.
• Predictions are made for a new observation (x) by searching through the entire training set for the K most similar observations (the neighbors) and summarizing the output variable for those K observations.
• How is the output summarized?
• the mean, if it is a regression problem
• the mode (the most common class value, i.e., a majority vote), if it is a classification problem
• The value for K can be found by algorithm tuning. It is a good idea to try many different values for K (e.g., values from 1 to 21) and see what works best for your problem.
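Here is a minimal sketch of that tuning loop, assuming scikit-learn; the iris dataset stands in for your own problem:

```python
# Try odd K values from 1 to 21 and keep the best cross-validated score.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

best_k, best_score = None, 0.0
for k in range(1, 22, 2):        # odd values avoid ties in the majority vote
    score = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    if score > best_score:
        best_k, best_score = k, score

print(best_k, round(best_score, 3))
```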
SVM
Support Vector Machine
For both classification and regression problems
SVM
• The objective of the support vector machine algorithm is to find a hyperplane in an N-dimensional space (N being the number of features) that distinctly classifies the data points.
Hyperplanes and maximum margin
• To separate the two classes of data
points, there are many possible
hyperplanes that could be chosen.
• Our objective is to find a plane that has the maximum margin, i.e., the maximum distance between data points of both classes.
• Maximizing the margin distance provides some reinforcement so that future data points can be classified with more confidence.
• Hyperplanes are decision boundaries that help classify the data points.
• Data points falling on either side of the hyperplane can be attributed to different classes.
• The dimension of the hyperplane depends upon the number of features. If the number of input features is 2, then the hyperplane is just a line.
• If the number of input features is 3, then the hyperplane becomes a two-dimensional plane in three-dimensional space.
• It becomes difficult to imagine when the number of features exceeds 3.
Support Vectors
• Support vectors are the data points that are closest to the hyperplane and influence the position and orientation of the hyperplane.
• Using these support vectors, we maximize the margin of the classifier. Deleting the support vectors will change the position of the hyperplane.
SVM: using kernels
In a two-dimensional space, the hyperplane is a one-dimensional line dividing the red and blue dots. When no straight line can separate the classes, a kernel function implicitly maps the data into a higher-dimensional space where a separating hyperplane exists.

[Image: a one-dimensional line dividing red and blue dots in two-dimensional space]

Adapted from https://towardsdatascience.com/kernel-function-6f1d2be6091
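A minimal sketch of the kernel idea, assuming scikit-learn; make_circles is a stand-in for data like the red and blue dots that no straight line can separate:

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Concentric circles: not linearly separable in two dimensions.
X, y = make_circles(n_samples=300, factor=0.4, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The RBF kernel implicitly maps the points into a higher-dimensional
# space where a separating hyperplane exists; a linear kernel cannot.
for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel).fit(X_train, y_train)
    print(kernel, clf.score(X_test, y_test))
```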


Classification and regression
trees (CART)
For both classification and regression problems
Decision trees: CART
• Classification and Regression Trees, or CART for short, is a term introduced by Leo Breiman to refer to decision tree algorithms that can be used for classification or regression predictive modeling problems.
Basic idea
• The representation for the CART model is a binary tree.
Given a dataset with two inputs (x), height in centimeters and weight in kilograms, and an output of sex as male or female, the following is a crude example of a binary decision tree (completely fictitious, for demonstration purposes only).

[Image: a binary decision tree splitting first on Height, then on Weight]

The developed tree can be stored to file as a graph or as a set of rules. For example, below is the above decision tree as a set of rules:

1. If Height > 180 cm Then Male
2. If Height <= 180 cm AND Weight > 80 kg Then Male
3. If Height <= 180 cm AND Weight <= 80 kg Then Female
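A minimal sketch of fitting such a tree and printing its rules, assuming scikit-learn; the height/weight observations are fabricated to mirror the example above:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# [height_cm, weight_kg] -> 0 = Female, 1 = Male (fictitious data)
X = [[185, 70], [170, 85], [190, 82], [160, 55], [168, 60], [175, 62], [172, 72]]
y = [1, 1, 1, 0, 0, 0, 0]

# A depth-2 tree is enough to reproduce rules similar to those above.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["Height", "Weight"]))  # rules as text
```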
Part 2: Regression algorithms
Linear regression
For regression problems only
Linear regression
• Regression models a target prediction value based on independent variables.
• Linear regression predicts a dependent variable value (y) based on a given independent variable (x). So, this regression technique finds a linear relationship between x (input) and y (output), hence the name linear regression.

OLS: ordinary least squares

$$y = \beta_0 + \beta_1 x + \varepsilon$$

OLS chooses the coefficients $\beta_0$ and $\beta_1$ that minimize the sum of squared errors, $\sum_i (y_i - \hat{y}_i)^2$.
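A minimal OLS sketch, assuming scikit-learn; the data are synthetic points scattered around an assumed true line y = 2x + 1:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(50, 1))                       # one input feature
y = 2.0 * x[:, 0] + 1.0 + rng.normal(scale=0.5, size=50)   # noisy line

model = LinearRegression().fit(x, y)     # minimizes the sum of squared errors
print(model.coef_[0], model.intercept_)  # recovered slope ~2, intercept ~1
```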
Regularized Regression
Regularized Regression
• Linear regression is a simple and fundamental approach. Moreover, when the assumptions required by ordinary least squares (OLS) regression are met, the coefficients produced by OLS are unbiased and, of all unbiased linear techniques, have the lowest variance.
• However, in today's world, data sets being analyzed typically have a large number of features. As the number of features grows, our OLS assumptions typically break down and our models often overfit to the training sample, causing our out-of-sample error to increase. This problem is called overfitting.
• Regularization methods provide a means to control our regression coefficients, which can reduce the variance and decrease our out-of-sample error.
• The coefficient estimates are constrained (shrunk) toward zero. The magnitude (size) of the coefficients, as well as the magnitude of the error term, are penalized (using alpha and lambda as regularization parameters). Complex models are discouraged, primarily to avoid overfitting.
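A minimal sketch contrasting OLS with ridge and lasso penalties, assuming scikit-learn (note: scikit-learn calls the penalty strength "alpha", while the slide's alpha/lambda naming follows glmnet-style conventions):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge

# Many features relative to samples: the setting where OLS tends to overfit.
X, y = make_regression(n_samples=60, n_features=100, noise=10.0, random_state=0)

for name, model in [("OLS", LinearRegression()),
                    ("Ridge", Ridge(alpha=1.0)),
                    ("Lasso", Lasso(alpha=1.0))]:
    model.fit(X, y)
    kept = sum(abs(c) > 1e-6 for c in model.coef_)  # coefficients not shrunk to ~0
    print(f"{name}: {kept} of {len(model.coef_)} coefficients kept")
```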
KNN
Both regression and classification problems
Discussed in previous slides
SVM
Both regression and classification problems
Discussed in previous slides
CART
Both regression and classification problems
Discussed in previous slides
