
Tutorial 7: Machine Learning Algorithms

Instructor: Dr. Ting Sun


Why apply multiple algorithms?
• You never know in advance which algorithm will give you the best prediction.
• Use trial and error to discover good, or even the best, algorithms for your dataset.
• That means you evaluate a diverse set of algorithms on your dataset and choose the one that gives you the best predictive performance.
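To make this concrete, here is a minimal sketch of that evaluate-and-compare workflow, assuming scikit-learn is available; the dataset and model settings are illustrative placeholders, not part of the original tutorial.

```python
# A minimal sketch of the "try several algorithms" workflow.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # stand-in for your own dataset

models = {
    "Logistic regression": LogisticRegression(max_iter=5000),
    "Naive Bayes": GaussianNB(),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
    "CART": DecisionTreeClassifier(random_state=0),
}

# Evaluate each model with 5-fold cross-validation and report mean accuracy.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} (+/- {scores.std():.3f})")
```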
Part 1: Classification algorithms
Note: certain algorithms discussed here (e.g., KNN and SVM) work for both classification and regression problems, so they serve as regression algorithms as well.
Logistic regression (GLM)
For binary classification problems only
Logistic regression
• A technique borrowed by machine learning from the field of statistics
• The go-to method for binary classification problems
• Finds a relationship between the independent variables and the probability of a particular outcome (e.g., spam or not spam)
Logistic regression
• With binary classification, let 'x' be the input value of the independent variable and 'y' be the output of the dependent variable, which can be either 0 or 1.
The probability that the output is 1 given its input can be represented as:

P(y = 1 | x)

• We predict P via the logit function:

$$\log\left(\frac{p(x)}{1 - p(x)}\right) = \beta_0 + \beta_1 x$$

where the left-hand side is called the logit or log-odds function, and p(x)/(1 - p(x)) is called the odds.
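As a small illustration of the logit equation above, this sketch (with made-up coefficient values, assumed purely for demonstration) converts log-odds back into probabilities:

```python
import numpy as np

b0, b1 = -4.0, 0.05                  # hypothetical intercept and slope
x = np.array([40.0, 80.0, 120.0])    # example input values

log_odds = b0 + b1 * x                   # the logit: log(p/(1-p))
p = 1.0 / (1.0 + np.exp(-log_odds))      # invert the logit to get P(y=1|x)
print(p)                                 # probabilities strictly between 0 and 1
```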
Naïve Bayes
Classification problems only
Naïve Bayes
• One of the most popular machine learning classification algorithms
• Based on Bayes' Theorem, with an assumption of independence among the independent variables, for calculating probabilities and conditional probabilities
• In simple terms, a Naive Bayes model assumes that the presence of a
particular feature in a class is unrelated to the presence of any other
feature.
Naïve Bayes

$$P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)}$$

where:
P(c|x) is the probability of the class (c, the dependent variable) given the data of the independent variable (x, the predictor). This is called the posterior probability of c.
P(c) is the probability of the class regardless of the data. This is called the prior probability of c.
P(x|c) is the probability of the data of the independent variable given the class.
P(x) is the prior probability of the data of the independent variable.
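To see how the terms fit together, here is a hand-worked sketch of Bayes' rule with fabricated spam-filter numbers (the probabilities are assumptions for illustration only):

```python
# Fabricated numbers: P(c) prior, P(x|c) likelihood, P(x) evidence.
p_c = 0.3            # P(c): prior probability that an email is spam
p_x_given_c = 0.8    # P(x|c): probability a trigger word appears in spam
p_x = 0.35           # P(x): overall probability the trigger word appears

# Posterior: P(c|x) = P(x|c) * P(c) / P(x)
p_c_given_x = p_x_given_c * p_c / p_x
print(p_c_given_x)   # about 0.686: seeing the word raises the spam probability
```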
KNN
K-Nearest Neighbors Algorithm
For both classification and regression problems
KNN
The KNN algorithm assumes that similar
things exist in close proximity. In other
words, similar things are near to each other.

“Birds of a feather flock together.”


[Image: similar data points typically exist close to each other]
KNN
• KNN makes predictions using the training dataset directly.
• Predictions are made for a new observation (x) by searching through the entire training set for the K most similar observations (the neighbors) and summarizing the output variable for those K observations.
• How is the output summarized?
• the mean, if it is a regression problem
• the mode (the most common class value, i.e., a majority vote), if it is a classification problem
• The value for K can be found by algorithm tuning. It is a good idea to try many different values for K (e.g., values from 1 to 21) and see what works best for your problem.
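Here is a minimal sketch of that tuning loop, assuming scikit-learn; the iris dataset stands in for your own problem:

```python
# Try odd K values from 1 to 21 and keep the best cross-validated score.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

best_k, best_score = None, 0.0
for k in range(1, 22, 2):        # odd values avoid ties in the majority vote
    score = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    if score > best_score:
        best_k, best_score = k, score

print(best_k, round(best_score, 3))
```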
SVM
Support Vector Machine
For both classification and regression problems
SVM
• The objective of the support vector machine algorithm is to find a hyperplane in an N-dimensional space (N being the number of features) that distinctly classifies the data points.
Hyperplanes and maximum margin
• To separate the two classes of data
points, there are many possible
hyperplanes that could be chosen.
• Our objective is to find a plane that has the maximum margin, i.e., the maximum distance between data points of both classes.
• Maximizing the margin distance provides some reinforcement so that future data points can be classified with more confidence.
• Hyperplanes are decision boundaries that help classify the data points.
• Data points falling on either side of the hyperplane can be attributed to different classes.
• The dimension of the hyperplane depends upon the number of features. If the number of input features is 2, then the hyperplane is just a line.
• If the number of input features is 3, then the hyperplane becomes a two-dimensional plane in three-dimensional space.
• It becomes difficult to imagine when the number of features exceeds 3.
Support Vectors
• Support vectors are the data points that are closest to the hyperplane and influence the position and orientation of the hyperplane.
• Using these support vectors, we maximize the margin of the classifier. Deleting the support vectors will change the position of the hyperplane.
SVM: using kernels
In a two-dimensional space, the hyperplane is a one-dimensional line dividing the red and blue dots. When no straight line can separate the classes, a kernel function implicitly maps the data into a higher-dimensional space where a separating hyperplane exists.

[Image: a one-dimensional line dividing red and blue dots in two-dimensional space]

Adapted from https://towardsdatascience.com/kernel-function-6f1d2be6091
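A minimal sketch of the kernel idea, assuming scikit-learn; make_circles is a stand-in for data like the red and blue dots that no straight line can separate:

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Concentric circles: not linearly separable in two dimensions.
X, y = make_circles(n_samples=300, factor=0.4, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The RBF kernel implicitly maps the points into a higher-dimensional
# space where a separating hyperplane exists; a linear kernel cannot.
for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel).fit(X_train, y_train)
    print(kernel, clf.score(X_test, y_test))
```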


Classification and regression
trees (CART)
For both classification and regression problems
Decision trees: CART
• Classification and Regression Trees, or CART for short, is a term introduced by Leo Breiman to refer to decision tree algorithms that can be used for classification or regression predictive modeling problems.
Basic idea
• The representation for the CART model is a binary tree.
Given a dataset with two inputs (x), height in centimeters and weight in kilograms, and an output of sex as male or female, the following is a crude example of a binary decision tree (completely fictitious, for demonstration purposes only).

[Image: a binary decision tree splitting first on Height, then on Weight]

The developed tree can be stored to file as a graph or as a set of rules. For example, below is the above decision tree as a set of rules:

1. If Height > 180 cm Then Male
2. If Height <= 180 cm AND Weight > 80 kg Then Male
3. If Height <= 180 cm AND Weight <= 80 kg Then Female
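A minimal sketch of fitting such a tree and printing its rules, assuming scikit-learn; the height/weight observations are fabricated to mirror the example above:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# [height_cm, weight_kg] -> 0 = Female, 1 = Male (fictitious data)
X = [[185, 70], [170, 85], [190, 82], [160, 55], [168, 60], [175, 62], [172, 72]]
y = [1, 1, 1, 0, 0, 0, 0]

# A depth-2 tree is enough to reproduce rules similar to those above.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["Height", "Weight"]))  # rules as text
```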
Part 2: Regression algorithms
Linear regression
For regression problems only
Linear regression
• Regression models a target prediction value based on independent variables.
• Linear regression predicts a dependent variable value (y) based on a given independent variable (x). So, this regression technique finds a linear relationship between x (input) and y (output), hence the name linear regression.

OLS: ordinary least squares

$$y = \beta_0 + \beta_1 x + \varepsilon$$

OLS chooses the coefficients $\beta_0$ and $\beta_1$ that minimize the sum of squared errors, $\sum_i (y_i - \hat{y}_i)^2$.
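A minimal OLS sketch, assuming scikit-learn; the data are synthetic points scattered around an assumed true line y = 2x + 1:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(50, 1))                       # one input feature
y = 2.0 * x[:, 0] + 1.0 + rng.normal(scale=0.5, size=50)   # noisy line

model = LinearRegression().fit(x, y)     # minimizes the sum of squared errors
print(model.coef_[0], model.intercept_)  # recovered slope ~2, intercept ~1
```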
Regularized Regression
Regularized Regression
• Linear regression is a simple and fundamental approach. Moreover, when the assumptions required by ordinary least squares (OLS) regression are met, the coefficients produced by OLS are unbiased and, of all unbiased linear techniques, have the lowest variance.
• However, in today's world, data sets being analyzed typically have a large number of features. As the number of features grows, our OLS assumptions typically break down and our models often overfit to the training sample, causing our out-of-sample error to increase. This problem is called overfitting.
• Regularization methods provide a means to control our regression coefficients, which can reduce the variance and decrease our out-of-sample error.
• The coefficient estimates are constrained (shrunk) toward zero. The magnitude (size) of the coefficients, as well as the magnitude of the error term, are penalized (using alpha and lambda as regularization parameters). Complex models are discouraged, primarily to avoid overfitting.
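A minimal sketch contrasting OLS with ridge and lasso penalties, assuming scikit-learn (note: scikit-learn calls the penalty strength "alpha", while the slide's alpha/lambda naming follows glmnet-style conventions):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge

# Many features relative to samples: the setting where OLS tends to overfit.
X, y = make_regression(n_samples=60, n_features=100, noise=10.0, random_state=0)

for name, model in [("OLS", LinearRegression()),
                    ("Ridge", Ridge(alpha=1.0)),
                    ("Lasso", Lasso(alpha=1.0))]:
    model.fit(X, y)
    kept = sum(abs(c) > 1e-6 for c in model.coef_)  # coefficients not shrunk to ~0
    print(f"{name}: {kept} of {len(model.coef_)} coefficients kept")
```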
KNN
Both regression and classification problems
Discussed in previous slides
SVM
Both regression and classification problems
Discussed in previous slides
CART
Both regression and classification problems
Discussed in previous slides
