
Introduction to Supervised Learning

Learning Objectives
• Explain supervised learning and how it can be applied to regression and classification problems
• Apply the K-Nearest Neighbors (KNN) algorithm for classification
• Apply the Intel® Extension for Scikit-learn* to leverage the underlying compute capabilities of the hardware
What is Machine Learning?
Machine learning allows computers to learn and infer from data.
Machine Learning in Our Daily Lives
• Spam Filtering
• Web Search
• Postal Mail Routing
• Fraud Detection
• Movie Recommendations
• Vehicle Driver Assistance
• Web Advertisements
• Social Networks
• Speech Recognition
Types of Machine Learning
• Supervised: data points have known outcome
• Unsupervised: data points have unknown outcome
Types of Supervised Learning
• Regression: outcome is continuous (numerical)
• Classification: outcome is a category
Supervised Learning Overview
• data with answers + model → fit → fitted model
• data without answers + fitted model → predict → predicted answers
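A minimal sketch of this fit/predict workflow in scikit-learn syntax, using this chapter's KNN classifier as the model (the toy arrays are hypothetical placeholders):

from sklearn.neighbors import KNeighborsClassifier

X_train = [[6.7, 3.0], [4.6, 3.4], [6.9, 3.1]]   # data...
y_train = ["virginica", "setosa", "versicolor"]  # ...with answers
X_new = [[4.5, 3.3]]                             # data without answers

model = KNeighborsClassifier(n_neighbors=1)
model = model.fit(X_train, y_train)  # fit: learn from the data with answers
print(model.predict(X_new))          # predict: ['setosa']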
Regression: Numeric Answers
• movie data with revenue + model → fit → fitted model
• movie data (unknown revenue) + fitted model → predict → predicted revenue
Classification: Categorical Answers
• labeled data + model → fit → fitted model
• unlabeled data + fitted model → predict → labels
Classification: Categorical Answers
• emails labeled as spam/not spam + model → fit → fitted model
• unlabeled emails + fitted model → predict → spam or not spam
Types of classifiers
• We can divide the large variety of classification approaches into roughly three main types:

1. Instance-based classifiers
- Use the observations directly (no models)
- e.g., K nearest neighbors

2. Generative classifiers
- Build a generative statistical model
- e.g., Naïve Bayes, Bayesian networks

3. Discriminative classifiers
- Directly estimate a decision rule/boundary
- e.g., decision tree
Classification
• Assume we want to teach a computer to distinguish between cats and dogs …

Several steps:
1. Feature transformation
• How do we encode the picture? A collection of pixels?
• Do we use the entire image or a subset?
2. Model / classifier specification
• What type of classifier should we use?
3. Model / classifier estimation (with regularization)
• How do we learn the parameters of our classifier?
• Do we have enough examples to learn a good model?
4. Feature selection
• Do we really need all the features?
• Can we use a smaller number and still achieve the same (or better) results?
Supervised learning
• Classification is one of the key components of 'supervised learning'
• Unlike other learning paradigms, in supervised learning the teacher (us) provides the algorithm with the solutions to some of the instances
• The goal is to generalize, so that the model / method can be used to determine the labels of unobserved samples

[Diagram: a teacher supplies labeled pairs (X, Y); the classifier learns parameters w1, w2, … mapping X → Y]
Machine Learning Vocabulary
• Target: predicted category or value of the data (the column to predict)
• Features: properties of the data used for prediction (the non-target columns)
• Example: a single data point within the data (one row)
• Label: the target value for a single data point

sepal length  sepal width  petal length  petal width  species
6.7           3.0          5.2           2.3          virginica
6.4           2.8          5.6           2.1          virginica
4.6           3.4          1.4           0.3          setosa
6.9           3.1          4.9           1.5          versicolor
4.4           2.9          1.4           0.2          setosa
4.8           3.0          1.4           0.1          setosa
5.9           3.0          5.1           1.8          virginica
5.4           3.9          1.3           0.4          setosa
4.9           3.0          1.4           0.2          setosa
5.4           3.4          1.7           0.2          setosa

In this iris table, species is the target; the four measurement columns are the features; each row is an example; and a row's species value is its label.
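Tying this vocabulary back to code: a short sketch using scikit-learn's bundled iris dataset (standard load_iris API only):

from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
X = iris.data        # features: the four measurement columns
y = iris.target      # target column, species encoded as 0/1/2
print(X.iloc[0].to_dict())           # one example (a single row of features)
print(iris.target_names[y.iloc[0]])  # that example's label, 'setosa'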


K-Nearest Neighbors (KNN)
 Simple, but a very powerful classification algorithm
 Classifies based on a similarity measure
 Non-parametric
 Lazy learning
🞑 Does not "learn" until a test example is given
🞑 Whenever we have new data to classify, we find its K nearest neighbors in the training data
KNN: Classification Approach
 A query point is classified by a "majority vote" of its neighbors' classes
🞑 It is assigned to the most common class among its K nearest neighbors (by measuring the distance between data points)
What is Classification?
• A flower shop wants to guess a customer's next purchase from its similarity to their most recent purchase.
• Which flower is a customer most likely to purchase, based on similarity to a previous purchase?

[Images: candidate flowers, with the unknown next purchase marked "?"]
What is Needed for Classification?
• Model data with:
• Features that can be quantified
• Labels that are known
• A method to measure similarity
KNN: Pseudocode
KNN: Example
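A minimal from-scratch sketch of the procedure described above (pure NumPy; the toy training data is hypothetical):

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by a majority vote of its k nearest training points."""
    X = np.asarray(X_train, dtype=float)
    # 1. Compute the distance from x_new to every training example
    dists = np.linalg.norm(X - np.asarray(x_new, dtype=float), axis=1)
    # 2. Take the indices of the k smallest distances
    nearest = np.argsort(dists)[:k]
    # 3. Majority vote among the neighbors' labels
    return Counter(np.asarray(y_train)[nearest]).most_common(1)[0][0]

# Hypothetical toy data: [number of malignant nodes, age]
X_train = [[1, 25], [2, 30], [8, 60], [9, 65], [7, 55]]
y_train = ["benign", "benign", "malignant", "malignant", "malignant"]
print(knn_predict(X_train, y_train, [3, 35], k=3))  # -> benign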
K nearest neighbors (KNN)
• Need to determine an appropriate value for k
• What happens if we choose k=1?
• What if k=3?
Euclidean Distance

[Scatter plot: Age vs. Number of Malignant Nodes]
Euclidean Distance (L2 Distance)

[Plot: two points on the Age vs. Number of Malignant Nodes axes; the distance d decomposes into Δ Nodes and Δ Age components]

d = √(ΔNodes² + ΔAge²)
Manhattan Distance (L1 or City Block Distance)

[Plot: the same two points, with the path traced along Δ Nodes and then Δ Age]

d = |ΔNodes| + |ΔAge|
KNN: Euclidean distance matrix
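A sketch of how such a pairwise distance matrix can be computed with SciPy (the five sample points are hypothetical):

import numpy as np
from scipy.spatial.distance import cdist

# Hypothetical points: [number of malignant nodes, age]
pts = np.array([[1, 25], [2, 30], [8, 60], [9, 65], [7, 55]])

euclid = cdist(pts, pts, metric="euclidean")  # L2 distances, 5x5 matrix
manhat = cdist(pts, pts, metric="cityblock")  # L1 distances, 5x5 matrix
print(np.round(euclid, 1))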
Decision Boundaries
 Voronoi diagram
🞑 Describes the areas that are nearest to any given point, given a set of data
🞑 Each line segment is equidistant between two points of opposite classes

https://www.youtube.com/watch?v=j2c3kumwoAk
Decision Boundaries
• KNN creates local models (or neighborhoods) across the feature space, with each neighborhood defined by a subset of the training data.
• Implicitly, a 'global' decision space is created, with boundaries between the training data.
Decision Boundaries
 With a large number of examples and possible noise in the labels, the decision boundary can become nasty!
🞑 The "overfitting" problem
Effect of K
 Larger k produces a smoother decision boundary
 When k == N, KNN always predicts the majority class
Discussion
 Which model is better: K=1 or K=15?
 Why?
How to Choose k?
 Empirically optimal k?
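One standard way to estimate an empirically optimal k is cross-validation; a sketch with scikit-learn's GridSearchCV, using iris as a stand-in dataset:

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Try odd values of k (odd k avoids ties in two-class votes) and keep
# the one with the best 5-fold cross-validated accuracy.
search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": list(range(1, 26, 2))},
    cv=5,
)
search.fit(X, y)
print(search.best_params_)  # e.g. {'n_neighbors': 11}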
Feature Scaling
 Standardize the range of the independent variables (the features of the data)
 Also known as normalization or standardization
Standardization
 Standardization, or Z-score normalization
🞑 Rescale the data so that the mean is zero and the standard deviation from the mean (the standard score) is one

x_norm = (x − μ) / σ

where μ is the mean and σ is the standard deviation from the mean.
Min-Max Scaling
 Scale the data to a fixed range, typically between 0 and 1:

x_norm = (x − x_min) / (x_max − x_min)
Comparison of Feature Scaling Methods
• Standard Scaler: mean-center the data and scale to unit variance
• Minimum-Maximum Scaler: scale the data to a fixed range (usually 0–1)
• Maximum Absolute Value Scaler: divide every observation by the variable's maximum absolute value, so the resulting values vary approximately within the range −1 to 1
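A small sketch contrasting the three scalers on the same hypothetical feature column:

import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler, MaxAbsScaler

X = np.array([[1.0], [2.0], [3.0], [10.0]])  # one hypothetical feature

for scaler in (StandardScaler(), MinMaxScaler(), MaxAbsScaler()):
    X_scaled = scaler.fit_transform(X)  # fit the parameters, then transform
    print(type(scaler).__name__, X_scaled.ravel().round(2))
# StandardScaler: mean 0, unit variance
# MinMaxScaler:   values in [0, 1]
# MaxAbsScaler:   values in [-1, 1] (here 0.1, 0.2, 0.3, 1.0)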
Feature Scaling: The Syntax
Import the class containing the scaling method:
from sklearn.preprocessing import StandardScaler  # scikit-learn

Create an instance of the class:
StdSc = StandardScaler()

Fit the scaling parameters and then transform the data:
StdSc = StdSc.fit(X_data)
X_scaled = StdSc.transform(X_data)

Other scaling methods exist: MinMaxScaler, MaxAbsScaler.
Multiclass KNN Decision Boundary (K=5)

[Plot: Age vs. Number of Malignant Nodes, three classes (full remission, partial remission, did not survive), showing the K=5 decision regions]
Regression with KNN

[Plots: KNN regression fits for K=20, K=3, and K=1]
Pros and Cons
 Pros
🞑 Learning and implementation are extremely simple and intuitive
🞑 Flexible decision boundaries
 Cons
🞑 Irrelevant or correlated features have a high impact and must be eliminated
🞑 Typically difficult to handle high dimensionality
🞑 Computational costs: memory and classification-time computation
K Nearest Neighbors: The Syntax
Import the class containing the classification method:
from sklearn.neighbors import KNeighborsClassifier

To use the Intel® Extension for Scikit-learn* variant of this algorithm:
• Install Intel® oneAPI AI Analytics Toolkit (AI Kit)
• Add the following two lines before importing the scikit-learn estimator, so the patched implementation is picked up:
from sklearnex import patch_sklearn
patch_sklearn()
K Nearest Neighbors: The Syntax
Import the class containing the classification method:
from sklearn.neighbors import KNeighborsClassifier

Create an instance of the class:
KNN = KNeighborsClassifier(n_neighbors=3)

Fit the instance on the data and then predict the expected value:
KNN = KNN.fit(X_data, y_data)
y_predict = KNN.predict(X_data)

The fit and predict/transform syntax will show up throughout the course.

Regression can be done with KNeighborsRegressor.
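A brief sketch of the regression variant (the toy arrays are hypothetical):

import numpy as np
from sklearn.neighbors import KNeighborsRegressor

X_data = np.array([[1], [2], [3], [4], [5], [6]])        # e.g. number of malignant nodes
y_data = np.array([10.0, 12.0, 15.0, 21.0, 30.0, 41.0])  # a numeric outcome

KNN = KNeighborsRegressor(n_neighbors=3)
KNN = KNN.fit(X_data, y_data)
print(KNN.predict([[3.2]]))  # -> [16.], the mean of y at x = 2, 3, 4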


Links
• KNN visualization in just 13 lines of code:
https://towardsdatascience.com/knn-visualization-in-just-13-lines-of-code-32820d72c6b6
• Datasets: https://www.kaggle.com/deepthiar/toydatasets
• Tutorial: https://www.datacamp.com/community/tutorials/k-nearest-neighbor-classification-scikit-learn
• Handwritten solved examples:
https://www.youtube.com/watch?v=LqBzNsfXoQU
https://people.revoledu.com/kardi/tutorial/KNN/KNN_Numerical-example.html