
Artificial Intelligence

Chapter: Classification
Marouane Ben Haj Ayech
Outline
• Presentation
• KNN
• Learning process
• Prediction process
• Evaluation

Presentation
• Prediction
• Classification:
  • Description: assigning data points to predefined categories or classes based on their features; typically used in supervised learning.
  • Output nature: discrete categories or labels.
  • Examples: email spam classification (spam or not spam); image classification (cat, dog, car, etc.).

• Learning
• Learning type: supervised.
• Dataset type: labeled.
• Prediction tasks: classification.
• Learning models: K-Nearest Neighbors (KNN), Naïve Bayes, Decision Tree, Logistic Regression, Neural Network.

Presentation
Classification problem
Input: x = house = (surface, nb rooms)
Output: y = class label ∈ {0 = 'cheap', 1 = 'expensive'}

[Figure: two scatter plots of houses (surface vs. nb rooms), with points labeled 'cheap' or 'expensive'. The learning process turns the labeled training dataset into a model; the prediction process then uses that model to predict the class of a new house.]
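To make the setup concrete, here is a minimal sketch of how such a labeled house dataset could be represented in Python; the feature values and labels are invented toy data, not taken from the slides:

```python
import numpy as np

# Toy labeled training dataset (invented values, for illustration only).
# Each row is a house described by (surface, nb rooms).
X_train = np.array([
    [50, 2], [60, 3], [55, 2],     # cheap houses
    [120, 5], [150, 6], [130, 5],  # expensive houses
])

# Class labels: 0 = 'cheap', 1 = 'expensive'.
y_train = np.array([0, 0, 0, 1, 1, 1])
```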
K-Nearest Neighbors (KNN)
• Technique: KNN, a non-parametric technique.
• Learning process: KNN does not learn a model; the "model" is the training dataset itself.
• Prediction process:
  • Given a new data point x, KNN computes the distances between all training data points and x.
  • The points are sorted by their distances (ascending sort).
  • x takes the dominant class label of the set of K nearest neighbors.
• Hyperparameters:
  • K: the number of nearest neighbors.
  • The distance metric (Euclidean, …).
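A minimal from-scratch sketch of this prediction process, assuming Euclidean distance; the function name and toy data are illustrative, not from the slides:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Predict the class of x_new by majority vote among its k nearest neighbors."""
    # 1. Compute the Euclidean distances between all training points and x_new.
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # 2. Sort the points by distance (ascending) and keep the k nearest.
    nearest = np.argsort(distances)[:k]
    # 3. x_new takes the dominant class label of the k nearest neighbors.
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy data (invented): (surface, nb rooms), labels 0 = 'cheap', 1 = 'expensive'.
X_train = np.array([[50, 2], [60, 3], [55, 2], [120, 5], [150, 6], [130, 5]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([58, 2]), k=3))  # -> 0 ('cheap')
```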
KNN
• Learning process

[Figure: the learning process stores the labeled training dataset (surface vs. nb rooms, classes 'cheap' and 'expensive'); the stored dataset itself is the model of the classifier.]
KNN
• Prediction process (K = 3)

[Figure: with K = 3, the model is used to find the 3 nearest neighbors of the new house in the (surface, nb rooms) plane; the major class among them is 'cheap', so the new house is classified as 'cheap'.]
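The same prediction can be reproduced with scikit-learn's KNeighborsClassifier; a brief sketch with the same invented toy data:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy data (invented): (surface, nb rooms), labels 0 = 'cheap', 1 = 'expensive'.
X_train = np.array([[50, 2], [60, 3], [55, 2], [120, 5], [150, 6], [130, 5]])
y_train = np.array([0, 0, 0, 1, 1, 1])

clf = KNeighborsClassifier(n_neighbors=3)  # K = 3, Euclidean distance by default
clf.fit(X_train, y_train)                  # "learning" = storing the training set

print(clf.predict([[58, 2]]))  # -> [0], i.e. 'cheap'
```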
Evaluation
• The evaluation is performed using a test dataset with known class labels.
• The most commonly used metrics to evaluate the performance of the model (classifier) are:
  • Confusion matrix
  • Accuracy score
  • Recall
  • Precision

• In binary classification, we first have to define:
  • a "negative" class
  • a "positive" class

• In our example of houses:
  • "cheap" is the "negative" class
  • "expensive" is the "positive" class
Evaluation
• Confusion matrix:
  • It is a matrix composed of:
    • True Negatives (TN): the number of "cheap" houses correctly classified as "cheap"
    • False Positives (FP): the number of "cheap" houses incorrectly classified as "expensive"
    • False Negatives (FN): the number of "expensive" houses incorrectly classified as "cheap"
    • True Positives (TP): the number of "expensive" houses correctly classified as "expensive"

|                      | Predicted cheap (N)          | Predicted expensive (P)      | Total                     |
|----------------------|------------------------------|------------------------------|---------------------------|
| Actual cheap (N)     | TN                           | FP                           | Actual negatives: TN + FP |
| Actual expensive (P) | FN                           | TP                           | Actual positives: FN + TP |
| Total                | Predicted negatives: TN + FN | Predicted positives: FP + TP |                           |
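In practice these four counts can be obtained directly from scikit-learn's confusion_matrix; a short sketch with invented labels, assuming the encoding 0 = 'cheap' (negative) and 1 = 'expensive' (positive):

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 1, 0]  # actual labels (invented example)
y_pred = [0, 1, 1, 0, 1, 0]  # classifier predictions (invented example)

# For binary labels 0/1, the matrix is laid out as [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # -> 2 1 1 2
```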
Evaluation
• Accuracy score:
  • It measures how often the classifier correctly predicts both "cheap" and "expensive" houses.
  • Formula: Accuracy = (TP + TN) / (TP + TN + FP + FN)

Example:

|                      | Predicted cheap (N)               | Predicted expensive (P)          | Total                          |
|----------------------|-----------------------------------|----------------------------------|--------------------------------|
| Actual cheap (N)     | TN = 8                            | FP = 2                           | Actual negatives: TN + FP = 10 |
| Actual expensive (P) | FN = 4                            | TP = 6                           | Actual positives: FN + TP = 10 |
| Total                | Predicted negatives: TN + FN = 12 | Predicted positives: FP + TP = 8 |                                |

Accuracy = (TP + TN) / (TP + TN + FP + FN) = (6 + 8) / (6 + 8 + 2 + 4) = 14 / 20 = 70%
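A small sketch checking this computation, and also computing recall and precision; their formulas (Recall = TP / (TP + FN), Precision = TP / (TP + FP)) are not spelled out in the slides but are the standard definitions:

```python
# Counts from the example confusion matrix above.
tn, fp, fn, tp = 8, 2, 4, 6

accuracy = (tp + tn) / (tp + tn + fp + fn)  # (6 + 8) / 20 = 0.70
recall = tp / (tp + fn)                     # 6 / 10  = 0.60
precision = tp / (tp + fp)                  # 6 / 8   = 0.75

print(f"accuracy={accuracy:.0%}  recall={recall:.0%}  precision={precision:.0%}")
# -> accuracy=70%  recall=60%  precision=75%
```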
