Data Analytics
CLASSIFICATION
Classification: Motivation and Applications
Train-Validation Split and Cross-Validation
Evaluation Metrics and Class Imbalance
Overfitting
kNN Classifier
Naive Bayes Classifier
Decision Tree
Entropy, Conditional Entropy, Information Gain
US Classification 1 / 56
Classification: Definition
Classification is a supervised task
Input: a collection of objects (instances o1, ..., on), each described by a feature vector (x1, x2, ..., xm) with a class label y
Output: a model for the class attribute as a function of the other attributes
[Figure: table of train instances with known class labels and test instances whose labels (?) are to be predicted]
Training Set: instances whose class labels are used for learning
Test Set: instances with the same attributes as the training set but missing/hidden class labels
Goal: the model should accurately assign class labels to unlabeled instances
US Classification 2 / 56
Classification
[Figure: train instances (feature vectors with class labels) are used to learn a model for the class attribute; the model predicts the unknown labels (?) of test instances]
source: javapoint.com
US Classification 3 / 56
Classification
US Classification 4 / 56
Classification: Applications
Targeted Advertisement
Enhance marketing by identifying customers who are likely to buy a product
Use customer purchase history, demographics, etc. for similar (old) products
buy/no-buy as class labels
US Classification 5 / 56
Classification: Applications
Credit Card Fraudulent Transaction Detection
Use transaction history and cardholder characteristics
fair/fraud as class labels
source: Benchaji et al. (2019)
US Classification 6 / 56
Classification: Applications
Predict Customer Attrition/Churn
Use customers' transaction and feedback history
churn/no-churn as class labels
US Classification 7 / 56
Classification: Applications
Text Classification
Text is converted into feature vectors before classification
Document Classification
source: towardsdatascience.com
Sentiment Analysis, Emotion Mining
US Classification 8 / 56
Classification: Applications
Sky Survey Cataloging
Classify astronomical objects as stars or galaxies
Use telescope survey images (from the Palomar Observatory): 3000 images
with 23040 × 23040 pixels per image
Extract feature values: 40 features per object
star/galaxy as class labels
US Classification 9 / 56
Classification: Applications
US Classification 10 / 56
Classification Evaluation Metrics
Train-Validation Split and Cross-Validation
US Classification 11 / 56
Classification
The model (classifier) is learned by finding patterns in the training set
Performance on the training set does not (necessarily) indicate the generalization power of the model
A validation set (a subset of the training set) is used to learn parameters, tune the architecture of the classifier, and estimate error
For the model to generalize, the validation set must be representative of the input instances
Since the test set is never used during training, it provides an unbiased estimate of the generalization error
US Classification 12 / 56
Classification: Training-Validation split
Generally obtained by randomly splitting the dataset
e.g. a 70-30 or 80-20 random train-validation split
Use the average performance over multiple random splits
source: medium.com
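A minimal sketch of such a random split, using scikit-learn's train_test_split (the toy data and the 70-30 ratio are illustrative assumptions):

```python
# Hypothetical example: random 70-30 train-validation split with scikit-learn
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))        # 100 instances, 4 features (toy data)
y = rng.integers(0, 2, size=100)     # binary class labels

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
print(X_train.shape, X_val.shape)    # (70, 4) (30, 4)
```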
US Classification 13 / 56
Classification: Cross-Validation
The dataset is randomly split into k folds
In iteration i (of k), the i-th fold is used for validation and the remaining folds for training
Every instance is used once for validation and k-1 times for training
e.g. 5-fold, 10-fold cross-validation
source: Scikit-learn
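A minimal sketch of k-fold cross-validation with scikit-learn; the toy data and the choice of a kNN classifier are assumptions for illustration:

```python
# Hypothetical example: 5-fold cross-validation, reporting the average validation accuracy
import numpy as np
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = rng.integers(0, 2, size=100)

scores = []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=42).split(X):
    clf = KNeighborsClassifier(n_neighbors=3).fit(X[train_idx], y[train_idx])
    scores.append(clf.score(X[val_idx], y[val_idx]))   # accuracy on the held-out fold

print(np.mean(scores))                                  # average over the 5 folds
```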
US Classification 14 / 56
Classification: Evaluation Metrics
Binary Classifiers (for classifying into two classes) are evaluated by
tabulating the classification results in a Confusion Matrix
                        Actual Positive        Actual Negative
  Predicted Positive    True Positive (TP)     False Positive (FP)
  Predicted Negative    False Negative (FN)    True Negative (TN)
Some summary statistics of the confusion matrix are
$$ \text{ACCURACY} = \frac{TP + TN}{TP + TN + FP + FN} \qquad \text{ERROR} = \frac{FP + FN}{TP + TN + FP + FN} $$
ACCURACY and ERROR are usually reported as percentages
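A minimal sketch computing these counts and summary statistics from scratch (the label vectors are toy values):

```python
# Hypothetical example: confusion-matrix counts and the summary statistics above
def confusion_counts(y_true, y_pred, positive=1):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, tn, fp, fn

y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / (tp + tn + fp + fn)
error = (fp + fn) / (tp + tn + fp + fn)
print(tp, tn, fp, fn, accuracy, error)   # 3 3 1 1 0.75 0.25
```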
US Classification 15 / 56
Classification: Evaluation Metrics
                        Actual Positive        Actual Negative
  Predicted Positive    True Positive (TP)     False Positive (FP)
  Predicted Negative    False Negative (FN)    True Negative (TN)
With a big imbalance in the classes, ACCURACY and ERROR are misleading
In a tumor dataset where 99% of the samples are negative, (blindly) predicting all samples as negative gives 99% accuracy, but cancer is never detected
Have to use a cost matrix / loss function (essentially a weighted accuracy)
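A minimal sketch of a cost-sensitive evaluation under an assumed cost matrix (the costs and data below are purely illustrative):

```python
# Hypothetical cost matrix: cost[(actual, predicted)]; a missed tumor (FN) is assumed
# to be far more costly than a false alarm (FP)
cost = {(1, 1): 0, (1, 0): 100, (0, 1): 1, (0, 0): 0}

def average_cost(y_true, y_pred):
    return sum(cost[(t, p)] for t, p in zip(y_true, y_pred)) / len(y_true)

# 1% positives; a classifier that blindly predicts "negative" is 99% accurate
y_true = [1] * 1 + [0] * 99
y_pred_blind = [0] * 100
print(average_cost(y_true, y_pred_blind))   # 1.0 -> the single missed tumor dominates the cost
```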
US Classification 16 / 56
Classification: Evaluation Metrics
$$ \text{PRECISION} = \frac{TP}{TP + FP} \quad \text{(measure of exactness)} $$
$$ \text{RECALL} = \frac{TP}{TP + FN} \quad \text{also called sensitivity (measure of completeness)} $$
F-measure: the harmonic mean of the two; it is high only when both are high
$$ F_1 = \frac{2}{\frac{1}{\text{PRECISION}} + \frac{1}{\text{RECALL}}} $$
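A minimal sketch computing these three metrics from the confusion-matrix counts (toy counts):

```python
# Hypothetical example: precision, recall and F1 from confusion-matrix counts
def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 / (1 / precision + 1 / recall)   # harmonic mean of precision and recall
    return precision, recall, f1

print(precision_recall_f1(tp=3, fp=1, fn=1))   # (0.75, 0.75, 0.75)
```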
US Classification 17 / 56
Classification: OVERFITTING
Overfitting: the phenomenon when a model performs very well on training data but does not generalize to test data
The model learns the data and not the underlying function, essentially learning by rote
The model has too much freedom (many parameters with wide ranges)
Validation, cross-validation, early stopping, regularization, model comparison, and Bayesian priors help avoid overfitting
US Classification 18 / 56
Classification: OVERFITTING
US Classification 19 / 56
Classifier/Model
A classifier utilizes training data to understand how input variables
are related to the class variable
A model is built, which can be used to predict labels for unseen data
Kinds of Classifiers
Lazy Classifiers
Eager Classifiers
US Classification 20 / 56
Kinds of Classifiers
Lazy Classifiers
Store the training data and wait for testing data
For an unseen test data record (data point), assign class label based on the
most related points in the training data
Less training time, more prediction time
Examples: k-Nearest Neighbor (kNN) Classifier
Eager Classifiers
Construct a classification model based on the training data
For a test data point, use the model to assign a class label
More training time but less prediction time
Examples: Naive Bayes, Decision Tree
US Classification 21 / 56
Nearest Neighbors Classification and Regression
US Classification 22 / 56
k-Nearest Neighbor (kNN) Classifier
k-NN is a simple method used for classification (and also for regression)
The class label of a test instance x is predicted to be the most common
class among the k nearest neighbors of x in the train set
Assign the test instance (?) either class A or class B
k = 3 nearest neighbors (ℓ2 distance): 1 neighbor of class A and 2 of class B ⇒ assigned label = B
k = 7 nearest neighbors (ℓ2 distance): 4 neighbors of class A and 3 of class B ⇒ assigned label = A
US Classification 23 / 56
k-Nearest Neighbor (kNN)
The class label of a test instance x is predicted to be the most common
class among the k nearest neighbors of x in the train set
Assumes that the proximity measure captures class membership
Definition of proximity measure (defining ‘nearest’) is critical
The parameter k is important, and the result is sensitive to the local structure of the data
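A minimal from-scratch sketch of a kNN classifier with ℓ2 distance and majority vote (the toy data are an assumption for illustration):

```python
# A minimal k-NN classifier: l2 distance, majority vote among the k nearest train instances
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    dists = np.linalg.norm(X_train - x, axis=1)           # l2 distance to every train instance
    nearest = np.argsort(dists)[:k]                        # indices of the k nearest neighbors
    return Counter(y_train[nearest]).most_common(1)[0][0]  # most common class among them

X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.8], [4.9, 5.1]])
y_train = np.array(["A", "A", "A", "B", "B", "B"])
print(knn_predict(X_train, y_train, np.array([1.1, 0.9]), k=3))   # "A"
```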
US Classification 24 / 56
k-Nearest Neighbor (kNN) Regression
In k-NN regression, for a test instance x the value of the target variable y is the 'average' of the y values of the k nearest neighbors of x in the train set
The 'average' can be the weighted mean (weighted by similarity); in this case k is generally taken large enough that all points are included in the neighborhood
$$ y(x) = \frac{\sum_{x' \in D} \text{sim}(x, x')\, y(x')}{\sum_{x' \in D} \text{sim}(x, x')} $$
where y(x') is the value of the target variable y for instance x'
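A minimal sketch of similarity-weighted kNN regression; the Gaussian similarity function and toy data are assumptions:

```python
# Similarity-weighted regression over all training points (the neighborhood is the whole train set)
import numpy as np

def weighted_knn_regress(X_train, y_train, x, bandwidth=1.0):
    sim = np.exp(-np.sum((X_train - x) ** 2, axis=1) / (2 * bandwidth ** 2))  # similarity weights
    return np.sum(sim * y_train) / np.sum(sim)                                # weighted mean of y

X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([0.0, 1.0, 4.0, 9.0])
print(weighted_knn_regress(X_train, y_train, np.array([1.5])))   # a value between 1 and 4
```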
US Classification 25 / 56
Naive Bayes Classifier
US Classification 26 / 56
Naive Bayes Classifier
Classify x = (x1, ..., xn) into one of K classes C1, ..., CK
Naive Bayes is a conditional probability model
For instance x it computes probabilities Pr[class = Cj |x] for each class Cj
Assumes that
1 All attributes are equally important
2 Attributes are statistically independent given the class label
knowing value of one attribute says nothing about value of another
The independence assumption is almost never correct (thus the word Naive) ... but it works well in practice
The model consists of the probabilities calculated from the training data for each attribute value with respect to the class label
US Classification 27 / 56
Naive Bayes Classifier
Classify x = (x1, ..., xn) into one of K classes C1, ..., CK
We want to compute the Posterior: the probability of class Cj given the object x
$$ P(C_j \mid x) = \frac{P(x \mid C_j) \times P(C_j)}{P(x)} $$
Likelihood P(x | Cj): the probability of the predictor(s) given class Cj, computed from the frequencies of the predictor(s) within class Cj in the train set
Prior P(Cj): the probability of class Cj without considering x, estimated from the frequency of label Cj in the train set
Evidence P(x): the probability of observing x; it does not depend on the classes, and x is given, so it is effectively constant
Apply the independence assumption
$$ P(x \mid C_j) = P(x_1 \mid C_j) \times P(x_2 \mid C_j) \times \cdots \times P(x_n \mid C_j) $$
Substitute in the numerator and ignore the denominator
$$ P(C_j \mid x) \propto P(x_1 \mid C_j) \times P(x_2 \mid C_j) \times \cdots \times P(x_n \mid C_j) \times P(C_j) $$
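A minimal sketch of this decision rule; the spam/ham attribute probabilities below are made-up illustrative numbers, not from the slides:

```python
# Hypothetical Naive Bayes tables (illustrative numbers only): priors and per-attribute conditionals
priors = {"spam": 0.4, "ham": 0.6}
cond = {
    "spam": {"word=offer": 0.30, "word=meeting": 0.05},
    "ham":  {"word=offer": 0.02, "word=meeting": 0.20},
}

def predict(features):
    scores = {}
    for c in priors:
        p = priors[c]
        for f in features:
            p *= cond[c][f]       # independence assumption: product of per-attribute conditionals
        scores[c] = p             # proportional to P(C_j | x); the denominator P(x) is ignored
    return max(scores, key=scores.get), scores

print(predict(["word=offer", "word=meeting"]))
# ('spam', {'spam': ~0.006, 'ham': ~0.0024})
```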
US Classification 28 / 56
Naive Bayes: Running Example
Train on records of weather conditions and whether or not the game was played.
Given a weather condition (test instance), predict whether the game will be played
N. Milkic & U. Krcadinac @ Uni. of Belgrade
US Classification 29 / 56
Naive Bayes: Running Example
$$ P(\text{play}=\text{yes} \mid x) \propto P(\text{outl}=\ast \mid \text{yes}) \times P(\text{temp}=\ast \mid \text{yes}) \times P(\text{humid}=\ast \mid \text{yes}) \times P(\text{wind}=\ast \mid \text{yes}) \times P(\text{yes}) $$
$$ P(\text{play}=\text{no} \mid x) \propto P(\text{outl}=\ast \mid \text{no}) \times P(\text{temp}=\ast \mid \text{no}) \times P(\text{humid}=\ast \mid \text{no}) \times P(\text{wind}=\ast \mid \text{no}) \times P(\text{no}) $$
US Classification 30 / 56
Naive Bayes: Running Example
US Classification 31 / 56
Naive Bayes: Running Example
US Classification 32 / 56
Naive Bayes: Running Example
Given weather condition x = (sunny, cool, high, true) will game be played?
$$ P(\text{play}=\text{yes} \mid x) \propto P(\text{sunny}\mid\text{yes}) \times P(\text{cool}\mid\text{yes}) \times P(\text{high}\mid\text{yes}) \times P(\text{true}\mid\text{yes}) \times P(\text{yes}) = 0.22 \times 0.33 \times 0.33 \times 0.33 \times 0.64 \approx 0.0053 $$
$$ P(\text{play}=\text{no} \mid x) \propto P(\text{sunny}\mid\text{no}) \times P(\text{cool}\mid\text{no}) \times P(\text{high}\mid\text{no}) \times P(\text{true}\mid\text{no}) \times P(\text{no}) = 0.60 \times 0.20 \times 0.80 \times 0.60 \times 0.36 \approx 0.0206 $$
Since 0.0206 > 0.0053, the predicted label is play = no
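A quick check of this arithmetic, using the exact frequency fractions of the standard 14-instance weather dataset (they round to the probabilities shown above):

```python
# Verifying the running-example arithmetic with exact fractions from the frequency tables
p_yes = (2/9) * (3/9) * (3/9) * (3/9) * (9/14)    # P(sunny|yes) P(cool|yes) P(high|yes) P(true|yes) P(yes)
p_no  = (3/5) * (1/5) * (4/5) * (3/5) * (5/14)    # P(sunny|no)  P(cool|no)  P(high|no)  P(true|no)  P(no)
print(round(p_yes, 4), round(p_no, 4))            # 0.0053 0.0206
print("play =", "yes" if p_yes > p_no else "no")  # play = no
```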
US Classification 33 / 56
Naive Bayes: Issues
Some issues for the Naive Bayes classifier (you are encouraged to read about them):
Zero frequency problem: probability = 0 for an attribute value in a class
For example: P[Outlook = sunny | yes] = 0
One zero would make whole product zero
Solution: Laplace smoothing (add-one smoothing)
Missing value of an attribute for a test instance
usually attribute is omitted from probability calculation
What if values of attributes are continuous?
Discretization solves the problem in many cases
Can also assume a probability distribution for each continuous
attribute and learn distribution parameters from training set
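A minimal sketch of Laplace (add-one) smoothing for one conditional probability; the counts assume the standard weather dataset's Outlook values within class no:

```python
# Laplace (add-one) smoothing for a conditional probability estimate P(value | class)
# count[v] = number of training instances of the class that have attribute value v
def smoothed_prob(count, value, n_class_instances, n_values):
    # add 1 to every value's count; add n_values to the denominator so the probabilities still sum to 1
    return (count.get(value, 0) + 1) / (n_class_instances + n_values)

count_outlook_given_no = {"sunny": 3, "overcast": 0, "rainy": 2}
print(smoothed_prob(count_outlook_given_no, "overcast", n_class_instances=5, n_values=3))  # 1/8 = 0.125
```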
US Classification 34 / 56
Decision Tree Classifier
US Classification 35 / 56
Decision Tree
Fundamentally, an if-then rule set for classifying objects
Builds a model in the form of a tree structure
US Classification 36 / 56
Decision Tree
Zemel, Urtasun, Fidler @ Uni. of Toronto
Decision tree for binary classification of an instance with nominal attributes:
Outlook?
  sunny -> Humidity?
    high -> Temp?
      high -> Windy? (true -> No, false -> Yes)
      mild -> No
      cool -> No
    normal -> Yes
  overcast -> Yes
  rainy -> Windy? (true -> No, false -> Yes)
[The slide also shows a decision tree for binary classification of an instance with numeric attributes]
Each internal node tests an attribute xi
Branches correspond to possible (subsets of) values of xi
Each leaf node assigns a class label y
US Classification 37 / 56
Classification using Decision Trees
To classify a test instance x traverse the tree from root to leaf
Take branches at internal nodes according to results of their tests
Predict the class label at the leaf node reached
Zemel, Urtasun, Fidler @ Uni. of Toronto
US Classification 38 / 56
Classification using Decision Trees
To classify a test instance x traverse the tree from root to leaf
Take branches at internal nodes according to results of their tests
Predict the class label at the leaf node reached
Given weather condition x = (sunny, cool, high, true)
will game be played?
Outlook?
  sunny -> Humidity?
    high -> Temp?
      high -> Windy? (true -> No, false -> Yes)
      mild -> No
      cool -> No
    normal -> Yes
  overcast -> Yes
  rainy -> Windy? (true -> No, false -> Yes)
Path for x: Outlook = sunny -> Humidity = high -> Temp = cool -> predict No
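A minimal sketch of the same tree written as nested if-then rules (the structure is read off the figure above):

```python
# The example decision tree as nested if/else rules
def play_decision(outlook, temp, humidity, windy):
    if outlook == "overcast":
        return "Yes"
    if outlook == "rainy":
        return "No" if windy else "Yes"
    # outlook == "sunny"
    if humidity == "normal":
        return "Yes"
    if temp in ("mild", "cool"):
        return "No"
    return "No" if windy else "Yes"    # temp == "high": fall through to the Windy test

print(play_decision("sunny", "cool", "high", windy=True))   # "No"
```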
US Classification 39 / 56
Building Decision Tree
Building the optimal decision tree is an NP-HARD problem
J. Leskovec @ Stanford
Recursively build the tree top-down, using greedy heuristics
Start with an empty decision tree
Split the current dataset by the best attribute until a stopping condition is met
US Classification 40 / 56
Building Decision Tree
Suppose we are at some node G in the tree built so far
J. Leskovec @ Stanford
Shall we continue building the tree?
If yes, G is an internal node: which attribute do we split on (test)?
If no, G is a leaf: what is the prediction rule?
US Classification 41 / 56
Building Decision Tree
Stop, making G a leaf, when the sub-dataset DG at G
J. Leskovec @ Stanford
is pure (all instances have the same class label), or
when its size is small, e.g. |DG| ≤ 5
.. .
US Classification 42 / 56
Building Decision Tree
If we stop at G, then prediction at G can be
J. Leskovec @ Stanford
the mode of the class labels in the sub-dataset DG
For a numeric target variable, the prediction could be the average of the target variable values in DG
When the target variable is numeric, the tree is called a Regression Tree
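A minimal sketch of the two leaf prediction rules (toy label and target values):

```python
# Leaf prediction rules: mode of the class labels (classification), mean of the targets (regression tree)
from collections import Counter
from statistics import mean

labels_at_G = ["yes", "no", "yes", "yes"]         # class labels in the sub-dataset D_G (toy values)
print(Counter(labels_at_G).most_common(1)[0][0])  # classification leaf -> "yes"

targets_at_G = [3.0, 5.0, 4.0]                    # numeric target values (toy values)
print(mean(targets_at_G))                         # regression-tree leaf -> 4.0
```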
US Classification 43 / 56
Attribute Selection
US Classification 44 / 56
Building Decision Tree
Splitting attributes are selected based on metrics
J. Leskovec @ Stanford
e.g.
Entropy
Information Gain
Gini Index
Common algorithms for building Decision Trees are ID3, C4.5, ...
US Classification 45 / 56
Entropy
In information theory, entropy quantifies the average level of information content or uncertainty in a random variable
Flip a fair coin and a biased coin:
Outcome of Coin 1: 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 ?   (16 zeros, 4 ones)
Outcome of Coin 2: 1 0 0 1 1 0 1 1 0 1 0 1 0 1 0 1 0 1 0 1 ?   (9 zeros, 11 ones)
In which case would we be more surprised if the next outcome is a 1?
Entropy values (for a binary variable) range between 0 and 1 bit; the bit is the unit of entropy
Max surprise is for the fair coin (p = 1/2): no reason to expect one outcome over another
Min entropy value is 0 bits, for p = 0 or p = 1
These slides about information theory concepts are adapted from Grosse, Farahmand, & Carrasquilla, Uni. of Toronto
US Classification 46 / 56
Entropy
A random variable X taking values x1, ..., xn has entropy
$$ H(X) = -\sum_{i=1}^{n} p(x_i) \log p(x_i) $$
For the fair coin, p = 1/2: $H = -\tfrac{1}{2}\log\tfrac{1}{2} - \tfrac{1}{2}\log\tfrac{1}{2} = 1$
For a one-sided coin, p = 1 or p = 0: $H = -1\log 1 - 0\log 0 = 0$
For Coin 1: $H = -\tfrac{16}{20}\log\tfrac{16}{20} - \tfrac{4}{20}\log\tfrac{4}{20} \approx 0.7219$
For Coin 2: $H = -\tfrac{9}{20}\log\tfrac{9}{20} - \tfrac{11}{20}\log\tfrac{11}{20} \approx 0.9928$
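A minimal sketch computing the entropies quoted above (log base 2, i.e. bits):

```python
# Entropy of a discrete distribution in bits
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)   # 0 log 0 is taken as 0

print(entropy([0.5, 0.5]))        # fair coin -> 1.0
print(entropy([1.0, 0.0]))        # one-sided coin -> 0.0
print(entropy([16/20, 4/20]))     # Coin 1 -> ~0.7219
print(entropy([9/20, 11/20]))     # Coin 2 -> ~0.9928
```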
US Classification 47 / 56
Entropy
A random variable X taking values x1, ..., xn has entropy
$$ H(X) = -\sum_{i=1}^{n} p(x_i) \log p(x_i) $$
source: Wikipedia
[Figure: entropy H(X) (expected surprise) of a coin flip, in bits, plotted versus the bias of the coin Pr(X = 1) = P(heads)]
US Classification 48 / 56
Entropy of joint distribution
Entropy of the joint distribution of random variables X and Y
Grosse, Farahmand, & Carrasquilla, Uni. of Toronto
US Classification 49 / 56
Conditional Entropy
Grosse, Farahmand, & Carrasquilla, Uni. of Toronto
US Classification 50 / 56
Conditional Entropy
Grosse, Farahmand, & Carrasquilla, Uni. of Toronto
US Classification 51 / 56
Conditional Entropy
Grosse, Farahmand, & Carrasquilla, Uni. of Toronto
US Classification 52 / 56
Conditional Entropy
Grosse, Farahmand, & Carrasquilla, Uni. of Toronto
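For reference, the standard definitions of joint and conditional entropy (with log base 2, in bits) are:
$$ H(X, Y) = -\sum_{x}\sum_{y} p(x, y)\,\log p(x, y) \qquad H(Y \mid X) = \sum_{x} p(x)\, H(Y \mid X = x) = -\sum_{x}\sum_{y} p(x, y)\,\log p(y \mid x) $$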
US Classification 53 / 56
Information Gain
Grosse, Farahmand, & Carrasquilla, Uni. of Toronto
US Classification 54 / 56
Information Gain
Grosse, Farahmand, & Carrasquilla, Uni. of Toronto
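For reference, the information gain of an attribute X about the class label Y is $IG(Y; X) = H(Y) - H(Y \mid X)$. A minimal sketch computing it for the Outlook attribute of the weather running example (the class counts assume the standard 14-instance dataset):

```python
# Information gain of the Outlook attribute for the weather ("play tennis") running example
import math

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

# class counts (play = yes, play = no) overall and per Outlook value
overall = (9, 5)
by_outlook = {"sunny": (2, 3), "overcast": (4, 0), "rainy": (3, 2)}

h_play = entropy(overall)                                                          # ~0.940 bits
n = sum(overall)
h_play_given_outlook = sum(sum(c) / n * entropy(c) for c in by_outlook.values())   # ~0.694 bits
print(h_play - h_play_given_outlook)                                               # information gain ~0.247 bits
```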
US Classification 55 / 56
Classification: Some other Concepts
Some other concepts related to classification you should be familiar with
Decision boundary, ROC curve
Multi-class classification from binary classifiers
ONE-VS-ALL (ONE-VS-REST)
Some classifiers you should read about (at least a Wikipedia-level understanding is essential for reading papers and using them in your projects):
Random Forest, Support Vector Machine, Neural Networks, Deep
Learning
US Classification 56 / 56