
UET

Since 2004

ĐẠI HỌC CÔNG NGHỆ, ĐHQGHN


VNU-University of Engineering and Technology

INT3405E - Machine Learning


Lecture 4: Classification (P1)

Hanoi, 09/2024
Recap: Key Issues in Machine Learning
● What are good hypothesis spaces?
○ Which spaces have been useful in practical applications, and why?
● What algorithms can work with these spaces?
○ Are there general design principles for machine learning algorithms?
● How can we find the best hypothesis efficiently?
○ How to find the optimal solution efficiently (the "optimization" question)
● How can we optimize accuracy on future data?
○ Known as the "overfitting" problem (i.e., "generalization" theory)
● How can we have confidence in the results?
○ How much training data is required to find an accurate hypothesis? (the "statistical" question)
● Are some learning problems computationally intractable? (the "computational" question)
● How can we formulate application problems as machine learning problems? (the "engineering" question)
Recap: Model Representation
[Figure: Training Set → Learning Algorithm → Hypothesis h; given the size of a house x, the hypothesis outputs an estimated price y]

How do we represent h? With linear regression with one variable ("univariate linear regression"):

$h_\theta(x) = \theta_0 + \theta_1 x$

How do we choose the parameters $\theta_0$ and $\theta_1$?
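To make this concrete, here is a minimal NumPy sketch (ours, not the slides') of the hypothesis and its mean-squared-error cost; the toy sizes, prices, and parameter values are invented:

```python
import numpy as np

def h(x, theta0, theta1):
    """Univariate linear hypothesis: h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

def cost(x, y, theta0, theta1):
    """Mean squared error: J(theta) = 1/(2m) * sum_i (h(x_i) - y_i)^2."""
    m = len(x)
    return np.sum((h(x, theta0, theta1) - y) ** 2) / (2 * m)

# Toy data: house size vs. price (made-up numbers).
size = np.array([10.0, 15.0, 20.0, 25.0])
price = np.array([200.0, 310.0, 400.0, 490.0])
print(cost(size, price, 10.0, 19.0))  # cost of one candidate parameter setting
```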


Recap: Gradient Descent for Optimization

Repeat until convergence, simultaneously for every parameter $\theta_j$:

$\theta_j := \theta_j - \alpha \, \frac{\partial}{\partial \theta_j} J(\theta)$

where $\alpha$ is the learning rate.
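A minimal gradient-descent sketch for the univariate case (our own illustration; the learning rate, iteration count, and toy data are arbitrary choices):

```python
import numpy as np

def gradient_descent(x, y, alpha=0.01, iters=1000):
    """Batch gradient descent for h(x) = theta0 + theta1 * x.

    Repeats theta_j := theta_j - alpha * dJ/dtheta_j,
    updating both parameters simultaneously.
    """
    m = len(x)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iters):
        err = (theta0 + theta1 * x) - y      # h(x_i) - y_i
        grad0 = err.sum() / m                # dJ/dtheta0
        grad1 = (err * x).sum() / m          # dJ/dtheta1
        theta0 -= alpha * grad0              # simultaneous update
        theta1 -= alpha * grad1
    return theta0, theta1

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])           # roughly y = 2x
print(gradient_descent(x, y))
```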


Recap: Gradient Descent Example

[Figure: left, $h_\theta(x)$ for fixed $\theta_0, \theta_1$ (a function of x); right, the cost $J(\theta_0, \theta_1)$ (a function of the parameters)]

How fast does gradient descent converge to the global optimum?


Normal Equation (3)
● Matrix-vector formulation:

$J(\theta) = \frac{1}{2m} \, \| X\theta - y \|^2$

● Analytical solution:

$\theta = (X^T X)^{-1} X^T y$

Computing it takes $O(mn^2 + n^3)$ time for m examples with n features.
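As an illustration (ours), a NumPy sketch of the closed-form solution; solving the linear system is preferable to forming the inverse explicitly. The toy design matrix and targets are invented:

```python
import numpy as np

def normal_equation(X, y):
    """Solve theta = (X^T X)^{-1} X^T y without forming the inverse."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Toy design matrix with a bias column of ones.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 4.1, 5.9])
print(normal_equation(X, y))  # approx [0.1, 1.95]
```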



Outline
● Bayesian Learning
○ Bayes Theorem
○ MAP learning vs. MLE learning
● Probabilistic Generative Models
○ Naïve Bayes Classifier
● Discriminative Models
○ Logistic Regression
○ Decision Tree
○ K-Nearest Neighbors
Bayes Theorem
● Bayes Theorem:

$P(h \mid D) = \frac{P(D \mid h)\, P(h)}{P(D)}$

(posterior = likelihood × prior / evidence)

Thomas Bayes (1702–1761)

○ P(h) = prior probability of hypothesis h
○ P(D) = prior probability of training data D
○ P(h|D) = conditional probability of h given D (posterior)
○ P(D|h) = conditional probability of D given h (likelihood)
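A hypothetical numeric illustration (our numbers, not the slide's): suppose $P(h) = 0.01$, $P(D \mid h) = 0.9$, and $P(D \mid \neg h) = 0.1$. Then

$P(h \mid D) = \frac{0.9 \times 0.01}{0.9 \times 0.01 + 0.1 \times 0.99} = \frac{0.009}{0.108} \approx 0.083$

so even fairly strong evidence leaves the posterior small when the prior is small.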



Maximum A Posteriori Learning (MAP)
● Maximum a posteriori (MAP) learning
○ Find the most probable hypothesis given the training data by maximizing the posterior probability:

$h_{MAP} = \arg\max_{h \in H} P(h \mid D) = \arg\max_{h \in H} P(D \mid h)\, P(h)$

(P(D) does not depend on h and can be dropped.) The prior P(h) encodes our knowledge/preference about hypotheses.
MAP Learning
● For each hypothesis h in H, calculate the posterior probability:

$P(h \mid D) = \frac{P(D \mid h)\, P(h)}{P(D)}$

● Output the hypothesis h with the highest posterior probability, as sketched below.

● Comments:
○ Computationally intensive
○ Gives a standard for judging the performance of learning algorithms
○ Choosing P(h) reflects our prior knowledge about the learning task
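As a concrete sketch (ours, not the slides'), brute-force MAP learning over a small finite hypothesis space of coin biases; the hypotheses, prior, and flips are invented:

```python
import numpy as np

# Hypotheses: candidate heads probabilities; data: observed coin flips.
flips = np.array([1, 1, 0, 1, 1])           # 1 = heads
hypotheses = np.array([0.3, 0.5, 0.7])      # finite hypothesis space H
prior = np.array([0.1, 0.8, 0.1])           # P(h): encodes our preference

def likelihood(p, flips):
    """P(D|h) for i.i.d. Bernoulli flips with heads probability p."""
    heads = flips.sum()
    tails = len(flips) - heads
    return p**heads * (1 - p)**tails

like = np.array([likelihood(p, flips) for p in hypotheses])
posterior = like * prior / np.sum(like * prior)   # Bayes theorem
h_map = hypotheses[np.argmax(posterior)]          # MAP hypothesis
h_mle = hypotheses[np.argmax(like)]               # MLE = MAP with uniform prior
print(h_map, h_mle)                               # 0.5 0.7: the prior matters
```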



Maximum-Likelihood Estimation (MLE)

● Maximum Likelihood Estimation (MLE) learning
○ Assume each hypothesis is equally probable a priori: $P(h_i) = P(h_j)$ for all $h_i, h_j \in H$
○ Then MAP reduces to maximizing the likelihood of the training data:

$h_{MLE} = \arg\max_{h \in H} P(D \mid h)$



Relationship between MLE Learning
and Least-Squared Error Learning (1)
● Consider training examples $\{(x_i, y_i)\}_{i=1}^m$
● Assume the targets are generated with additive Gaussian noise:

$y_i = f(x_i) + \epsilon_i, \quad \epsilon_i \sim \mathcal{N}(0, \sigma^2)$

● We want to learn a hypothesis h that approximates f(x)

● Linear regression minimizes the objective (cost function) of the mean squared error:

$J(\theta) = \frac{1}{2m} \sum_{i=1}^m \left( h_\theta(x_i) - y_i \right)^2$



Relationship between MLE Learning
and Least-Squared Error Learning (2)
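The derivation itself did not survive extraction; the standard argument, sketched under the Gaussian-noise assumption above:

$h_{MLE} = \arg\max_h \prod_{i=1}^m p(y_i \mid x_i, h) = \arg\max_h \sum_{i=1}^m \left[ -\log\left(\sqrt{2\pi}\,\sigma\right) - \frac{(y_i - h(x_i))^2}{2\sigma^2} \right] = \arg\min_h \sum_{i=1}^m \left( y_i - h(x_i) \right)^2$

The first term is constant in h, so maximizing the Gaussian likelihood is exactly minimizing the squared error.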



Probabilistic Generative Models (1)
• Classify instance x into one of K classes using Bayes theorem:

$p(C_k \mid x) = \frac{p(x \mid C_k)\, p(C_k)}{\sum_{j=1}^{K} p(x \mid C_j)\, p(C_j)}$

where $p(x \mid C_k)$ is the density function for class $C_k$ and $p(C_k)$ is the class prior.



Probabilistic Generative Models (2)
• Classification decision:

$\hat{y} = \arg\max_k \; p(C_k \mid x) = \arg\max_k \; p(x \mid C_k)\, p(C_k)$

• The key is to estimate the parameters of $p(x \mid C_k)$ and $p(C_k)$



Probabilistic Generative Models (3)
● Given training data $\{(x_i, y_i)\}_{i=1}^N$
● Maximum likelihood gives closed-form solutions, e.g., for Gaussian class densities:

$p(C_k) = \frac{N_k}{N}, \qquad \mu_k = \frac{1}{N_k} \sum_{i:\, y_i = k} x_i$
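The following NumPy sketch (ours) illustrates such closed-form estimation for a two-class Gaussian model with a shared covariance matrix, as assumed later in this lecture; the toy data is invented:

```python
import numpy as np

def fit_gaussian_generative(X, y):
    """Closed-form MLE: class priors, class means, shared covariance.

    X: (n, d) feature matrix; y: (n,) labels in {0, 1}.
    """
    n = len(y)
    priors, means = [], []
    Sigma = np.zeros((X.shape[1], X.shape[1]))
    for k in (0, 1):
        Xk = X[y == k]
        priors.append(len(Xk) / n)            # P(C_k) = N_k / N
        mu = Xk.mean(axis=0)                  # class mean mu_k
        means.append(mu)
        Sigma += (Xk - mu).T @ (Xk - mu)      # pooled scatter
    return priors, means, Sigma / n           # shared covariance

X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [1.2, 0.9]])
y = np.array([0, 0, 1, 1])
print(fit_gaussian_generative(X, y))
```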



Probabilistic Generative Models (4)

[Figure: class-conditional densities $p(x \mid C_k)$ and the resulting posterior probability $p(C_1 \mid x)$]



Curse of Dimensionality
● One challenge of learning with high-dimensional data is insufficient data samples
● Suppose 5 samples per axis direction are considered enough in 1D; keeping the same density in d dimensions requires $5^d$ points:
– 1D: 5 points
– 2D: 25 points
– 3D: 125 points
– 10D: 9,765,625 points
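The growth is just $5^d$; a one-liner (ours) to reproduce the numbers:

```python
# Points needed to keep 5 samples per axis in d dimensions: 5**d.
for d in (1, 2, 3, 10):
    print(d, 5 ** d)  # 5, 25, 125, 9765625
```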



Outline
● Bayesian Learning
○ Bayes Theorem
○ MAP learning vs. MLE learning
● Probabilistic Generative Models
○ Naïve Bayes Classifier
● Discriminative Models
○ Logistic Regression
○ Decision Tree
○ K-Nearest Neighbors
Naïve Bayes Classifier (1)
• $p(x \mid C_k)$ is hard to estimate for high-dimensional data x
• Conditional independence assumption:
• All attributes are conditionally independent given the class
• Naïve Bayes approximation:

$p(x \mid C_k) = \prod_{j=1}^{d} p(x_j \mid C_k)$

where each factor $p(x_j \mid C_k)$ is a one-dimensional distribution.



Naïve Bayes Classifier (2)
● Text categorization
○ x: the word histogram of a document
● Bag-of-words assumption:
○ Assume word position doesn't matter
● Conditional independence:

$p(x \mid C_k) = \prod_{w} p(w \mid C_k)^{tf(w,\, x)}$

where $tf(w, x)$ is the number of times word w occurs in document x.



Parameter Estimation
● Learning by maximum likelihood estimates
○ Simply count the frequencies in the data:

$p(C_k) = \frac{\#\text{docs in class } k}{\#\text{docs}}, \qquad p(w \mid C_k) = \frac{count(w, k)}{\sum_{w'} count(w', k)}$

○ Create a mega-document for topic k by concatenating all the docs in this topic
○ Compute the frequency of each word w in the mega-document



Problem with Maximum Likelihood
● What if a word in a test document (e.g., a novel word coined on the internet) never appears in the training data? Its estimated probability is zero, which zeroes out the whole product.

● Smoothing
○ Avoids zero probabilities, e.g., Laplace (add-one) smoothing:

$p(w \mid C_k) = \frac{count(w, k) + 1}{\sum_{w'} count(w', k) + |V|}$

where $|V|$ is the vocabulary size.
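Putting the counting and the smoothing together, a minimal multinomial Naïve Bayes sketch (ours, with an invented 3-word vocabulary):

```python
import numpy as np

def fit_nb(X, y, n_classes):
    """X: (n_docs, vocab) word counts; y: class ids. Returns log parameters."""
    log_prior = np.log(np.bincount(y, minlength=n_classes) / len(y))
    log_pw = np.zeros((n_classes, X.shape[1]))
    for k in range(n_classes):
        counts = X[y == k].sum(axis=0)               # "mega-document" counts
        # Laplace smoothing: +1 per word, +|V| in the denominator.
        log_pw[k] = np.log((counts + 1) / (counts.sum() + X.shape[1]))
    return log_prior, log_pw

def predict_nb(x, log_prior, log_pw):
    """argmax_k  log P(C_k) + sum_w tf(w, x) * log P(w | C_k)."""
    return np.argmax(log_prior + log_pw @ x)

# Toy corpus: 4 documents over a vocabulary of 3 words, two classes.
X = np.array([[3, 0, 1], [2, 1, 0], [0, 4, 1], [1, 3, 0]])
y = np.array([0, 0, 1, 1])
lp, lw = fit_nb(X, y, 2)
print(predict_nb(np.array([2, 0, 1]), lp, lw))       # -> 0
```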



Naïve Bayes Classifier (3)
• Even when the independence assumption is a bad approximation of the true distribution, Naïve Bayes often achieves good classification accuracy (e.g., text categorization on 20 Newsgroups).



Naïve Bayes Classifier (4)

Naïve Bayes Classifier:

$\hat{y} = \arg\max_k \; p(C_k) \prod_{j} p(x_j \mid C_k)$



Example: “Play Tennis” (1)
● Based on the examples in the table, classify the following datum x:
x = (Outlook=Sunny, Temp=Cool, Humidity=High, Wind=Strong)
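For reference (the data table itself did not survive extraction), assuming the standard 14-example PlayTennis table from Mitchell's Machine Learning textbook, the counted estimates are:

$P(yes) = 9/14$, $P(no) = 5/14$; $P(Sunny \mid yes) = 2/9$, $P(Cool \mid yes) = 3/9$, $P(High \mid yes) = 3/9$, $P(Strong \mid yes) = 3/9$; $P(Sunny \mid no) = 3/5$, $P(Cool \mid no) = 1/5$, $P(High \mid no) = 4/5$, $P(Strong \mid no) = 3/5$

so $P(yes) \prod_j P(x_j \mid yes) \approx 0.0053$ and $P(no) \prod_j P(x_j \mid no) \approx 0.0206$, and Naïve Bayes predicts PlayTennis = no.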



The Independence Assumption
●Makes computation possible
●Yields optimal classifiers when satisfied
●Fairly good empirical results
● But it is seldom satisfied in practice, as attributes (variables) are often correlated
● Attempts to overcome this limitation:
○ Bayesian networks, which combine Bayesian reasoning with causal relationships between attributes



Decision Boundary of Naïve Bayes (1)
● Consider text categorization with two classes
● The (log) ratio of the posteriors determines the decision:

$\log \frac{P(C_1 \mid x)}{P(C_2 \mid x)} = \sum_{w} tf(w, x) \log \frac{p(w \mid C_1)}{p(w \mid C_2)} + \log \frac{P(C_1)}{P(C_2)}$

This is linear in the word counts: a linear decision boundary.


Decision Boundary of Naïve Bayes (2)
● Consider two-class classification
● Gaussian density function for each class: $p(x \mid C_k) = \mathcal{N}(x;\, \mu_k, \Sigma)$
● Shared covariance matrix $\Sigma$

The result is again a linear decision boundary, as derived below.
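The standard derivation, sketched: because the covariance is shared, the quadratic terms $x^T \Sigma^{-1} x$ cancel in the log-odds,

$\log \frac{P(C_1 \mid x)}{P(C_2 \mid x)} = \log \frac{p(x \mid C_1)}{p(x \mid C_2)} + \log \frac{P(C_1)}{P(C_2)} = w^T x + b$

with $w = \Sigma^{-1}(\mu_1 - \mu_2)$ and $b = -\frac{1}{2}\mu_1^T \Sigma^{-1} \mu_1 + \frac{1}{2}\mu_2^T \Sigma^{-1} \mu_2 + \log \frac{P(C_1)}{P(C_2)}$, which is linear in x.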



Decision Boundary
• Generative models essentially create linear decision boundaries
• Why not directly model the linear decision boundary?


Outline
● Bayesian Learning
○ Bayes Theorem
○ MAP learning vs. MLE learning
● Probabilistic Generative Models
○ Naïve Bayes Classifier
● Discriminative Models
○ Logistic Regression
○ Decision Tree
○ K-Nearest Neighbors



Discriminative Models: Logistic Regression
• Generative models often lead to a linear decision boundary
• Linear discriminative model
• Directly model the linear decision boundary $w^T x = 0$

• w is the parameter vector to be learned



Logistic Regression

Model the class posterior directly as the sigmoid of a linear function (using labels y = ±1):

$p(y \mid x) = \sigma(y\, w^T x) = \frac{1}{1 + \exp(-y\, w^T x)}$


Logistic Sigmoid Function
● The logistic/sigmoid function:

$\sigma(a) = \frac{1}{1 + e^{-a}}$

It maps any real number into (0, 1), so its output can be read as a probability.



Logistic Regression
• Given training data $\{(x_i, y_i)\}_{i=1}^m$ with $y_i \in \{-1, +1\}$
• Log-likelihood function:

$\log L(w) = \sum_{i=1}^m \log \sigma(y_i\, w^T x_i) = -\sum_{i=1}^m \log\left(1 + \exp(-y_i\, w^T x_i)\right)$

• Learn the parameter w by maximum likelihood estimation (MLE)
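A minimal NumPy sketch (ours) of this MLE fit by gradient ascent on the log-likelihood; the learning rate, iteration count, and the linearly separable toy data are invented:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def fit_logistic(X, y, lr=0.1, iters=500):
    """Maximize sum_i log sigmoid(y_i * w^T x_i) by gradient ascent.

    Labels y are in {-1, +1}; no closed-form solution exists.
    """
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        margins = y * (X @ w)
        # (1 - sigmoid(margin)) weights each example by how wrong we are.
        grad = X.T @ (y * (1.0 - sigmoid(margins)))
        w += lr * grad / len(y)
    return w

X = np.array([[1.0, 0.5], [1.0, 2.0], [1.0, -1.0], [1.0, -2.5]])  # bias column
y = np.array([1, 1, -1, -1])
w = fit_logistic(X, y)
print(np.sign(X @ w))  # should match y
```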



Convex Objective Functions

[Figure: the convex logistic loss $\log(1 + \exp(-y\, w^T x))$ plotted as a function of $w^T x$, for y = 1 (left) and y = −1 (right)]



Logistic Regression
• The objective is convex, so gradient descent reaches the global optimum

• There is no closed-form solution
• Gradient-based update:

$w \leftarrow w + \eta \sum_{i=1}^m y_i\, x_i \left(1 - \sigma(y_i\, w^T x_i)\right)$

where $(1 - \sigma(y_i\, w^T x_i))$ acts as the classification error on example i.



Example: Heart Disease (1)
• Input feature x: age group id
1: 25-29, 2: 30-34, 3: 35-39, 4: 40-44, 5: 45-49, 6: 50-54, 7: 55-59, 8: 60-64

• Output y: whether the subject has heart disease

• y = 1: has heart disease
• y = −1: no heart disease



Example: Heart Disease (2)



Example: Text Categorization (1)
●Learn to classify text into two categories
● Input d: a document, represented by a word histogram
● Output y = ±1:
• +1 for a political document
• −1 for a non-political document



Example: Text Categorization (2)
• Training data



Example: Text Categorization (3)

• Dataset: Reuters-21578
• Classification accuracy:
• Naïve Bayes: 77%
• Logistic regression: 88%



Multi-class Logistic Regression
• How do we extend the logistic regression model to multi-class classification?



Conditional Exponential Model (1)
• Consider K classes, with one weight vector $w_k$ per class
• Define

$p(y = k \mid x) = \frac{1}{Z(x)} \exp(w_k^T x)$

• where Z(x) is the normalization factor (partition function):

$Z(x) = \sum_{j=1}^{K} \exp(w_j^T x)$

• Need to learn $w_1, \dots, w_K$
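A NumPy sketch (ours) of evaluating this model, often called softmax; the weights and input are invented:

```python
import numpy as np

def softmax_probs(W, x):
    """p(y = k | x) = exp(w_k^T x) / Z(x), Z(x) = sum_j exp(w_j^T x).

    W: (K, d) matrix with one weight vector per class; x: (d,) input.
    """
    scores = W @ x
    scores -= scores.max()          # subtract max for numerical stability
    exps = np.exp(scores)
    return exps / exps.sum()        # divide by the partition function Z(x)

W = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])  # K = 3 classes
x = np.array([0.2, 0.9])
p = softmax_probs(W, x)
print(p, p.sum())                   # class probabilities summing to 1
```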



Conditional Exponential Model (2)
• Learn the weights w by maximum likelihood estimation:

$\max_{w_1, \dots, w_K} \sum_{i=1}^m \log p(y_i \mid x_i)$

• Modified conditional exponential model



Logistic Regression versus Naïve Bayes
• Both produce linear decision boundaries
• Naïve Bayes: the weights come from the estimated log probability ratios
• Logistic regression: learns the weights by MLE

• Both can be viewed as modeling p(x|y)
• Naïve Bayes: the conditional independence assumption
• Logistic regression: assumes an exponential-family distribution for p(x|y) (a much broader assumption)



Discriminative versus Generative
Discriminative Models
● Model P(y|x) directly
● Pros:
○ Usually better performance
○ Robust to noisy data
● Cons:
○ Slow convergence (e.g., LR by gradient descent)
○ Expensive computation

Generative Models
● Model P(x|y) directly
● Pros:
○ Usually fast convergence (with small training data)
○ Cheap computation (easier to learn, e.g., NB)
● Cons:
○ Sensitive to noisy data
○ Usually performs worse
Summary
● Bayesian Learning
○ Bayes Theorem
○ MAP learning vs. MLE learning
● Probabilistic Generative Models
○ Naïve Bayes Classifier
● Discriminative Models
○ Logistic Regression
○ Decision Tree
○ K-Nearest Neighbors



UET
Since 2004

ĐẠI HỌC CÔNG NGHỆ, ĐHQGHN


VNU-University of Engineering and Technology

Thank you
Email me
[email protected]
