Lecture 4 Classification P1
Hanoi, 09/2024
Recap: Key Issues in Machine Learning
● What are good hypothesis spaces?
○ Which spaces have been useful in practical applications and why?
● What algorithms can work with these spaces?
○ Are there general design principles for machine learning algorithms?
● How can we find the best hypothesis in an efficient way?
○ How to find the optimal solution efficiently (“optimization” question)
● How can we optimize accuracy on future data?
○ Known as the “overfitting” problem (i.e., “generalization” theory)
● How can we have confidence in the results?
○ How much training data is required to find an accurate hypothesis? (“statistical” question)
● Are some learning problems computationally intractable? (“computational” question)
● How can we formulate application problems as machine learning problems? (“engineering” question)
FIT-CS INT3405E - Machine Learning 2
Recap: Model Representation
How do we represent h?
[Figure: Training Set → Learning Algorithm → hypothesis h. The hypothesis h maps an input x (size of house) to an estimated output y (estimated house price).]
Linear regression with one variable.
“Univariate Linear Regression”
● Analytical solution
○ Takes O(mn² + n³) time for m training examples and n features
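To make the cost of the analytical solution concrete, here is a minimal sketch of the normal-equation approach for least-squares linear regression. The function name `fit_normal_equation` and the toy house data are assumptions made for illustration; they do not come from the slides.

```python
import numpy as np

def fit_normal_equation(X, y):
    """Closed-form least-squares fit with an intercept term (illustrative helper)."""
    # Prepend a column of ones so theta[0] acts as the intercept.
    Xb = np.column_stack([np.ones(len(X)), X])
    # Forming Xb^T Xb costs O(m n^2); solving the n x n system costs O(n^3).
    return np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

# Hypothetical toy data: house sizes and prices generated from y = 1 + 2x.
sizes = np.array([[1.0], [2.0], [3.0], [4.0]])
prices = 1.0 + 2.0 * sizes[:, 0]
theta = fit_normal_equation(sizes, prices)  # [intercept, slope]
```

Using `np.linalg.solve` rather than an explicit matrix inverse is a design choice for numerical stability; the asymptotic cost is the same.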
● Comments:
○ Computationally intensive
○ Gives a standard for judging the performance of learning algorithms
○ The choice of P(h) reflects our prior knowledge about the learning task
● Assume
[Equation: the posterior probability of a topic is expressed through class-conditional densities, which depend on the number of times each word occurs in document x]
○ Create a mega-document for topic k by concatenating all the docs in this topic
○ Compute frequency of w in the mega-document
● Smoothing
○ Avoids zero probabilities for words that never occur in a topic's mega-document
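The mega-document estimate and the smoothing step can be sketched as follows. This is a minimal illustration, assuming add-one (Laplace) smoothing controlled by a hypothetical parameter `alpha`; the slides do not say which smoothing scheme is used, and the function names and toy documents are invented for the example.

```python
import math
from collections import Counter

def train_naive_bayes(docs, labels, alpha=1.0):
    """Estimate log P(topic) and smoothed log P(word | topic) (illustrative helper)."""
    vocab = {w for d in docs for w in d}
    log_prior, log_lik = {}, {}
    for k in sorted(set(labels)):
        # Mega-document for topic k: concatenate all documents in this topic.
        mega = [w for d, y in zip(docs, labels) if y == k for w in d]
        counts = Counter(mega)
        log_prior[k] = math.log(labels.count(k) / len(labels))
        # Additive smoothing (an assumption): every vocabulary word gets alpha pseudo-counts.
        denom = len(mega) + alpha * len(vocab)
        log_lik[k] = {w: math.log((counts[w] + alpha) / denom) for w in vocab}
    return log_prior, log_lik

def predict(doc, log_prior, log_lik):
    """Return the topic with the highest log posterior; unseen words are ignored."""
    scores = {
        k: log_prior[k] + sum(log_lik[k][w] for w in doc if w in log_lik[k])
        for k in log_prior
    }
    return max(scores, key=scores.get)

# Hypothetical toy corpus, tokenized by hand.
docs = [["goal", "match", "team"], ["team", "win", "goal"],
        ["stock", "market", "price"], ["market", "trade", "price"]]
labels = ["sport", "sport", "finance", "finance"]
log_prior, log_lik = train_naive_bayes(docs, labels)
pred = predict(["team", "price", "goal"], log_prior, log_lik)
```

Without smoothing, the word "price" would have probability zero under the sport topic and would veto that topic regardless of the other words in the document.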
• No closed-form solution
• Gradient Descent
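Because there is no closed-form solution, the parameters are found iteratively. The sketch below assumes these bullets refer to logistic regression trained by batch gradient descent on the mean negative log-likelihood; the learning rate, iteration count, function names, and toy data are illustrative assumptions, not values from the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_gd(X, y, lr=0.5, n_iters=2000):
    """Batch gradient descent for binary logistic regression (illustrative helper)."""
    # Prepend a column of ones so w[0] acts as the intercept.
    Xb = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iters):
        # Gradient of the mean negative log-likelihood with respect to w.
        grad = Xb.T @ (sigmoid(Xb @ w) - y) / len(y)
        w -= lr * grad
    return w

# Hypothetical one-dimensional toy data with a decision boundary near x = 1.5.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
w = fit_logistic_gd(X, y)
probs = sigmoid(np.column_stack([np.ones(len(X)), X]) @ w)
pred = (probs >= 0.5).astype(float)
```

On linearly separable data such as this, the weights keep growing with more iterations, so a fixed iteration budget (or a regularization term) is what stops training in practice.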
Classification error
• Dataset: Reuters-21578
• Classification accuracy
• Naïve Bayes: 77%
• Logistic regression: 88%
Normalization factor (partition function)
• Need to learn
Thank you