Introduction
MACHINE LEARNING
Machine learning is the field of study that gives
computers the ability to learn without being
explicitly programmed.
A computer program is said to learn from
experience E with respect to some task T and
some performance measure P, if its performance
on T, as measured by P, improves with
experience E.
MACHINE LEARNING
Example: a spam filter. Given examples of spam e-mails and examples of ham (non-spam) e-mails, it learns to flag spam.
Training set: the examples that the system uses to learn.
T (task): flag spam in new e-mails
E (experience): the training data
P (performance): needs to be defined.
Ex: the ratio of correctly classified e-mails (accuracy)
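A minimal sketch of accuracy as the performance measure P. The function name and the five labels are made up for illustration; they are not from the slides.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of e-mails whose predicted label matches the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("need at least one example")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Hypothetical labels for five e-mails (made up for illustration).
true_labels = ["spam", "ham", "ham", "spam", "ham"]
predicted_labels = ["spam", "ham", "spam", "spam", "ham"]

print(accuracy(true_labels, predicted_labels))  # 4 of 5 correct -> 0.8
```

Accuracy is only one possible choice of P; it treats a missed spam and a wrongly flagged ham e-mail as equally bad.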
EVALUATING PERFORMANCE ON A TASK
Machine learning problems don’t have a
“correct” answer.
Consider the sorting problem:
Many sorting algorithms are available: bubble sort, quick
sort, insertion sort ...
Their performance is measured in terms of how fast
they are and how much data they can handle.
Would we compare the sorting algorithms with
respect to the correctness of the result?
An algorithm that isn’t guaranteed to produce a sorted list
every time is useless as a sorting algorithm.
EVALUATING PERFORMANCE ON A TASK
There is no perfect solution in machine learning.
A perfect e-mail spam filter does not exist.
In many cases the data is “noisy”:
Examples are mislabelled
Features contain errors
Performance evaluation of learning algorithms is
therefore important in machine learning.
WHY USE MACHINE LEARNING?
Let’s write a spam filter using traditional
programming techniques:
1) Study spam e-mails and identify the patterns and most
frequently occurring words.
2) Write a detection algorithm.
3) Test and repeat steps 1 and 2 until it is good
enough.
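The three steps above can be sketched as a hand-written rule. This is only an illustration under stated assumptions: the word list, the threshold, and the test e-mails are all hypothetical, not taken from the slides.

```python
import re

# Step 1 (assumed result): a hypothetical list of words common in spam.
SUSPICIOUS_WORDS = {"free", "winner", "prize", "credit"}


def is_spam(email_text, threshold=2):
    """Step 2: flag an e-mail when it contains enough suspicious words."""
    words = set(re.findall(r"[a-z']+", email_text.lower()))
    return len(words & SUSPICIOUS_WORDS) >= threshold


# Step 3: test on a few hand-labelled e-mails (made up for illustration).
test_emails = [
    ("You are a WINNER! Claim your free prize now", True),
    ("Meeting moved to 3pm, see agenda attached", False),
    ("Free parking is available next to the office", False),
    ("Fr33 pr1ze inside, click here", True),
]
mistakes = [text for text, label in test_emails if is_spam(text) != label]
print(f"{len(mistakes)} mistakes out of {len(test_emails)}")
```

The last e-mail slips through because its spelling avoids the word list, which sends you back to step 1 to add another rule.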
WHY USE MACHINE LEARNING?
[Figure: flowchart; surviving labels: “Launch!”, “Analyze errors”]
WHY USE MACHINE LEARNING?
[Figure: flowchart; surviving labels: “Launch!”, “Data”, “Analyze errors”]
WHY USE MACHINE LEARNING?
Consider the example of recognizing handwritten
digits.
Document Classification
Expectation Maximization
CURVE FITTING
[Equation lost in extraction. Surviving annotations: M, the order of the polynomial; the coefficients.]
TESTING AND VALIDATING
Once you have a trained model, evaluate it and
fine-tune it.
Split your data into two sets: the training set and the
test set.
Generalization error: the error rate on new cases,
estimated by evaluating the model on the test set.
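A minimal sketch of the split and of estimating the generalization error on the held-out test set. Everything here is an assumption for illustration: the 80/20 ratio, the synthetic one-feature data, and the toy threshold model; none of it comes from the slides.

```python
import random


def train_test_split(examples, test_ratio=0.2, seed=0):
    """Shuffle the examples and hold out a fraction of them as the test set."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def error_rate(model, examples):
    """Fraction of (feature, label) examples that the model gets wrong."""
    wrong = sum(model(x) != y for x, y in examples)
    return wrong / len(examples)


def fit_threshold(examples):
    """Toy 'training': put the threshold midway between the two class means."""
    ones = [x for x, y in examples if y == 1]
    zeros = [x for x, y in examples if y == 0]
    threshold = (sum(ones) / len(ones) + sum(zeros) / len(zeros)) / 2
    return lambda x: int(x > threshold)


# Synthetic data: the label is 1 when the feature exceeds 0.5.
rng = random.Random(42)
data = [(x, int(x > 0.5)) for x in (rng.random() for _ in range(100))]

train_set, test_set = train_test_split(data)
model = fit_threshold(train_set)  # the model never sees the test set

print("training error:", error_rate(model, train_set))
print("estimated generalization error:", error_rate(model, test_set))
```

The key design point is that the test examples are set aside before training, so the error measured on them stands in for the error on new cases.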
If the training error is low (the model makes few mistakes
on the training set) but the generalization error is
high, then the model is overfitting the training
set.
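A small illustration of that gap, assuming synthetic data with 20% of the labels flipped and a model that simply memorizes the training set (a 1-nearest-neighbour lookup, chosen here only because it overfits easily; it is not a method from the slides).

```python
import random

rng = random.Random(0)


def make_examples(n, noise=0.2):
    """Label is 1 when x > 0.5, but a fraction of the labels is flipped."""
    examples = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < noise:
            y = 1 - y  # simulate a mislabelled example
        examples.append((x, y))
    return examples


def error_rate(model, examples):
    """Fraction of (feature, label) examples that the model gets wrong."""
    return sum(model(x) != y for x, y in examples) / len(examples)


train_set = make_examples(200)
test_set = make_examples(200)


def memorizer(x):
    """Predict the label of the closest training example."""
    nearest = min(train_set, key=lambda example: abs(example[0] - x))
    return nearest[1]


train_error = error_rate(memorizer, train_set)
test_error = error_rate(memorizer, test_set)
print("training error:", train_error)  # each training point is its own neighbour
print("test error:", test_error)       # noise was memorized, so this is higher
```

The memorizer reproduces every training label, including the flipped ones, so its training error says nothing about how it behaves on new cases.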
HOW DOES ML HELP TO SOLVE A TASK?