Unit 1
• Seeds = Algorithms
• Nutrients = Data
• Gardener = You
• Plants = Programs
Sample Applications
• Web search
• Computational biology
• Finance
• E-commerce
• Space exploration
• Robotics
• Information extraction
• Social networks
• Debugging
Types of Learning
• Supervised (inductive) learning
– Training data includes desired outputs
• Unsupervised learning
– Training data does not include desired outputs
• Semi-supervised learning
– Training data includes a few desired outputs
• Reinforcement learning
Inductive Learning
• Given examples of a function (X, F(X))
• Predict function F(X) for new examples X
– Discrete F(X): Classification
– Continuous F(X): Regression
– F(X) = Probability(X): Probability estimation
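The classification/regression distinction above can be sketched in plain Python. This is a minimal illustration with made-up data, not any particular library's API: a nearest-neighbor vote for a discrete F(X), and a least-squares line for a continuous F(X).

```python
# Minimal sketch of inductive learning: given examples (x, F(x)),
# predict F(x) for new x. All data here is hypothetical.

def knn_classify(examples, x, k=3):
    """Discrete F(x): classify new x by majority vote of the k nearest examples."""
    nearest = sorted(examples, key=lambda e: abs(e[0] - x))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)

def fit_line(examples):
    """Continuous F(x): least-squares line y = a*x + b through the examples."""
    n = len(examples)
    sx = sum(x for x, _ in examples)
    sy = sum(y for _, y in examples)
    sxx = sum(x * x for x, _ in examples)
    sxy = sum(x * y for x, y in examples)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

# Classification: small x values are "no", large ones "yes"
cls_data = [(1, "no"), (2, "no"), (3, "no"), (7, "yes"), (8, "yes"), (9, "yes")]
print(knn_classify(cls_data, 2.5))   # "no"

# Regression: noiseless y = 2x
reg_data = [(1, 2), (2, 4), (3, 6)]
a, b = fit_line(reg_data)
print(round(a, 6), round(b, 6))      # 2.0 0.0
```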
Learning system model
[Figure: overall learning system model]
Training and testing
[Figure: from the universal set (unobserved), data is drawn for acquisition (training) and practical usage (testing); the paradigms shown are supervised, unsupervised, and semi-supervised learning]
Machine learning structure
• Supervised learning
• Unsupervised learning
Learning techniques
• Supervised learning categories and techniques
– Linear classifier (numerical functions)
– Parametric (Probabilistic functions)
• Naïve Bayes, Gaussian discriminant analysis (GDA), Hidden
Markov models (HMM), Probabilistic graphical models
– Non-parametric (Instance-based functions)
• K-nearest neighbors, Kernel regression, Kernel density
estimation, Local regression
– Non-metric (Symbolic functions)
• Classification and regression tree (CART), decision tree
– Aggregation
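As one concrete instance of the non-metric (symbolic) category above, here is a minimal sketch of a one-level decision tree (a "decision stump"), the simplest relative of CART. The data and threshold search are illustrative only.

```python
# Sketch of a symbolic learner: a one-level decision tree ("decision stump")
# that picks the single threshold minimizing training errors.
# Data is hypothetical: (feature_value, label) pairs with labels 0/1.

def train_stump(data):
    """Return (threshold, left_label, right_label) with fewest training errors."""
    xs = sorted({x for x, _ in data})
    # Candidate thresholds: midpoints between consecutive feature values
    cands = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    best = None
    for t in cands:
        for left, right in [(0, 1), (1, 0)]:
            errs = sum((left if x <= t else right) != y for x, y in data)
            if best is None or errs < best[0]:
                best = (errs, t, left, right)
    _, t, left, right = best
    return t, left, right

def predict(stump, x):
    t, left, right = stump
    return left if x <= t else right

data = [(1, 0), (2, 0), (3, 0), (6, 1), (7, 1), (8, 1)]
stump = train_stump(data)
print(predict(stump, 2))  # 0
print(predict(stump, 7))  # 1
```

Stumps like this are also the usual base learners inside aggregation methods such as boosting.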
Learning techniques
• Linear classifier
• Techniques:
– Perceptron
– Logistic regression
– Support vector machine (SVM)
– Adaline
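The perceptron, the first technique in the list above, can be sketched in a few lines. This is a toy implementation on a hypothetical linearly separable 2-D data set, not a production classifier.

```python
# Minimal perceptron learning algorithm (PLA) sketch on a linearly
# separable toy problem (labels +1 / -1). Data is hypothetical.

def pla(points, labels, epochs=100):
    """Learn weights w (with bias w[0]) so that sign(w . [1, x1, x2]) matches labels."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        mistakes = 0
        for (x1, x2), y in zip(points, labels):
            s = w[0] + w[1] * x1 + w[2] * x2
            if (1 if s > 0 else -1) != y:
                # PLA update: move the boundary toward the misclassified point
                w[0] += y
                w[1] += y * x1
                w[2] += y * x2
                mistakes += 1
        if mistakes == 0:          # converged: all points classified correctly
            break
    return w

pts = [(2, 3), (3, 4), (-2, -1), (-3, -2)]
ys = [1, 1, -1, -1]
w = pla(pts, ys)
preds = [1 if w[0] + w[1] * a + w[2] * b > 0 else -1 for a, b in pts]
print(preds)  # [1, 1, -1, -1]
```

On separable data PLA is guaranteed to converge; on non-separable data it would loop, which is why variants such as the pocket algorithm exist.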
Learning techniques
• Using the perceptron learning algorithm (PLA): training error rate 0.10, testing error rate 0.156
Learning techniques
• Using logistic regression: training error rate 0.11, testing error rate 0.145
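Logistic regression itself can be sketched as gradient descent on the log-loss. This toy version uses a hypothetical 1-D data set and an arbitrary learning rate; it only illustrates the idea, not the experiment whose error rates are quoted above.

```python
import math

# Logistic regression sketch: gradient descent on the log-loss for a 1-D
# toy problem. Data, learning rate, and step count are hypothetical.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(xs, ys, lr=0.5, steps=2000):
    """Learn (w, b) so that sigmoid(w*x + b) estimates P(y=1 | x)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + b)
            gw += (p - y) * x      # gradient of the log-loss w.r.t. w
            gb += (p - y)          # gradient w.r.t. b
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

xs = [0.0, 1.0, 2.0, 5.0, 6.0, 7.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = train_logreg(xs, ys)
print(sigmoid(w * 1 + b) < 0.5, sigmoid(w * 6 + b) > 0.5)  # True True
```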
Learning techniques
• Non-linear case
Face Recognition
[Figure: training examples of a person and test images]
Supervised Learning: Uses
Example: decision-tree tools that create rules
• Prediction of future cases: Use the rule to
predict the output for future inputs
• Knowledge extraction: The rule is easy to
understand
• Compression: The rule is simpler than the data it
explains
• Outlier detection: Exceptions that are not covered by the rule, e.g., fraud
Unsupervised Learning
• Learning “what normally happens”
• No output
• Clustering: Grouping similar instances
• Other applications: Summarization, Association
Analysis
• Example applications
– Customer segmentation in CRM
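Clustering, the core unsupervised technique named above, can be sketched with k-means. This toy version clusters 1-D points; a real application such as customer segmentation would cluster multi-dimensional feature vectors the same way. The data and k are hypothetical.

```python
import random

# k-means clustering sketch (unsupervised): group 1-D points into k clusters
# by alternating assignment and centroid-update steps.

def kmeans(points, k, iters=50, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                          # assignment step
            i = min(range(k), key=lambda c: abs(p - centers[c]))
            clusters[i].append(p)
        new = [sum(c) / len(c) if c else centers[i]
               for i, c in enumerate(clusters)]   # update step: cluster means
        if new == centers:                        # converged
            break
        centers = new
    return sorted(centers)

data = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
print(kmeans(data, 2))  # two centers, near 1.0 and 9.0
```

Note that k-means needs no labels: it discovers the grouping from the data alone, which is exactly "learning what normally happens."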
Reinforcement Learning
• Topics:
– Policies: what actions should an agent take in a particular situation
– Utility estimation: how good is a state (used by the policy)
• No supervised output, but a delayed reward
• Credit assignment problem (what was responsible for the outcome)
• Applications:
– Game playing
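The delayed-reward and credit-assignment ideas above can be made concrete with tabular Q-learning on a tiny toy environment. Everything here is hypothetical (the corridor, the rates, the episode count); it is a sketch of the technique, not any specific system.

```python
import random

# Tabular Q-learning sketch: a corridor of 5 states; reaching the right end
# (state 4) gives reward +1, every other step gives 0. The reward is delayed,
# so credit must propagate back through the earlier moves.

N, GOAL = 5, 4
ACTIONS = (-1, +1)                      # move left / move right

def step(s, a):
    s2 = min(max(s + a, 0), N - 1)
    r = 1.0 if s2 == GOAL else 0.0
    return s2, r, s2 == GOAL

def q_learn(episodes=500, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: Q[(s, act)])
            s2, r, done = step(s, a)
            # Q-learning update: bootstrap from the best action in s2
            best = max(Q[(s2, b)] for b in ACTIONS)
            Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])
            s = s2
    return Q

Q = q_learn()
policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N - 1)]
print(policy)  # learned policy: always move right -> [1, 1, 1, 1]
```

The learned Q-table implements utility estimation, and taking the arg-max action in each state is the policy.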
Learning
An example application
• An emergency room in a hospital
measures 17 variables (e.g., blood
pressure, age, etc) of newly admitted
patients.
• A decision is needed: whether to put a
new patient in an intensive-care unit.
• Due to the high cost of ICU, those patients who may survive less than a month are given higher priority.
Another application
• A credit card company receives thousands
of applications for new cards. Each
application contains information about an
applicant,
– age
– marital status
– annual salary
– outstanding debts
– credit rating
– etc.
• Problem: decide whether an application should be approved
Machine learning and our focus
• Like human learning from past experiences.
• A computer does not have “experiences”.
• A computer system learns from data, which
represent some “past experiences” of an
application domain.
• Our focus: learn a target function that can be used to predict the values of a discrete class attribute, e.g., approved or not approved, and high-risk or low-risk.
The data and the goal
• Data: A set of data records (also called
examples, instances or cases) described
by
– k attributes: A1, A2, … Ak.
– a class: Each example is labelled with a pre-
defined class.
• Goal: To learn a classification model from the data that can be used to predict the classes of new cases.
An example: data (loan application)
[Table: loan application records; class attribute: Approved or not]
An example: the learning task
• Learn a classification model from the data
• Use the model to classify future loan
applications into
– Yes (approved) and
– No (not approved)
• What is the class of the following case/instance?
Supervised learning process:
two steps
Learning (training): Learn a model using the training data
Testing: Test the model using unseen test data to assess the model accuracy
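The two-step process above can be sketched end to end. The "model" here is deliberately trivial (a threshold halfway between the two class means) and the data is hypothetical; the point is only the separation between learning on training data and measuring accuracy on unseen test data.

```python
# Sketch of the two-step supervised learning process:
# (1) learn a model from training data; (2) test on unseen data for accuracy.

def learn(train):
    """Training step: threshold halfway between the two class means."""
    c0 = [x for x, y in train if y == 0]
    c1 = [x for x, y in train if y == 1]
    return (sum(c0) / len(c0) + sum(c1) / len(c1)) / 2

def accuracy(model, data):
    """Testing step: fraction of unseen examples classified correctly."""
    return sum((x > model) == (y == 1) for x, y in data) / len(data)

train = [(1, 0), (2, 0), (3, 0), (7, 1), (8, 1), (9, 1)]
test = [(1.5, 0), (2.5, 0), (7.5, 1), (8.5, 1)]
model = learn(train)
print(model)                  # 5.0
print(accuracy(model, test))  # 1.0
```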
What do we mean by learning?
• Given
– a data set D,
– a task T, and
– a performance measure M,
a computer system is said to learn from D
to perform the task T if after learning the
system’s performance on T improves as
measured by M.
• In other words, the learned model helps the system perform better on T.
An example
• Data: Loan application data
• Task: Predict whether a loan should be
approved or not.
• Performance measure: accuracy.
• Efficiency
– time to construct the model
– time to use the model
• Robustness: handling noise and missing values
• Scalability: efficiency in disk-resident databases
• Interpretability:
– understandable and insight provided by the model
• Compactness of the model: size of the tree, or the number of rules
Evaluation methods
• Holdout set: The available data set D is divided into two disjoint
subsets,
– the training set Dtrain (for learning a model)
– the test set Dtest (for testing the model)
• Important: training set should not be used in testing and the test set
should not be used in learning.
– The unseen test set provides an unbiased estimate of accuracy.
• The test set is also called the holdout set. (the examples in the
original data set D are all labeled with classes.)
• This method is mainly used when the data set D is large.
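The holdout method above amounts to one shuffle and one disjoint split. A minimal sketch (the 70/30 ratio, seed, and data are all illustrative):

```python
import random

# Holdout evaluation sketch: shuffle the labeled data once, then carve off a
# disjoint test set that plays no part in training.

def holdout_split(data, test_fraction=0.3, seed=42):
    items = list(data)
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_fraction)
    return items[n_test:], items[:n_test]   # (train, test), disjoint

data = [(i, i % 2) for i in range(10)]      # 10 labeled records
train, test = holdout_split(data)
print(len(train), len(test))                # 7 3
assert not set(train) & set(test)           # train and test never overlap
```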
Evaluation methods (cont…)
• n-fold cross-validation: The available data is partitioned
into n equal-size disjoint subsets.
• Use each subset as the test set and combine the rest n-1
subsets as the training set to learn a classifier.
• The procedure is run n times, which gives n accuracies.
• The final estimated accuracy of learning is the average of
the n accuracies.
• 10-fold and 5-fold cross-validations are commonly used.
• This method is used when the available data is not large.
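The n-fold procedure above can be sketched directly. The classifier plugged in here is a stand-in (predict the majority class of the training folds); the partition/train/test/average loop is the point.

```python
# n-fold cross-validation sketch: partition the data into n disjoint folds,
# train on n-1 folds, test on the held-out fold, and average the n accuracies.

def n_fold_cv(data, n, train_fn, acc_fn):
    folds = [data[i::n] for i in range(n)]          # n disjoint subsets
    accs = []
    for i in range(n):
        test = folds[i]
        train = [x for j, f in enumerate(folds) if j != i for x in f]
        model = train_fn(train)
        accs.append(acc_fn(model, test))
    return sum(accs) / n                            # average of the n accuracies

def majority_class(train):
    """Stand-in classifier: always predict the most common training label."""
    labels = [y for _, y in train]
    return max(set(labels), key=labels.count)

def acc(model, test):
    return sum(y == model for _, y in test) / len(test)

data = [(i, 1) for i in range(8)] + [(i, 0) for i in range(2)]
print(round(n_fold_cv(data, 5, majority_class, acc), 3))  # 0.8
```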
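Beyond plain accuracy, precision and recall are the standard measures for a binary classifier: precision = TP / (TP + FP), recall = TP / (TP + FN). A minimal sketch with hypothetical predictions:

```python
# Precision and recall sketch for a binary classifier.
# TP/FP/FN are counted over hypothetical label/prediction lists.

def precision_recall(y_true, y_pred, positive=1):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted positives, how many are right
    recall = tp / (tp + fn) if tp + fn else 0.0     # of true positives, how many are found
    return precision, recall

y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
p, r = precision_recall(y_true, y_pred)
print(round(p, 3), round(r, 3))  # 0.667 0.667
```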
Resources: Datasets
• Statlib: http://lib.stat.cmu.edu/
• Delve: http://www.cs.utoronto.ca/~delve/
Resources: Journals
• Journal of Machine Learning Research
www.jmlr.org
• Machine Learning
• IEEE Transactions on Neural Networks
• IEEE Transactions on Pattern Analysis
and Machine Intelligence
• Annals of Statistics
• Journal of the American Statistical Association
Resources: Conferences
• International Conference on Machine Learning (ICML)
• European Conference on Machine Learning (ECML)
• Neural Information Processing Systems (NIPS)
• Conference on Computational Learning Theory (COLT)
• International Joint Conference on Artificial Intelligence (IJCAI)
• ACM SIGKDD Conference on Knowledge Discovery and Data Mining
(KDD)
• IEEE Int. Conf. on Data Mining (ICDM)
References and acknowledgement
• S.N. Srihari and V. Govindaraju, Pattern Recognition, Chapman & Hall, London, 1034-1041, 1993.
• Sergios Theodoridis and Konstantinos Koutroumbas, Pattern Recognition, Elsevier.
• R.O. Duda, P.E. Hart, and D.G. Stork, Pattern Classification, New York: John Wiley, 2001.
• W.L. Chao and J.J. Ding, "Integrated Machine Learning Algorithms for Human Age Estimation," NTU, 2011.
• Avrim Blum, Semi-supervised Learning.
An Example
• Suppose that a fish-packing plant wants to automate the process of sorting incoming fish on a conveyor belt according to species.
• There are two species: sea bass and salmon.
An Example
• How do we distinguish one species from the other? (length, width, weight, number and shape of fins, tail shape, etc.)
An Example
• (1, 2) , (2, 6), (3, 12), (4, 20) ……. (10, 110)
Y = F(x)
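The sample pairs above fit the quadratic F(x) = x² + x, and a small sketch can recover that function from the data itself: fit a quadratic through three of the points (via Lagrange interpolation) and check it against the rest (the elided pairs are assumed to follow the same pattern).

```python
# Recover F(x) from the sample pairs (1,2), (2,6), (3,12), (4,20), ..., (10,110):
# fit y = a*x^2 + b*x + c through three points, then verify on the others.

def fit_quadratic(p1, p2, p3):
    """Lagrange interpolation through three points, coefficients collected as a, b, c."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    d1 = (x1 - x2) * (x1 - x3)
    d2 = (x2 - x1) * (x2 - x3)
    d3 = (x3 - x1) * (x3 - x2)
    a = y1 / d1 + y2 / d2 + y3 / d3
    b = -(y1 * (x2 + x3) / d1 + y2 * (x1 + x3) / d2 + y3 * (x1 + x2) / d3)
    c = y1 * x2 * x3 / d1 + y2 * x1 * x3 / d2 + y3 * x1 * x2 / d3
    return a, b, c

pts = [(1, 2), (2, 6), (3, 12), (4, 20), (10, 110)]
a, b, c = fit_quadratic(*pts[:3])
print(a, b, c)  # 1.0 1.0 0.0  ->  F(x) = x^2 + x = x*(x+1)
print(all(abs(a * x * x + b * x + c - y) < 1e-9 for x, y in pts))  # True
```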