ML Lecture 1 Handouts
• Goal (Lectures): To present basic theoretical concepts and key algorithms that
form the core of machine learning
• Lecture 3-4: Decision Trees & CBC Intro (M. Pantic & S. Petridis)
• Lecture 13-14: Instance Based Learning & Genetic Algorithms (M. Pantic)
NOTE
CBC accounts for 33.3% of the final grade for the Machine Learning Exam.
final_grade = 2/3 · exam_grade + 1/3 · CBC_grade
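For illustration only (hypothetical marks, not from the handout): an exam grade of 72 and a CBC grade of 60 would give final_grade = 2/3 · 72 + 1/3 · 60 = 48 + 20 = 68.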
• Lecture 1-2: Concept Learning
• Find-S algorithm
• Candidate-Elimination algorithm
• Learning ↔ Intelligence
(Def: Intelligence is the ability to learn and use concepts to solve problems.)
[Figure: design steps: Determine Target Function → Choose Target Function Representation → Choose Learning Algorithm]
• The ideal Target Function V is usually not known; machine learning algorithms learn an approximation of V, say V’.
• The representation of the function V’ to be learned should
– be as close an approximation of V as possible
– require a (reasonably) small amount of training data to be learned
• V’(d) = w0 + w1x1 + … + wnxn, where ‹x1 … xn› ≡ d ∈ D is an input state.
This reduces the problem to learning the optimal weights w (a small sketch follows below).
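A minimal Python sketch of the linear representation above: it evaluates V’(d) = w0 + w1x1 + … + wnxn for an input state d. The function name, the weights, and the example state are illustrative assumptions, not values from the handout.

def v_prime(weights, d):
    """Evaluate V'(d) = w0 + w1*x1 + ... + wn*xn for an input state d = <x1, ..., xn>."""
    w0, ws = weights[0], weights[1:]
    return w0 + sum(w * x for w, x in zip(ws, d))

# Hypothetical numbers for illustration: n = 3 attributes.
weights = [0.5, 1.0, -2.0, 0.25]   # w0, w1, w2, w3
d = (1, 0, 4)                      # an input state <x1, x2, x3>
print(v_prime(weights, d))         # 0.5 + 1*1 - 2*0 + 0.25*4 = 2.5

Learning then amounts to choosing the weights w so that V’ agrees with V as closely as possible on the training data.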
• Concept learning
– supervised, eager learning
– target problem: whether something belongs to the target concept or not
– target function: V: D → {true, false}
• Aim: Find a hypothesis h ∈ H such that (∀d ∈ D) |h(d) – c(d)| < ε ≈ 0, where H is the
set of all possible hypotheses h ≡ ‹a1, a2, a3, a4, a5, a6›, where each ak, k = [1..6], may
be ‘?’ (≡ any value is acceptable), ‘0’ (≡ no value is acceptable), or a specific value.
h ≡ ‹?, ?, ?, ?, ?, ?› h ≡ ‹0, 0, 0, 0, 0, 0› h ≡ ‹?, ?, yes, ?, ?, ?›
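A minimal Python sketch of this hypothesis representation, assuming each hypothesis and each training example is a 6-tuple of attribute values as above; the concrete attribute values in the example call are hypothetical (chosen to resemble the attributes that appear later in the handout), not the handout's data.

def matches(h, d):
    """Return True iff hypothesis h classifies example d as positive, i.e. h(d) = 1."""
    return all(a != '0' and (a == '?' or a == v) for a, v in zip(h, d))

h = ('?', '?', 'yes', '?', '?', '?')
d = ('blond', 'nice', 'yes', 'arrogant', 'toothy', 'no')   # hypothetical example
print(matches(h, d))              # True
print(matches(('0',) * 6, d))     # False: the all-'0' hypothesis accepts nothing
print(matches(('?',) * 6, d))     # True:  the all-'?' hypothesis accepts everything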
Concept Learning as Search
• Aim: Find a hypothesis h ∈ H such that (∀d ∈ D) |h(d) – c(d)| < ε ≈ 0, where H is the
set of all possible hypotheses h ≡ ‹a1, a2, a3, a4, a5, a6›, where each ak, k = [1..6], may
be ‘?’ (≡ any value is acceptable), ‘0’ (≡ no value is acceptable), or a specific value.
concept learning ≡ searching through H
• General-to-Specific Ordering:
– h1 precedes (is more general than) h2 ⇔ (∀d ∈ D) (h1(d) = 1) ← (h2(d) = 1)
(e.g., h1 ≡ ‹?, ?, yes, ?, ?, ?› and h2 ≡ ‹?, ?, yes, ?, ?, yes› ⇒ h1 >g h2 )
– h1 and h2 are of equal generality ⇔ (∃d ∈ D) { [(h1(d) = 1) → (h2(d) = 1)] ∧
[(h2(d) = 1) → (h1(d) = 1)] ∧ h1 and h2 have an equal number of ‘?’ }
(e.g., h1 ≡ ‹?, ?, yes, ?, ?, ?› and h2 ≡ ‹?, ?, ?, ?, ?, yes› ⇒ h1 =g h2 )
– h2 succeeds (is more specific than) h1 ⇔ (∀d ∈ D) (h1(d) = 1) ← (h2(d) = 1)
(e.g., h1 ≡ ‹?, ?, yes, ?, ?, ?› and h2 ≡ ‹?, ?, yes, ?, ?, yes› ⇒ h2 <g h1 )
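In the ‹a1, …, a6› representation the ordering can be tested attribute-by-attribute. Below is a minimal Python sketch (not from the handout), which ignores the degenerate case where h2 itself contains a ‘0’.

def more_general_or_equal(h1, h2):
    """True iff h1 >=g h2, i.e. every d with h2(d) = 1 also has h1(d) = 1."""
    def covers(a1, a2):
        # '?' covers anything; a '0' in h2 is covered vacuously; otherwise values must agree
        return a1 == '?' or a2 == '0' or a1 == a2
    return all(covers(a1, a2) for a1, a2 in zip(h1, h2))

h1 = ('?', '?', 'yes', '?', '?', '?')
h2 = ('?', '?', 'yes', '?', '?', 'yes')
print(more_general_or_equal(h1, h2))   # True:  h1 >g h2
print(more_general_or_equal(h2, h1))   # False: h2 is strictly more specific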
• Find-S is guaranteed to output the most specific hypothesis h that best fits the positive
training examples (a sketch of the algorithm is given below).
• The hypothesis h returned by Find-S will also fit the negative examples as long as the
training examples are correct.
• However,
– Find-S is sensitive to noise, which is (almost always) present in training examples.
– there is no guarantee that the h returned by Find-S is the only h that fits the data.
– several maximally specific hypotheses may exist that fit the data, but Find-S
will output only one.
– why should we prefer the most specific hypotheses over, e.g., the most general
ones?
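A minimal Python sketch of Find-S under this representation; the training data are hypothetical and only illustrate the mechanics, they are not the handout's data set.

def find_s(examples, n_attributes=6):
    """Return the maximally specific hypothesis consistent with the positive examples."""
    h = ['0'] * n_attributes            # start from the most specific hypothesis
    for d, positive in examples:
        if not positive:                # Find-S ignores negative examples
            continue
        for i, value in enumerate(d):
            if h[i] == '0':             # first positive example: copy its attribute values
                h[i] = value
            elif h[i] != value:         # disagreement with a later positive: generalise to '?'
                h[i] = '?'
    return h

examples = [                            # hypothetical training data
    (('blond', 'tall', 'yes', 'no', 'no', 'no'), True),
    (('blond', 'short', 'yes', 'no', 'yes', 'no'), True),
    (('dark', 'tall', 'no', 'yes', 'yes', 'yes'), False),
]
print(find_s(examples))                 # ['blond', '?', 'yes', 'no', '?', 'no']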
• Main idea: Output a set of hypotheses VS ⊆ H that fit (are consistent with) the data D (a sketch of the boundary updates is given after the trace below).
d1 is positive → refine S
d2 is negative → refine G
d3 is positive → refine S
two g ∈ G2 are inconsistent with d3, i.e., ‹?, ?, ?, arrogant, ?, ?› and ‹?, ?, ?, ?, toothy, ?›, and are therefore removed →
G3 ← {‹blond, ?, ?, ?, ?, ?›, ‹?, ?, yes, ?, ?, ?›, ‹?, ?, ?, ?, ?, no›}
d4 is negative → refine G
d5 is negative → refine G
Output of C-E:
version space of hypotheses VS ⊆ H bound with
specific boundary S ≡ {‹blond, ?, yes, ?, ?, no›} and
general boundary G ≡ {‹?, ?, yes, ?, ?, ?› }
Output of Find-S:
most specific hypothesis h ≡ ‹blond, ?, yes, ?, ?, no›
Output of C-E:
version space of hypotheses VS ⊆ H bound with
specific boundary S ≡ {‹blond, ?, yes, ?, ?, no›} and
general boundary G ≡ {‹?, ?, yes, ?, ?, ?› }
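The Python sketch below shows one way the S and G boundary updates of Candidate-Elimination can be implemented for this conjunctive representation. It is a simplification under stated assumptions (a single element in S, attribute domains passed in explicitly, no pruning of redundant members of G), not the full algorithm, and the helper names are not from the handout.

def matches(h, d):
    """h(d) = 1 iff every attribute of d is accepted by h ('?' accepts anything, '0' nothing)."""
    return all(a != '0' and (a == '?' or a == v) for a, v in zip(h, d))

def more_general_or_equal(h1, h2):
    """True iff h1 >=g h2 (checked attribute-by-attribute)."""
    return all(a1 == '?' or a2 == '0' or a1 == a2 for a1, a2 in zip(h1, h2))

def minimal_generalisation(s, d):
    """Smallest generalisation of s that covers the positive example d."""
    return tuple(v if a in ('0', v) else '?' for a, v in zip(s, d))

def minimal_specialisations(g, d, attribute_values):
    """Specialisations of g (one '?' replaced by a value) that exclude the negative example d."""
    out = []
    for i, a in enumerate(g):
        if a == '?':
            for v in attribute_values[i]:
                if v != d[i]:
                    out.append(g[:i] + (v,) + g[i + 1:])
    return out

def candidate_elimination(examples, attribute_values):
    n = len(attribute_values)
    s = ('0',) * n                       # specific boundary (kept as a single hypothesis here)
    G = [('?',) * n]                     # general boundary
    for d, positive in examples:
        if positive:                     # positive example: refine S, prune G
            G = [g for g in G if matches(g, d)]
            s = minimal_generalisation(s, d)
        else:                            # negative example: refine G
            new_G = []
            for g in G:
                if not matches(g, d):
                    new_G.append(g)
                else:                    # keep only specialisations that still cover s
                    new_G.extend(h for h in minimal_specialisations(g, d, attribute_values)
                                 if more_general_or_equal(h, s))
            G = new_G
    return [s], G

The version space VS is then every hypothesis in H that lies between S and G in the general-to-specific ordering.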
• Find-S algorithm
• Candidate-Elimination algorithm
• Lecture 3-4: Decision Trees & CBC Intro