Machine Learning
What is Learning?
• Herbert Simon: “Learning is any process
by which a system improves performance
from experience.”
Learning
• Learning is essential for unknown environments,
– i.e., when the designer lacks omniscience
Machine Learning
Classification Examples
• In classification, we predict labels y (classes) for inputs x
• Examples:
– OCR (input: images, classes: characters)
– Medical diagnosis (input: symptoms, classes: diseases)
– Automatic essay grader (input: document, classes: grades)
– Fraud detection (input: account activity, classes: fraud / no fraud)
– Customer service email routing
– Recommended articles in a newspaper, recommended books
– DNA and protein sequence identification
– Categorization and identification of astronomical images
– Financial investments
– … many more
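Every example above is the same mapping from inputs x to labels y. A minimal sketch (all data invented for illustration; 1-nearest-neighbour stands in for any classifier):

```python
def predict_1nn(train, x):
    """Predict a label y for input x from its single nearest
    training example (1-nearest-neighbour classification)."""
    nearest = min(train,
                  key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], x)))
    return nearest[1]

# Toy training set: pairs (x, y), x a feature vector, y a class label
# (here mimicking fraud detection on two account-activity features).
train = [((0.1, 0.2), "no fraud"), ((0.9, 0.8), "fraud"),
         ((0.2, 0.1), "no fraud"), ((0.8, 0.9), "fraud")]

print(predict_1nn(train, (0.85, 0.75)))  # -> "fraud"
```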
Inductive learning
• Simplest form: learn a function from examples
• f is the target function
• An example is a pair (x, f(x))
• Problem: find a hypothesis h such that h ≈ f, given a training set of examples
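A minimal sketch of this setup (the target f and the hypothesis space of lines are both invented for illustration): the learner sees only the (x, f(x)) pairs and fits a hypothesis h by least squares.

```python
def f(x):               # target function: unknown to the learner
    return 2 * x + 1

examples = [(x, f(x)) for x in range(6)]   # training set of pairs (x, f(x))

# Hypothesis space: lines h(x) = a*x + b; choose a, b by least squares.
n = len(examples)
sx = sum(x for x, _ in examples)
sy = sum(y for _, y in examples)
sxx = sum(x * x for x, _ in examples)
sxy = sum(x * y for x, y in examples)
a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
b = (sy - a * sx) / n
print(a, b)             # -> 2.0 1.0: here h recovers f exactly
```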
Generalization
• Hypotheses must generalize to correctly
classify instances not in the training data.
• Simply memorizing training examples is a
consistent hypothesis that does not
generalize.
• Occam’s razor:
– Finding a simple hypothesis helps ensure
generalization.
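A minimal sketch of this point (assumes scikit-learn is available; data is synthetic with 20% label noise): an unrestricted tree memorizes the training set, a consistent hypothesis, while a simpler depth-limited tree tends to score better on unseen data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=10, flip_y=0.2,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in (None, 3):   # None: grow until the training data is memorized
    h = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(depth, h.score(X_tr, y_tr), h.score(X_te, y_te))
```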
Supervised Learning
• Learning a discrete function: Classification
– Boolean classification:
• Each example is classified as true (positive) or false (negative).
• Learning a continuous function: Regression
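The same inputs can feed either setting; only the target type differs. A tiny sketch (invented toy numbers; scikit-learn assumed):

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = [[1.4], [1.5], [1.8], [1.9]]                 # inputs (height in m)
y_class = ["child", "child", "adult", "adult"]   # discrete labels
y_reg = [35.0, 42.0, 75.0, 82.0]                 # continuous targets (kg)

print(DecisionTreeClassifier().fit(X, y_class).predict([[1.7]]))  # a class
print(DecisionTreeRegressor().fit(X, y_reg).predict([[1.7]]))     # a number
```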
Data Preparation
• Data cleaning
– Preprocess data to reduce noise and handle missing values
• Relevance analysis (feature selection)
– Remove irrelevant or redundant attributes
• Data transformation
– Generalize data to higher-level concepts (discretization)
– Normalize attribute values
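A sketch of all three steps on a toy table (column names and values invented; assumes pandas):

```python
import pandas as pd

df = pd.DataFrame({"age": [23, None, 41, 35],
                   "income": [40_000, 52_000, None, 70_000],
                   "customer_id": [101, 102, 103, 104]})

# Data cleaning: handle missing values, e.g. with the column median.
df["age"] = df["age"].fillna(df["age"].median())
df["income"] = df["income"].fillna(df["income"].median())

# Relevance analysis: drop an attribute irrelevant to the class.
df = df.drop(columns=["customer_id"])

# Data transformation: discretize age into higher-level concepts,
# then min-max normalize income into [0, 1].
df["age_group"] = pd.cut(df["age"], bins=[0, 30, 60, 120],
                         labels=["young", "middle", "senior"])
df["income"] = (df["income"] - df["income"].min()) / (
    df["income"].max() - df["income"].min())
print(df)
```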
Classification Techniques
• Decision Tree based Methods
• Rule-based Methods
• Naïve Bayes and Bayesian Belief
Networks
• Neural Networks
• Support Vector Machines
• and more...
Feature (Attribute)-based representations
• Examples are described by feature (attribute) values
– (Boolean, discrete, continuous)
• E.g., situations where I will/won't wait for a table:
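The table of example situations is a figure not reproduced here; one such situation, encoded with the attribute names of the classic restaurant domain (AIMA), might look like this (this particular row is invented):

```python
# One "wait for a table" situation as attribute values.
example = {
    "Alternate": True,       # Boolean: suitable alternative nearby?
    "Bar": False,            # Boolean
    "Fri/Sat": False,        # Boolean
    "Hungry": True,          # Boolean
    "Patrons": "Full",       # discrete: None / Some / Full
    "Price": "$",            # discrete: $ / $$ / $$$
    "Raining": False,        # Boolean
    "Reservation": True,     # Boolean
    "Type": "Thai",          # discrete
    "WaitEstimate": 30,      # continuous (minutes, often bucketed)
}
will_wait = True             # the label to be predicted
```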
Decision trees
• One possible representation for hypotheses
• E.g., here is the “true” tree for deciding whether to wait:
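The tree itself is a figure; as a hedged sketch, the top of such a tree behaves like nested attribute tests (branch structure simplified, deeper subtrees elided):

```python
def will_wait(ex):
    """Decision tree as nested if-tests (simplified fragment)."""
    if ex["Patrons"] == "None":
        return False
    if ex["Patrons"] == "Some":
        return True
    # Patrons == "Full": consult further attributes.
    if ex["WaitEstimate"] > 60:
        return False
    return ex["Hungry"]      # deeper subtrees elided
```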
Expressiveness
• Decision trees can express any function of the input attributes.
• E.g., for Boolean functions, truth table row → path to leaf:
• Trivially, there is a consistent decision tree for any training set, with one path to a leaf for each example (unless f is nondeterministic in x), but it probably won't generalize to new examples
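For instance, XOR of two Boolean attributes cannot be decided by a single attribute, yet a two-level tree expresses it exactly, one leaf per truth-table row:

```python
def xor_tree(a, b):
    if a:                # test attribute A
        return not b     # each leaf corresponds to one truth-table row
    else:
        return b

for a in (False, True):
    for b in (False, True):
        assert xor_tree(a, b) == (a != b)   # tree reproduces the table
```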
This follows an example of Quinlan’s ID3 (Playing Tennis)
Example
Tree Induction
• Greedy strategy.
– Split the records based on an attribute test that optimizes a certain criterion.
• Issues
– Determine how to split the records
• How to specify the attribute test condition?
• How to determine the best split?
– Determine when to stop splitting
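A minimal sketch of this greedy strategy in the spirit of Quinlan’s ID3, using information gain as the splitting criterion on the classic 14-example Playing Tennis data (stopping when a node is pure or no attributes remain):

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(examples, attr, target):
    base = entropy([ex[target] for ex in examples])
    rem = 0.0
    for value in {ex[attr] for ex in examples}:
        subset = [ex[target] for ex in examples if ex[attr] == value]
        rem += len(subset) / len(examples) * entropy(subset)
    return base - rem

def id3(examples, attrs, target):
    labels = [ex[target] for ex in examples]
    if len(set(labels)) == 1:            # pure node: stop splitting
        return labels[0]
    if not attrs:                        # no tests left: majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(examples, a, target))
    return {best: {v: id3([ex for ex in examples if ex[best] == v],
                          [a for a in attrs if a != best], target)
                   for v in {ex[best] for ex in examples}}}

rows = [  # (Outlook, Temperature, Humidity, Wind) -> Play
    ("Sunny", "Hot", "High", "Weak", "No"), ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"), ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"), ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"), ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"), ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"), ("Rain", "Mild", "High", "Strong", "No"),
]
cols = ["Outlook", "Temperature", "Humidity", "Wind", "Play"]
data = [dict(zip(cols, r)) for r in rows]
print(id3(data, cols[:-1], "Play"))      # root attribute: Outlook
```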
Choosing an attribute
• Idea: a good attribute splits the examples into subsets
that are (ideally) "all positive" or "all negative"
Non-homogeneous: high degree of impurity. Homogeneous: low degree of impurity.
Measures of Node Impurity
• Gini Index
• Misclassification error
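A sketch of both measures applied to a node’s class counts, using the standard definitions GINI(t) = 1 − Σ p_i² and Error(t) = 1 − max p_i:

```python
def gini(counts):
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def misclassification_error(counts):
    total = sum(counts)
    return 1.0 - max(counts) / total

print(gini([3, 3]), misclassification_error([3, 3]))  # impure: 0.5 0.5
print(gini([6, 0]), misclassification_error([6, 0]))  # pure:   0.0 0.0
```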
Information gain
For the training set, p = n = 6, I(6/12, 6/12) = 1 bit
Patrons has the highest IG of all attributes, so the decision-tree learning (DTL) algorithm chooses it as the root
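These numbers can be checked directly; the class counts per Patrons value below are assumed from the standard AIMA restaurant dataset (None: 0+/2−, Some: 4+/0−, Full: 2+/4−):

```python
import math

def I(p, n):
    """Entropy of a Boolean classification with p positives, n negatives."""
    total = p + n
    return -sum(x / total * math.log2(x / total) for x in (p, n) if x)

print(I(6, 6))                          # 1.0 bit for the whole training set
rem = sum((p + n) / 12 * I(p, n) for p, n in [(0, 2), (4, 0), (2, 4)])
print(I(6, 6) - rem)                    # IG(Patrons) ~ 0.541 bits
```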
Example contd.
• Decision tree learned from the 12 examples:
• Disadvantages
– Computationally expensive to train
– Some decision trees can become overly complex and fail to generalize the data well
– Less expressive: some concepts are hard to learn with decision trees of limited size
Rule-Based Classifier
• Classify records by using a collection of
“if…then…” rules
• Rule: (Condition) → y
– where
• Condition is a conjunction of attribute tests
• y is the class label
– LHS: rule antecedent or condition
– RHS: rule consequent
– Examples of classification rules:
• (Blood Type = Warm) ∧ (Lay Eggs = Yes) → Birds
• (Taxable Income < 50K) ∧ (Refund = Yes) → Evade = No
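A minimal sketch of applying such a rule set (the rules are encoded by hand from the two examples above; first matching rule fires, one common strategy):

```python
rules = [  # each rule: (condition = conjunction of attribute tests, class)
    (lambda r: r["Blood Type"] == "Warm" and r["Lay Eggs"], "Birds"),
    (lambda r: r["Taxable Income"] < 50_000 and r["Refund"], "Evade=No"),
]

def classify(record, rules, default="unknown"):
    for condition, label in rules:
        if condition(record):   # first rule whose antecedent matches fires
            return label
    return default              # no rule covers the record

print(classify({"Blood Type": "Warm", "Lay Eggs": True,
                "Taxable Income": 80_000, "Refund": False}, rules))  # Birds
```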
Extracting rules from a decision tree: each attribute-value test along a path from the root to a leaf forms a conjunction, and the leaf holds the class prediction.
Extra Slides
Learning agents
Classification
Hypothesis spaces
How many distinct decision trees with n Boolean attributes?
= number of Boolean functions
= number of distinct truth tables with 2^n rows = 2^(2^n)
E.g., with 6 Boolean attributes there are 2^64 = 18,446,744,073,709,551,616 trees
Information gain
• A chosen attribute A divides the training set E into subsets E1, …, Ev according to their values for A, where A has v distinct values.
• remainder(A) = Σ_{i=1..v} (pi + ni)/(p + n) · I(pi/(pi + ni), ni/(pi + ni))
• Information gain (IG) = expected reduction in entropy: IG(A) = I(p/(p + n), n/(p + n)) − remainder(A)
• Choose the attribute with the largest IG
Performance measurement
• How do we know that h ≈ f?
1. Use theorems of computational/statistical learning theory
2. Try h on a new test set of examples
(use same distribution over example space as training set)
Learning curve = % correct on test set as a function of training set size
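A minimal learning-curve sketch (assumes scikit-learn; data is synthetic): train on increasingly large prefixes of the training set and score on a fixed test set drawn from the same distribution.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1200, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=200, random_state=0)

for m in (10, 50, 100, 500, 1000):       # training-set size
    h = DecisionTreeClassifier(random_state=0).fit(X_tr[:m], y_tr[:m])
    print(m, h.score(X_te, y_te))        # % correct typically rises with m
```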