Machine Learning Algorithms
Table of Contents
K-Nearest Neighbors
Linear Classifier
Loss Functions
Machine Learning Algorithms – an Overview
• KNNs
• Decision Trees
• Random Forest
• etc.
The problem with hard-coded classification
An edge-based method?
A possible Approach
Memorize all training data and labels; at prediction time, return the label of the most similar training image.
Example Dataset: CIFAR10
10 classes
50,000 training images
10,000 testing images
Alex Krizhevsky, “Learning Multiple Layers of Features from Tiny Images”, Technical Report, 2009.
L1 Distance
L1 (Manhattan) distance: d1(I1, I2) = Σ_p |I1^p - I2^p|  (sum of absolute pixel-wise differences)
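To make the formula concrete, here is a minimal NumPy sketch of the same computation (the arrays are random stand-ins for two CIFAR-10-sized images, not data from the slides):

```python
import numpy as np

# Two images of the same shape; random integer stand-ins for 32x32x3 CIFAR-10 images.
img1 = np.random.randint(0, 256, size=(32, 32, 3))
img2 = np.random.randint(0, 256, size=(32, 32, 3))

# L1 (Manhattan) distance: sum of absolute pixel-wise differences.
d1 = np.sum(np.abs(img1 - img2))
print(d1)
```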
Code implementation, using Numpy
Nearest Neighbor classifier

Q: With N examples, how fast are training and prediction?
A: Training is O(1), prediction is O(N). This is backwards: slow training would be acceptable, but we want prediction to be fast.
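The code on these slides did not survive extraction, so below is a minimal sketch of a nearest-neighbor classifier matching the behaviour discussed above (train is O(1) because it only memorizes the data; predict scans all N training examples per test image). Class and variable names are illustrative, not taken from the slides:

```python
import numpy as np

class NearestNeighbor:
    def train(self, X, y):
        """X: N x D array (one flattened image per row), y: N integer labels.
        Training just memorizes the data, hence O(1)."""
        self.X_train = X
        self.y_train = y

    def predict(self, X):
        """X: M x D array of test images. For each test image, find the training
        image with the smallest L1 distance and copy its label, hence O(N) per image."""
        num_test = X.shape[0]
        y_pred = np.empty(num_test, dtype=self.y_train.dtype)
        for i in range(num_test):
            distances = np.sum(np.abs(self.X_train - X[i]), axis=1)
            y_pred[i] = self.y_train[np.argmin(distances)]
        return y_pred
```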
K-Nearest Neighbors
Instead of copying the label of the single nearest neighbor, take a majority vote among the K closest training points.
What does it look like?
K-Nearest Neighbors: Distance Metric
L1 (Manhattan) distance: d1(I1, I2) = Σ_p |I1^p - I2^p|
L2 (Euclidean) distance: d2(I1, I2) = sqrt( Σ_p (I1^p - I2^p)² )
(The figures show the K = 1 decision boundaries under each metric.)
K-Nearest Neighbors: Demo
http://vision.stanford.edu/teaching/cs231n-demos/knn/
K-Nearest Neighbors: Hyperparameters
What is the best value of K? What is the best distance metric? These are hyperparameters: choices about the algorithm that we set rather than learn. They are very problem-dependent; in practice you must try them out and see what works best.
Searching for Hyperparameters
Your dataset can be split in several ways:
Idea #1: Choose hyperparameters that work best on the full training data. (Bad: K = 1 always works perfectly on the training data.)
Idea #2: Split data into train and test; choose hyperparameters that work best on the test data. (Bad: we get no idea how the method will perform on new, unseen data.)
Idea #3: Split data into train, val, and test; choose hyperparameters on val and evaluate on test.
Idea #4: K-fold cross-validation: split the data into folds, try each fold as validation and average the results. Useful for small datasets, but not used too frequently in deep learning.
Idea #5: Nested cross-validation: two loops, one for model selection and one for evaluation.
Example of 5-fold cross-validation for the value of K.
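As a concrete illustration of cross-validating the value of K, here is a small NumPy sketch; the function names and the use of the L1 distance are assumptions made for this example, not code from the slides:

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k):
    """Predict by majority vote among the k nearest training points (L1 distance)."""
    preds = np.empty(len(X_test), dtype=y_train.dtype)
    for i, x in enumerate(X_test):
        dists = np.sum(np.abs(X_train - x), axis=1)
        nearest = y_train[np.argsort(dists)[:k]]
        preds[i] = np.bincount(nearest).argmax()
    return preds

def cross_validate_k(X, y, k_choices, num_folds=5):
    """Return the mean validation accuracy for each candidate k."""
    X_folds = np.array_split(X, num_folds)
    y_folds = np.array_split(y, num_folds)
    accuracies = {}
    for k in k_choices:
        fold_accs = []
        for f in range(num_folds):
            X_val, y_val = X_folds[f], y_folds[f]
            X_tr = np.concatenate(X_folds[:f] + X_folds[f + 1:])
            y_tr = np.concatenate(y_folds[:f] + y_folds[f + 1:])
            fold_accs.append(np.mean(knn_predict(X_tr, y_tr, X_val, k) == y_val))
        accuracies[k] = np.mean(fold_accs)
    return accuracies
```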
k-Nearest Neighbors' Drawbacks
K-NN on raw pixels is rarely used in practice: prediction is slow, and pixel-wise distance metrics are not very informative.
(Original image is CC0 public domain.)
Curse of Dimensionality
Covering the space densely requires a number of training points that grows exponentially with the dimension:
Dimensions = 1 → Points = 4
Dimensions = 2 → Points = 4² = 16
Dimensions = 3 → Points = 4³ = 64
k-Nearest Neighbor: Summary
In image classification we start with a training set of images and labels, and must predict labels on the test set.
Choose hyperparameters using the validation set; only run on the test set once, at the very end!
Linear Classifiers
(Example captions: "Man in black shirt is playing guitar." "Construction worker in orange safety vest is working on road.")
Karpathy and Fei-Fei, "Deep Visual-Semantic Alignments for Generating Image Descriptions", CVPR 2015. Figures copyright IEEE, 2015. Reproduced for educational purposes.
Recall CIFAR10
Parametric Approach: Linear Classifier
f(x,W) = Wx
Image x: an array of 32x32x3 numbers (3072 numbers total), flattened into a 3072x1 vector.
W: the parameters, or weights (a 10x3072 matrix).
f(x,W): 10 numbers giving the class scores (a 10x1 vector).
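A minimal sketch of the shapes involved (the weights here are random placeholders, not a trained classifier):

```python
import numpy as np

x = np.random.rand(3072)        # one flattened CIFAR-10 image: 32*32*3 = 3072 numbers
W = np.random.randn(10, 3072)   # weights: one row of 3072 weights per class
b = np.random.randn(10)         # one bias per class

scores = W @ x + b              # f(x, W) = Wx + b: 10 class scores
print(scores.shape)             # (10,)
```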
Classification through a linear classifier
Example: a tiny 2x2 input image with pixel values 56, 231, 24, 2 is flattened into x = [56, 231, 24, 2]. Two rows of W (with their biases) give two class scores:
Cat score: 0.2·56 + (-0.5)·231 + 0.1·24 + 2.0·2 + 1.1 = -96.8
Dog score: 1.5·56 + 1.3·231 + 2.1·24 + 0.0·2 + 3.2 = 437.9
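A quick NumPy check of the arithmetic above, using only the two weight rows and biases shown in the example:

```python
import numpy as np

x = np.array([56, 231, 24, 2])                  # flattened 2x2 input image
W = np.array([[0.2, -0.5, 0.1, 2.0],            # "cat" weight row
              [1.5,  1.3, 2.1, 0.0]])           # "dog" weight row
b = np.array([1.1, 3.2])                        # per-class biases

print(W @ x + b)                                # [-96.8, 437.9]
```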
What is a linear Classifier?
f(x,W) = Wx
Algebraic Viewpoint: an input image with 4 pixels and 3 classes (cat/dog/ship). Each row of W holds the weights of one class; multiplying W by the flattened input image gives one score per class.
Interpreting a linear Classifier
Visual Viewpoint: each row of W can be reshaped back into an image and viewed as a template for its class.
Geometric Viewpoint: f(x,W) = Wx + b; each class score is a linear function over pixel space, so classes are separated by hyperplanes.
(Plot created using Wolfram Cloud. Cat image by Nikita is licensed under CC-BY 2.0.)
Hard cases for a linear classifier
Classes that are not linearly separable, e.g. a class that occupies opposite regions of the input space, or a class with several disjoint modes.
Linear Classifier: Three Viewpoints
f(x,W) = Wx + b
So far: defined a (linear) score function.
Example class scores for 3 images for some W (one column per image, one row per class):

          Image 1   Image 2   Image 3
          -3.45     -0.51      3.42
          -8.87      6.04      4.64
           0.09      5.31      2.65
           2.9      -4.22      5.1
           4.48     -4.19      2.64
           8.02      3.58      5.55
           3.78      4.49     -4.34
           1.06     -4.37     -1.5
          -0.36     -2.09     -4.79
          -0.72     -2.93      6.14

How can we tell whether this W is good or bad?
Linear Classifier: TODO List
TODO:
1. Define a loss function that quantifies our unhappiness with the scores across the training data.
2. Come up with a way of efficiently finding the parameters that minimize the loss function (optimization).
Linear Classifier: Scores
Class scores for three example images under some W:

         Image 1   Image 2   Image 3
cat       3.2       1.3       2.2
car       5.1       4.9       2.5
frog     -1.7       2.0      -3.1
Loss Functions
Suppose: 3 training examples, 3 classes. With some W the scores are the table above.
A loss function tells how good our current classifier is. Given a dataset of examples {(x_i, y_i)}, where x_i is an image and y_i is its (integer) label, the loss over the dataset is the average of the per-example losses:
L = (1/N) Σ_i L_i( f(x_i, W), y_i )
Multiclass SVM Loss
Suppose: 3 training examples, 3 classes. With some W the scores are the cat/car/frog table above.
Given an example (x_i, y_i), where x_i is the image and y_i is the (integer) label, and writing s = f(x_i, W) for the vector of scores, the multiclass SVM ("hinge") loss is:
L_i = Σ_{j ≠ y_i} max(0, s_j - s_{y_i} + 1)
Worked example:
Cat image:  L_1 = max(0, 5.1 - 3.2 + 1) + max(0, -1.7 - 3.2 + 1) = 2.9 + 0 = 2.9
Car image:  L_2 = max(0, 1.3 - 4.9 + 1) + max(0, 2.0 - 4.9 + 1) = 0 + 0 = 0
Frog image: L_3 = max(0, 2.2 - (-3.1) + 1) + max(0, 2.5 - (-3.1) + 1) = 6.3 + 6.6 = 12.9
Loss over the dataset: L = (2.9 + 0 + 12.9) / 3 = 5.27
Multiclass SVM Loss - Implementation
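The implementation shown on the slide did not survive extraction; the following is a sketch of a per-example hinge loss that matches the formula above (the function name and the default margin delta = 1 are assumptions made for this example):

```python
import numpy as np

def svm_loss(scores, y, delta=1.0):
    """Multiclass SVM (hinge) loss for one example, given its vector of class
    scores and the integer index y of the correct class."""
    margins = np.maximum(0, scores - scores[y] + delta)
    margins[y] = 0          # the correct class contributes nothing
    return margins.sum()

# Cat image from the running example: scores [cat, car, frog] = [3.2, 5.1, -1.7]
print(svm_loss(np.array([3.2, 5.1, -1.7]), y=0))   # 2.9
```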
Multiclass SVM Loss: Parameters Search
Suppose: 3 training examples, 3 classes. With some W the scores are as above. Suppose we found a W that gives zero loss, e.g. on the car image:
L_2 = max(0, 1.3 - 4.9 + 1) + max(0, 2.0 - 4.9 + 1)
    = max(0, -2.6) + max(0, -1.9)
    = 0 + 0
    = 0
Is this W unique? No: 2W also gives zero loss here, since doubling the scores only widens the margins:
max(0, 2.6 - 9.8 + 1) + max(0, 4.0 - 9.8 + 1) = 0
Regularization
Add a term to the loss that keeps the model from fitting the training data too closely:
L(W) = (1/N) Σ_i L_i( f(x_i, W), y_i ) + λ R(W)
(data loss + regularization), where λ = regularization strength (hyperparameter).
Simple examples:
L2 regularization: R(W) = Σ_k Σ_l W_{k,l}²
L1 regularization: R(W) = Σ_k Σ_l |W_{k,l}|
Elastic net (L1 + L2): R(W) = Σ_k Σ_l (β W_{k,l}² + |W_{k,l}|)
Why regularize?
- Express preferences over weights
- Make the model simple so it works on test data
- Improve optimization by adding curvature
Regularization - Expressing Preferences
Simple examples:
L2 regularization: R(W) = Σ_k Σ_l W_{k,l}²
L1 regularization: R(W) = Σ_k Σ_l |W_{k,l}|
L2 regularization likes to "spread out" the weights.
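A tiny numeric illustration of that preference; the two weight vectors below are made-up examples, chosen so that both produce the same score on the input x:

```python
import numpy as np

x  = np.array([1.0, 1.0, 1.0, 1.0])
w1 = np.array([1.0, 0.0, 0.0, 0.0])       # all weight on one feature
w2 = np.array([0.25, 0.25, 0.25, 0.25])   # the same total weight, spread out

print(w1 @ x, w2 @ x)                     # same score: 1.0  1.0
print(np.sum(w1**2), np.sum(w2**2))       # L2 penalty: 1.0 vs 0.25, so L2 prefers w2
```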
Regularization – Prefer simpler Models
(Figure: two candidate fits f1 and f2 to the same training data, plotted as y against x.)
Regularization pushes against fitting the training data too well, so that we don't fit the noise in the data.
Softmax Classifier
Class scores for one example image (image 1 above):
cat   3.2
car   5.1
frog -1.7
Softmax Classifier (Multinomial Logistic Regression)
Want to interpret raw classifier scores as probabilities.
Softmax function: P(Y = k | X = x_i) = e^{s_k} / Σ_j e^{s_j}, where s = f(x_i, W).
The probabilities must be >= 0 (guaranteed by exp) and must sum to 1 (guaranteed by the normalization):

         scores      exp       normalize
cat        3.2   ->  24.5   ->  0.13
car        5.1   -> 164.0   ->  0.87
frog      -1.7   ->   0.18  ->  0.00
(unnormalized log-probabilities / logits -> unnormalized probabilities -> probabilities)

Loss: L_i = -log P(Y = y_i | X = x_i); here L_i = -log(0.13) = 2.04.
Maximum Likelihood Estimation: choose the probabilities to maximize the likelihood of the observed data (see CS 229 for details).
Comparing the predicted probabilities [0.13, 0.87, 0.00] with the correct probabilities [1.00, 0.00, 0.00] using the Kullback–Leibler divergence gives the cross-entropy loss.
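A short NumPy sketch that reproduces the numbers in the pipeline above:

```python
import numpy as np

scores = np.array([3.2, 5.1, -1.7])        # cat, car, frog; cat is the correct class

# In practice, subtract scores.max() before exponentiating for numerical stability.
unnormalized = np.exp(scores)              # ~[24.5, 164.0, 0.18]
probs = unnormalized / unnormalized.sum()  # ~[0.13, 0.87, 0.00]
loss = -np.log(probs[0])                   # cross-entropy for the correct class: ~2.04

print(np.round(probs, 2), round(loss, 2))
```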
Distance Metric - Intuition
Regularization - Intuition
Posterior distribution: p(W | D) ∝ p(D | W) p(W).
Minimizing the negative log posterior yields the regularized loss: the data loss corresponds to the negative log-likelihood, and the regularization term corresponds to the negative log prior over W (e.g. a Gaussian prior on the weights gives L2 regularization).
Softmax vs. SVM
SVM (hinge loss): L_i = Σ_{j ≠ y_i} max(0, s_j - s_{y_i} + 1)
Softmax (cross-entropy loss): L_i = -log( e^{s_{y_i}} / Σ_j e^{s_j} )
Recap
- Score function: s = f(x,W) = Wx + b
- Two loss functions: Softmax (cross-entropy) and SVM (hinge)
- Full loss: data loss plus regularization, L = (1/N) Σ_i L_i + λ R(W)
How do we find the best W?
Motivation example: Kinect

Image classification example
(Source: Simafore.com)
From a spreadsheet to a decision node
Information gain of an attribute A, where p and n are the numbers of positive and negative examples at the node:
I(A) = H( p/(p+n), n/(p+n) ) - EH(A)
where H is the entropy and EH(A) is the expected entropy remaining after splitting on A.
❑ Choose the attribute with the largest I.
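A small sketch of that computation for one attribute; the helper names and the example counts are made up for illustration:

```python
import numpy as np

def entropy(p, n):
    """Entropy H of a node with p positive and n negative examples (in bits)."""
    if p == 0 or n == 0:
        return 0.0
    q = p / (p + n)
    return -q * np.log2(q) - (1 - q) * np.log2(1 - q)

def information_gain(splits, p, n):
    """I(A) = H(p/(p+n), n/(p+n)) - EH(A), where `splits` lists the (p_i, n_i)
    counts of the child nodes produced by splitting on attribute A."""
    expected = sum((p_i + n_i) / (p + n) * entropy(p_i, n_i) for p_i, n_i in splits)
    return entropy(p, n) - expected

# Example: 6 positive / 6 negative examples; attribute A splits them into
# one child with counts (4, 1) and another with (2, 5).
print(information_gain([(4, 1), (2, 5)], p=6, n=6))
```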
Random Forests algorithm

Building a forest (ensemble)
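The slides do not include code for this part; as an illustration of the ensemble idea, here is a sketch using scikit-learn on a synthetic stand-in dataset (assuming scikit-learn is available):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data: 200 examples with 5 features, binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Each tree is trained on a bootstrap sample with random feature subsets;
# the forest predicts by majority vote over the trees.
forest = RandomForestClassifier(n_estimators=100, max_depth=4, random_state=0)
forest.fit(X, y)
print(forest.score(X, y))   # training accuracy
```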
Loss Function implements a metric
What is a metric?
Intuitively, a metric maps a pair of elements to the distance between them. Formally, a metric d satisfies: d(a, b) >= 0; d(a, b) = 0 if and only if a = b; d(a, b) = d(b, a); and d(a, c) <= d(a, b) + d(b, c) (the triangle inequality).
Basis of the master slides
The TUM Corporate Design Style Guide serves as the basis. The presentation template is optimized for good readability and clear presentation of information.
Sources
http://cs231n.stanford.edu/syllabus.html
https://www.cs.ubc.ca/~nando/340-2012/lectures.php
https://github.com/jonesgithub/book-1/blob/master/ML%20Machine%20Learning-A%20Probabilistic%20Perspective.pdf