
Machine Learning Algorithms

Lorenzo Servadei, Sebastian Schober, Daniela S Lopera, Wolfgang Ecker


Introduction to Machine Learning Algorithms

2
Table of Contents

Machine Learning Algorithms – an Overview

The Data Driven Approach

K-Nearest Neighbors

Linear Classifier

Loss Functions

Decision Trees and Random Forest

3
Table of Contents

• Machine Learning Algorithms – an Overview


• The Data Driven Approach
• K-Nearest Neighbors
• Linear Classifier
• Loss Functions
• Decision Trees and Random Forest

4
Machine Learning Algorithms – an Overview

5
Machine Learning Algorithms – an Overview

Parametric Learning Algorithms

Non-parametric Learning Algorithms:
• KNNs
• Decision Trees
• Random Forest
• etc.

6
Problem of hard coded classification

Unlike, e.g., sorting a list of numbers, there is no obvious way to hard-code an algorithm for recognizing a cat or other classes.

7
Edge-based method?

(Figure: find edges, then find corners.)

John Canny, “A Computational Approach to Edge Detection”, IEEE TPAMI 1986

8
A possible Approach

1. Collect a dataset of images and labels


2. Use Machine Learning to train a classifier
3. Evaluate the classifier on new images
Example training set

9
A possible Approach

Train: memorize all data and labels.

Predict: the label of the most similar training image.

10
Example Dataset: CIFAR10

10 classes
50,000 training images
10,000 testing images

Alex Krizhevsky, “Learning Multiple Layers of Features from Tiny Images”, Technical Report, 2009.

11
Alex Krizhevsky, “Learning Multiple Layers of Features from Tiny Images”, Technical Report, 2009.

12
L1 Distance

L1 distance: d1(I1, I2) = Σ_p |I1^p − I2^p|  (sum of absolute pixel-wise differences)

13
Code implementation, using Numpy
Nearest Neighbor classifier


14
Code implementation, using Numpy
Nearest Neighbor classifier

Memorize training data


15
Code implementation, using Numpy
Nearest Neighbor classifier

For each test image:


Find closest train image
Predict label of nearest image


16
Code implementation, using Numpy
Nearest Neighbor classifier

Q: With N
examples, how
fast are training
and prediction?


17
Code implementation, using Numpy
Nearest Neighbor
classifier

Q: With N
examples, how
fast are training
and prediction?

A: Train O(1),
predict
O(N)


18
Code implementation, using Numpy
Nearest Neighbor
classifier

Q: With N
examples, how
fast are training
and prediction?

A: Train O(1),
predict
O(N)

This is bad: we want classifiers that are fast at prediction; slow training is OK.
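A minimal NumPy sketch of the Nearest Neighbor classifier described above (illustrative, not the original slide code): training memorizes the data in O(1), prediction scans all N training images with the L1 distance.

    import numpy as np

    class NearestNeighbor:
        def train(self, X, y):
            # O(1): simply memorize all training data and labels
            self.Xtr = X          # X is N x D, each row a flattened image
            self.ytr = y          # y holds the N integer labels

        def predict(self, X):
            # O(N) per test image: compare against every training image
            num_test = X.shape[0]
            y_pred = np.zeros(num_test, dtype=self.ytr.dtype)
            for i in range(num_test):
                distances = np.sum(np.abs(self.Xtr - X[i, :]), axis=1)  # L1 distance
                nearest = np.argmin(distances)       # index of the closest training image
                y_pred[i] = self.ytr[nearest]        # copy its label
            return y_pred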

19
K-Nearest Neighbors


20
K-Nearest Neighbors

Instead of copying the label from the nearest neighbor, take a majority vote from the K closest points.

(Figure: decision regions for K = 1, K = 3, K = 5.)
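A hedged sketch of the majority vote, extending the NearestNeighbor sketch above (it assumes labels are non-negative integers and that k is smaller than the number of training examples):

    import numpy as np

    def predict_knn(Xtr, ytr, Xte, k=3):
        # For each test image: find the k closest training images (L1 distance)
        # and predict the most common label among them.
        y_pred = np.zeros(Xte.shape[0], dtype=ytr.dtype)
        for i in range(Xte.shape[0]):
            distances = np.sum(np.abs(Xtr - Xte[i, :]), axis=1)
            knn_idx = np.argpartition(distances, k)[:k]      # indices of the k closest
            y_pred[i] = np.bincount(ytr[knn_idx]).argmax()   # majority vote
        return y_pred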


21
What does it look like?

22
What does it look like?

23
K-Nearest Neighbors: Distance Metric

L1 (Manhattan) distance: d1(I1, I2) = Σ_p |I1^p − I2^p|        L2 (Euclidean) distance: d2(I1, I2) = sqrt( Σ_p (I1^p − I2^p)² )
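A tiny illustration of the two distances on toy 4-pixel "images" (the values are arbitrary):

    import numpy as np

    I1 = np.array([56.0, 231.0, 24.0, 2.0])
    I2 = np.array([10.0, 20.0, 24.0, 17.0])

    l1 = np.sum(np.abs(I1 - I2))             # L1 (Manhattan) distance
    l2 = np.sqrt(np.sum((I1 - I2) ** 2))     # L2 (Euclidean) distance
    print(l1, l2)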


24
K-Nearest Neighbors: Distance Metric

L1 (Manhattan) distance vs. L2 (Euclidean) distance

(Figure: K = 1 decision boundaries under the L1 and the L2 distance.)


25
K-Nearest Neighbors: Demo

http://vision.stanford.edu/teaching/cs231n-demos/knn/

26
K-Nearest Neighbors: Hyperparameters

• What is the best value of k to use? What is the best distance to use?

• These are hyperparameters: choices about the algorithm that we set rather than learn.

• Very problem-dependent.

27
Searching for Hyperparameters

Idea #1: Choose hyperparameters that work best on the data

Your Dataset

28
Searching for Hyperparameters

Idea #1: Choose hyperparameters that work best on the data


BAD: always works perfectly on training
data

Your Dataset

29
Searching for Hyperparameters

Idea #1: Choose hyperparameters that work best on the data


BAD: always works perfectly on training
data

Your Dataset

Idea #2: Split data into train and test, choose hyperparameters that work best
on test data

train test

30
Searching for Hyperparameters

Idea #1: Choose hyperparameters that work best on the data


BAD: always works perfectly on training
data

Your Dataset

Idea #2: Split data into train and test, choose hyperparameters that work best
on test data

train test

BAD: No idea how algorithm


will perform on new data

31
Searching for Hyperparameters

Idea #3: Split data into train, val, and test; choose hyperparameters on val
and evaluate on test

train validation test

32
Searching for Hyperparameters

Idea #4: K-Folds Cross-Validation: Split data into folds, try each fold as
validation and average the results

fold 1 fold 2 fold 3 fold 4 fold 5 test

fold 1 fold 2 fold 3 fold 4 fold 5 test

fold 1 fold 2 fold 3 fold 4 fold 5 test

Useful for small datasets, but not used too frequently in deep learning
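A hedged sketch of 5-fold cross-validation for choosing k, reusing the predict_knn sketch from the K-Nearest Neighbors part (the candidate k values and the accuracy criterion are illustrative):

    import numpy as np

    def cross_validate_k(X_train, y_train, k_choices=(1, 3, 5, 8, 10), num_folds=5):
        X_folds = np.array_split(X_train, num_folds)
        y_folds = np.array_split(y_train, num_folds)
        mean_accuracy = {}
        for k in k_choices:
            fold_acc = []
            for i in range(num_folds):
                # fold i is the validation split, the remaining folds form the training split
                X_tr = np.concatenate(X_folds[:i] + X_folds[i + 1:])
                y_tr = np.concatenate(y_folds[:i] + y_folds[i + 1:])
                y_pred = predict_knn(X_tr, y_tr, X_folds[i], k)   # kNN sketch from above
                fold_acc.append(np.mean(y_pred == y_folds[i]))
            mean_accuracy[k] = np.mean(fold_acc)
        # pick the k with the best average validation accuracy
        return max(mean_accuracy, key=mean_accuracy.get)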

33
Searching for Hyperparameters

Idea #5: Nested Cross-Validation: Two Loops, for model selection and
evaluation

34
Searching for Hyperparameters

Example of 5-fold cross-validation for the value of k. Each point is a single outcome.

35
k-Nearest Neighbors‘ Drawbacks

k-Nearest Neighbor is never used on images:

- Very slow at test time
- Distance metrics on raw pixels are not informative

(Figure: Original vs. Boxed, Shifted, and Tinted versions; all three modified images have the same L2 distance to the one on the left. Original image is CC0 public domain.)

36
Curse of Dimensionality

k-Nearest Neighbor is never used on images:

- Curse of dimensionality: covering the space densely requires a number of training points exponential in the dimension. (Figure: Dimensions = 1, Points = 4; Dimensions = 2, Points = 4²; Dimensions = 3, Points = 4³.)

37
k-Nearest Neighbor: Summary

In image classification we start with a training set of images and labels, and
must predict labels on the test set

The K-Nearest Neighbors classifier predicts labels based on nearest training


examples

Distance metric and K are hyperparameters

Choose hyperparameters using the validation set; only run on the test set once at
the very end!


38
Linear Classifiers

(Example captioned images: "Two young girls are playing with lego toy." "Boy is doing backflip on wakeboard." "Man in black shirt is playing guitar." "Construction worker in orange safety vest is working on road.")

Karpathy and Fei-Fei, “Deep Visual-Semantic Alignments for Generating Image Descriptions”, CVPR 2015. Figures copyright IEEE, 2015. Reproduced for educational purposes.


39
Recall CIFAR10

50,000 training images


each image is 32x32x3

10,000 test images.

40
Parametric Approach

Image x: array of 32x32x3 numbers (3072 numbers total)
f(x,W) → 10 numbers giving class scores
W: parameters or weights

41
Parametric Approach: Linear Classifier
f(x,W) = Wx

Image x: array of 32x32x3 numbers (3072 numbers total)
f(x,W) → 10 numbers giving class scores
W: parameters or weights

42
Parametric Approach: Linear Classifier

f(x,W) = Wx

x: image, 3072x1 (array of 32x32x3 = 3072 numbers)
W: parameters or weights, 10x3072
f(x,W): 10x1 (10 numbers giving class scores)

43
Classification through a linear classifier

Image with 4 pixels, and 3 classes (cat/dog/ship)

Input image (2x2 pixels):
  56  231
  24    2

Stretch pixels into a column: x = [56, 231, 24, 2]

44
Classification through a linear classifier

Image with 4 pixels, and 3 classes (cat/dog/ship)

Stretch pixels into column: x = [56, 231, 24, 2]

        W                    x          b         scores
  0.2  -0.5   0.1   2.0      56        1.1       -96.8   Cat score
  1.5   1.3   2.1   0.0  ·  231    +   3.2   =   437.9   Dog score
  0.0   0.25  0.2  -0.3      24       -1.2        61.95  Ship score
                              2
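A small NumPy sketch of this score computation, using the numbers shown on the slide (illustrative):

    import numpy as np

    x = np.array([56.0, 231.0, 24.0, 2.0])         # stretched input image, shape (4,)
    W = np.array([[0.2, -0.5,  0.1,  2.0],         # one row of weights per class
                  [1.5,  1.3,  2.1,  0.0],
                  [0.0,  0.25, 0.2, -0.3]])        # shape (3, 4)
    b = np.array([1.1, 3.2, -1.2])                 # one bias per class, shape (3,)

    scores = W.dot(x) + b                          # f(x, W) = Wx + b, shape (3,)
    print(dict(zip(["cat", "dog", "ship"], scores)))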

45
What is a linear Classifier

Image with 4 pixels, and 3 classes (cat/dog/ship)


Algebraic Viewpoint

f(x,W) = Wx


46
What is a linear Classifier
Image with 4 pixels, and 3 classes (cat/dog/ship)

Algebraic Viewpoint: f(x,W) = Wx

  W:
    0.2  -0.5   0.1   2.0
    1.5   1.3   2.1   0.0
    0.0   0.25  0.2  -0.3

  b:      1.1    3.2   -1.2
  Score: -96.8  437.9   61.95

Input image: x = [56, 231, 24, 2]

47
Interpreting a linear Classifier


48
Interpreting a linear Classifier
Visual Viewpoint


49
Interpreting a linear Classifier
Geometric Viewpoint

f(x,W) = Wx + b

Array of 32x32x3 numbers


(3072 numbers total)

Plot created using Wolfram Cloud. Cat image by Nikita is licensed under CC-BY 2.0.

50
Hard cases for a linear classifier

Case 1: Class 1 is the first and third quadrants; Class 2 is the second and fourth quadrants.
Case 2: Class 1 is 1 <= L2 norm <= 2; Class 2 is everything else.
Case 3: Class 1 has three modes; Class 2 is everything else.

51
Linear Classifier: Three Viewpoints

Algebraic Viewpoint: f(x,W) = Wx
Visual Viewpoint: one template per class
Geometric Viewpoint: hyperplanes cutting up space

52
Linear Classifier: Three Viewpoints
f(x,W) = Wx + b
So far: we defined a (linear) score function.

Example class scores for 3 images (cat, car, frog) for some W:

  -3.45   -0.51    3.42
  -8.87    6.04    4.64
   0.09    5.31    2.65
   2.9    -4.22    5.1
   4.48   -4.19    2.64
   8.02    3.58    5.55
   3.78    4.49   -4.34
   1.06   -4.37   -1.5
  -0.36   -2.09   -4.79
  -0.72   -2.93    6.14

How can we tell whether this W is good or bad?

Cat image by Nikita is licensed under CC-BY 2.0. Car image is CC0 1.0 public domain. Frog image is in the public domain.

53
Linear Classifier: TODO List

TODO:

1. Define a loss function


that quantifies our
unhappiness with the
scores across the training
data.

2. Come up with a way of


efficiently finding the
parameters that minimize
the loss function.
(optimization)
Cat image by Nikita is licensed under CC-BY 2.0; Car image is CC0 1.0 public domain; Frog image is in the public domain


54
Linear Classifier: Scores

Suppose: 3 training examples, 3 classes. With some W the scores are:

        cat:   3.2    1.3    2.2
        car:   5.1    4.9    2.5
        frog: -1.7    2.0   -3.1

55
Loss Functions
Suppose: 3 training examples, 3 classes. With some W the scores are:

        cat:   3.2    1.3    2.2
        car:   5.1    4.9    2.5
        frog: -1.7    2.0   -3.1

A loss function tells how good our current classifier is.

Given a dataset of examples {(x_i, y_i)}, i = 1..N, where x_i is an image and y_i is its (integer) label, the loss over the dataset is a sum (average) of the per-example losses:

  L = (1/N) * sum_i L_i( f(x_i, W), y_i )

56
Multiclass SVM Loss
Suppose: 3 training examples, 3 classes. With some W the scores are:

        cat:   3.2    1.3    2.2
        car:   5.1    4.9    2.5
        frog: -1.7    2.0   -3.1

Multiclass SVM loss: given an example (x_i, y_i), where x_i is the image and y_i is the (integer) label, and using the shorthand s = f(x_i, W) for the scores vector, the SVM loss has the form:

  L_i = sum_{j != y_i} max(0, s_j - s_{y_i} + 1)

57
Multiclass SVM Loss
Suppose: 3 training examples, 3 classes. With some W the scores are:

        cat:   3.2    1.3    2.2
        car:   5.1    4.9    2.5
        frog: -1.7    2.0   -3.1

Multiclass SVM loss ("hinge loss"), with s = f(x_i, W):

  L_i = sum_{j != y_i} max(0, s_j - s_{y_i} + 1)

58
Multiclass SVM Loss
Suppose: 3 training examples, 3 classes. With some W the scores are:

        cat:   3.2    1.3    2.2
        car:   5.1    4.9    2.5
        frog: -1.7    2.0   -3.1

Multiclass SVM loss: L_i = sum_{j != y_i} max(0, s_j - s_{y_i} + 1), with s = f(x_i, W).

59
Multiclass SVM Loss
Scores as before (cat: 3.2, 1.3, 2.2; car: 5.1, 4.9, 2.5; frog: -1.7, 2.0, -3.1); multiclass SVM loss L_i = sum_{j != y_i} max(0, s_j - s_{y_i} + 1).

Loss for the cat image (correct class score 3.2):
  = max(0, 5.1 - 3.2 + 1) + max(0, -1.7 - 3.2 + 1)
  = max(0, 2.9) + max(0, -3.9)
  = 2.9 + 0
  = 2.9

Losses: 2.9

60
Multiclass SVM Loss
Scores and SVM loss as before.

Loss for the car image (correct class score 4.9):
  = max(0, 1.3 - 4.9 + 1) + max(0, 2.0 - 4.9 + 1)
  = max(0, -2.6) + max(0, -1.9)
  = 0 + 0
  = 0

Losses: 2.9, 0

61
Multiclass SVM Loss
Scores and SVM loss as before.

Loss for the frog image (correct class score -3.1):
  = max(0, 2.2 - (-3.1) + 1) + max(0, 2.5 - (-3.1) + 1)
  = max(0, 6.3) + max(0, 6.6)
  = 6.3 + 6.6
  = 12.9

Losses: 2.9, 0, 12.9

62
Multiclass SVM Loss
Scores and SVM loss as before; per-example losses 2.9, 0, 12.9.

The loss over the full dataset is the average:

  L = (2.9 + 0 + 12.9) / 3 = 5.27

63
Multiclass SVM Loss
Scores and SVM loss as before; losses 2.9, 0, 12.9.

Q: What happens to the loss if the car scores change a bit?

64
Multiclass SVM Loss
Scores and SVM loss as before; losses 2.9, 0, 12.9.

Q2: What is the min/max possible loss?

65
Multiclass SVM Loss
Scores and SVM loss as before; losses 2.9, 0, 12.9.

Q3: At initialization W is small so all s ≈ 0. What is the loss?

66
Multiclass SVM Loss
Scores and SVM loss as before; losses 2.9, 0, 12.9.

Q4: What if the sum was over all classes? (including j = y_i)

67
Multiclass SVM Loss
Scores and SVM loss as before; losses 2.9, 0, 12.9.

Q5: What if we used the mean instead of the sum?

68
Multiclass SVM Loss
Scores and SVM loss as before; losses 2.9, 0, 12.9.

Q6: What if we used

69
Multiclass SVM Loss - Implementation
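A minimal NumPy sketch of the per-example multiclass SVM loss defined above (illustrative; margin of 1 as in the formula):

    import numpy as np

    def svm_loss_single(x, y, W):
        # Multiclass SVM loss for one example (x, y), vectorized over classes
        scores = W.dot(x)                                 # class scores s = f(x, W)
        margins = np.maximum(0, scores - scores[y] + 1)   # hinge on every class
        margins[y] = 0                                    # skip the correct class (j != y_i)
        return np.sum(margins)

    # Sanity check with the cat column of the running example: scores 3.2, 5.1, -1.7
    scores = np.array([3.2, 5.1, -1.7])
    margins = np.maximum(0, scores - scores[0] + 1)
    margins[0] = 0
    print(np.sum(margins))   # 2.9, the value computed on the earlier slide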


70
Multiclass SVM Loss

E.g. suppose that we found a W such that L = 0. Is this W unique?

71
Multiclass SVM Loss

E.g. suppose that we found a W such that L = 0. Is this W unique?

No! 2W also has L = 0!

72
Multiclass SVM Loss: Parameters Search
Suppose: 3 training examples, 3 classes, with the scores as before (cat: 3.2, 1.3, 2.2; car: 5.1, 4.9, 2.5; frog: -1.7, 2.0, -3.1).

Before (car image, with W):
  = max(0, 1.3 - 4.9 + 1) + max(0, 2.0 - 4.9 + 1)
  = max(0, -2.6) + max(0, -1.9)
  = 0 + 0
  = 0

With W twice as large:
  = max(0, 2.6 - 9.8 + 1) + max(0, 4.0 - 9.8 + 1)
  = max(0, -6.2) + max(0, -4.8)
  = 0 + 0
  = 0

Losses: 2.9, 0

73
Regularization

Data loss: model predictions should match the training data.

74
Regularization

Data loss: model predictions should match the training data.
Regularization: prevent the model from doing too well on the training data.

75
Regularization


76
Regularization
Full loss: L = (1/N) * sum_i L_i( f(x_i, W), y_i ) + λ R(W), where λ = regularization strength (hyperparameter).

Data loss: model predictions should match the training data.
Regularization: prevent the model from doing too well on the training data.

77
Regularization
λ = regularization strength (hyperparameter)

Data loss: model predictions should match the training data.
Regularization: prevent the model from doing too well on the training data.

Simple examples:
  L2 regularization: R(W) = sum_k sum_l W_{k,l}^2
  L1 regularization: R(W) = sum_k sum_l |W_{k,l}|
  Elastic net (L1 + L2): R(W) = sum_k sum_l ( beta * W_{k,l}^2 + |W_{k,l}| )

78
Regularization
λ = regularization strength (hyperparameter)

Data loss: model predictions should match the training data.
Regularization: prevent the model from doing too well on the training data.

Simple examples: L2 regularization, L1 regularization, Elastic net (L1 + L2).
More complex: Dropout, Batch normalization, Stochastic depth, fractional pooling, etc.
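A hedged NumPy sketch of the simple regularizers and of how λ enters the full loss (the data-loss function is passed in as a parameter, e.g. the svm_loss_single sketch above; the value of lam is illustrative):

    import numpy as np

    def l2_reg(W):
        return np.sum(W * W)         # R(W): sum of squared weights

    def l1_reg(W):
        return np.sum(np.abs(W))     # R(W): sum of absolute weights

    def full_loss(W, X, y, data_loss_fn, lam=0.1, reg=l2_reg):
        # L = (1/N) * sum_i L_i(f(x_i, W), y_i) + lam * R(W)
        N = X.shape[0]
        data_loss = sum(data_loss_fn(X[i], y[i], W) for i in range(N)) / N
        return data_loss + lam * reg(W)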

79
Regularization
λ = regularization strength (hyperparameter)

Data loss: model predictions should match the training data.
Regularization: prevent the model from doing too well on the training data.

Why regularize?
- Express preferences over weights
- Make the model simple so it works on test data
- Improve optimization by adding curvature


80
Regularization - Expressing Preferences


81
Regularization - Expressing Preferences

Simple examples:
  L2 regularization: R(W) = sum_k sum_l W_{k,l}^2
  L1 regularization: R(W) = sum_k sum_l |W_{k,l}|

L2 regularization likes to “spread out” the weights.


82
Regularization – Prefer simpler Models


83
Regularization – Prefer simpler Models

(Figure: two candidate fits f1 and f2 to training data y as a function of x.)

84
Regularization – Prefer simpler Models

(Figure: the same two fits f1 and f2.)
Regularization pushes against fitting the data
too well so we don’t fit noise in the data


85
Softmax Classifier

Softmax Classifier (Multinomial Logistic Regression)


Want to interpret raw classifier scores as probabilities

cat:   3.2
car:   5.1
frog: -1.7

86
Softmax Classifier
Softmax Classifier (Multinomial Logistic Regression)

Want to interpret raw classifier scores as probabilities.

Softmax function: P(Y = k | X = x_i) = exp(s_k) / sum_j exp(s_j), with s = f(x_i, W).

cat:   3.2
car:   5.1
frog: -1.7

87
Softmax Classifier
Softmax Classifier (Multinomial Logistic Regression)

Want to interpret raw classifier scores as probabilities.
Softmax function: probabilities must be >= 0, so exponentiate the scores.

        scores     exp (unnormalized probabilities)
cat:      3.2        24.5
car:      5.1       164.0
frog:    -1.7         0.18

88
Softmax Classifier
Softmax Classifier (Multinomial Logistic Regression)

Want to interpret raw classifier scores as probabilities.
Softmax function: probabilities must be >= 0 (exponentiate) and must sum to 1 (normalize).

        scores     exp        normalized probabilities
cat:      3.2        24.5        0.13
car:      5.1       164.0        0.87
frog:    -1.7         0.18       0.00

89
Softmax Classifier
Softmax Classifier (Multinomial Logistic Regression)

Want to interpret raw classifier scores as probabilities.

        unnormalized            unnormalized         normalized
        log-probabilities       probabilities        probabilities
        (logits)                (exp)                (normalize)
cat:      3.2                     24.5                 0.13
car:      5.1                    164.0                 0.87
frog:    -1.7                      0.18                0.00

90
Softmax Classifier
Softmax Classifier (Multinomial Logistic Regression)

Want to interpret raw classifier scores as probabilities.

        logits     exp       probabilities
cat:      3.2       24.5        0.13        →  L_i = -log(0.13) = 2.04
car:      5.1      164.0        0.87
frog:    -1.7        0.18       0.00

Maximum Likelihood Estimation: choose the probabilities to maximize the likelihood of the observed data (see CS 229 for details).

91
Softmax Classifier
Softmax Classifier (Multinomial Logistic Regression)

Want to interpret raw classifier scores as probabilities.

        probabilities     correct probs
cat:        0.13              1.00
car:        0.87              0.00
frog:       0.00              0.00

Compare the predicted probabilities with the correct probabilities.

92
Softmax Classifier
Softmax Classifier (Multinomial Logistic Regression)

Want to interpret raw classifier scores as probabilities.

        probabilities     correct probs
cat:        0.13              1.00
car:        0.87              0.00
frog:       0.00              0.00

Compare the two distributions with the Kullback–Leibler divergence: D_KL(P || Q) = sum_y P(y) log( P(y) / Q(y) ).

93
Softmax Classifier
Softmax Classifier (Multinomial Logistic Regression)

Want to interpret raw classifier scores as probabilities.

        probabilities     correct probs
cat:        0.13              1.00
car:        0.87              0.00
frog:       0.00              0.00

Comparing the two distributions with the cross-entropy gives the softmax (cross-entropy) loss: L_i = -log( P(Y = y_i | X = x_i) ).
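A minimal NumPy sketch of the softmax / cross-entropy computation walked through above (the max subtraction is a standard numerical-stability trick, not part of the slides):

    import numpy as np

    scores = np.array([3.2, 5.1, -1.7])    # cat, car, frog logits from the running example
    correct_class = 0                       # the example image is a cat

    shifted = scores - np.max(scores)                      # stability; does not change the result
    probs = np.exp(shifted) / np.sum(np.exp(shifted))      # softmax: roughly [0.13, 0.87, 0.00]
    loss = -np.log(probs[correct_class])                   # cross-entropy loss, roughly 2.04
    print(probs, loss)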

94
Distance Metric - Intuition

Take L2 (Euclidean) distance in parametric models

Noisy target: y = f(x) + ε, where f(x) = E(y | x) is the deterministic target and ε = y − f(x) is the noise.

95
Distance Metric - Intuition

Take L2 (Euclidean) distance in parametric models

Assuming ε has a Gaussian distribution, ε ~ N(0, σ²), we can rewrite the model in the form

  p(y | x, θ) = N(y | μ(x), σ²), where μ(x) = wᵀx.

The same holds if we apply it to a high-order polynomial expansion, μ(x) = wᵀφ(x).

96
Distance Metric - Intuition

Recall the Negative Log Likelihood (NLL). We can express the log likelihood of the data as

  ℓ(θ) = sum_{i=1..N} log p(y_i | x_i, θ),   NLL(θ) = −ℓ(θ).

Inserting the definition of the Gaussian:

  NLL(w) = (1 / (2σ²)) * RSS(w) + (N/2) * log(2πσ²),

where RSS(w) = sum_i (y_i − wᵀx_i)² is the sum of squared residuals. Averaging them gives the MSE – Mean Squared Error, MSE(w) = RSS(w) / N, so minimizing the NLL under Gaussian noise is the same as minimizing the MSE.

97
Regularization - Intuition

Posterior distribution over the weights, from a prior over the weights and the likelihood:

  p(w | D) ∝ p(D | w) p(w)

MLE = Maximum Likelihood Estimation: maximize p(D | w).
MAP = Maximum a Posteriori estimation: maximize p(D | w) p(w).

98
Regularization - Intuition

Posterior distribution over the weights, from a prior over the weights and the likelihood: with a Gaussian prior on w, MAP estimation amounts to minimizing

  J(w) = (1/N) * sum_i (y_i − wᵀx_i)² + λ ||w||²

This is also called Ridge Regression; the penalization term is the l2 regularization.
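A hedged NumPy sketch of ridge regression in closed form, illustrating the l2-penalized objective above (the toy data and the value of lam are made up):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3))                    # toy design matrix
    w_true = np.array([1.0, -2.0, 0.5])
    y = X @ w_true + 0.1 * rng.normal(size=50)      # targets with Gaussian noise

    lam = 0.1                                        # regularization strength
    # minimize ||y - Xw||^2 + lam * ||w||^2  =>  w = (X^T X + lam I)^{-1} X^T y
    w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
    print(w_ridge)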

99
Softmax Classifier
Softmax Classifier (Multinomial Logistic Regression): want to interpret raw classifier scores as probabilities.

Maximize the probability of the correct class; putting it all together:

  L_i = -log( exp(s_{y_i}) / sum_j exp(s_j) )

Scores: cat 3.2, car 5.1, frog -1.7.

10
Softmax Classifier
Softmax Classifier (Multinomial Logistic Regression): maximize the probability of the correct class; putting it all together: L_i = -log( exp(s_{y_i}) / sum_j exp(s_j) ). Scores: cat 3.2, car 5.1, frog -1.7.

Q: What is the min/max possible loss L_i?

10
Softmax Classifier
Softmax Classifier (Multinomial Logistic Regression): L_i = -log( exp(s_{y_i}) / sum_j exp(s_j) ). Scores: cat 3.2, car 5.1, frog -1.7.

Q: What is the min/max possible loss L_i?
A: min 0, max infinity.

10
Softmax Classifier
Softmax Classifier (Multinomial Logistic Regression): L_i = -log( exp(s_{y_i}) / sum_j exp(s_j) ). Scores: cat 3.2, car 5.1, frog -1.7.

Q2: At initialization all s will be approximately equal; what is the loss?

10
Softmax Classifier
Softmax Classifier (Multinomial Logistic Regression): L_i = -log( exp(s_{y_i}) / sum_j exp(s_j) ). Scores: cat 3.2, car 5.1, frog -1.7.

Q2: At initialization all s will be approximately equal; what is the loss?
A: log(C), e.g. log(10) ≈ 2.3.

10
Softmax vs. SVM


10
Softmax vs. SVM


10
Softmax vs. SVM

Assume scores [10, -2, 3], [10, 9, 9], and [10, -100, -100], where the first class is the correct one.

Q: Suppose I take a datapoint and jiggle it a bit (changing its score slightly). What happens to the loss in both cases (SVM vs. softmax)?

10
Recap

- We have some dataset of (x, y), e.g. images and their labels
- We have a score function: s = f(x; W) = Wx
- We have a loss function:

    Softmax:   L_i = -log( exp(s_{y_i}) / sum_j exp(s_j) )
    SVM:       L_i = sum_{j != y_i} max(0, s_j - s_{y_i} + 1)
    Full loss: L = (1/N) * sum_i L_i + R(W)

10
Recap
How do we find the best W?

- We have some dataset of (x, y)
- We have a score function: s = f(x; W) = Wx
- We have a loss function:

    Softmax:   L_i = -log( exp(s_{y_i}) / sum_j exp(s_j) )
    SVM:       L_i = sum_{j != y_i} max(0, s_j - s_{y_i} + 1)
    Full loss: L = (1/N) * sum_i L_i + R(W)

10
Motivation example: Kinect

11
Image classification example

[MSR Tutorial on decision forests by Criminisi et al., 2011]

11
Classification tree

[Criminisi et al, 2011]


11
Another commerce example

Simafore.com
11
From a spreadsheet to a
decision node

[AI book of Stuart Russell and Peter Norvig]


11
A learned decision tree

[AI book of Stuart Russell and Peter Norvig]


11
How do we construct the tree? I.e., how do we pick the attributes (nodes)?

For a training set containing p positive examples and n negative examples, the entropy is:

  H( p/(p+n), n/(p+n) ) = − (p/(p+n)) * log2( p/(p+n) ) − (n/(p+n)) * log2( n/(p+n) )
11
How to pick nodes?

❑ A chosen attribute A, with K distinct values, divides the training set E into subsets E1, …, EK.

❑ The Expected Entropy (EH) remaining after trying attribute A (with branches i = 1, 2, …, K), where the i-th child contains p_i positive and n_i negative examples, is:

  EH(A) = sum_{i=1..K} ( (p_i + n_i) / (p + n) ) * H( p_i/(p_i+n_i), n_i/(p_i+n_i) )

❑ The Information Gain (I), or reduction in entropy, for this attribute is:

  I(A) = H( p/(p+n), n/(p+n) ) − EH(A)

❑ Choose the attribute with the largest I.

[Hwee Tou Ng & Stuart Russell]


11
Example
❑ Convention: for the training set, p = n = 6, so H(6/12, 6/12) = 1 bit.

❑ Consider the attributes Patrons and Type (and others too):

  I(Patrons) = 1 − [ (2/12) H(0, 1) + (4/12) H(1, 0) + (6/12) H(2/6, 4/6) ] ≈ 0.541 bits

  I(Type) = 1 − [ (2/12) H(1/2, 1/2) + (2/12) H(1/2, 1/2) + (4/12) H(2/4, 2/4) + (4/12) H(2/4, 2/4) ] = 0 bits

Patrons has the larger information gain, so it is chosen as the splitting attribute.

[Hwee Tou Ng & Stuart Russell]
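A hedged Python sketch of the entropy / information-gain computation, using the split counts from the restaurant example above (the helper names are illustrative):

    import math

    def H(*probs):
        # Entropy in bits of a discrete distribution; 0 * log(0) is treated as 0
        return -sum(p * math.log2(p) for p in probs if p > 0)

    def info_gain(splits, p, n):
        # splits: list of (p_i, n_i) counts for the children created by the attribute
        eh = sum((pi + ni) / (p + n) * H(pi / (pi + ni), ni / (pi + ni))
                 for pi, ni in splits)
        return H(p / (p + n), n / (p + n)) - eh

    # Restaurant example: 6 positive and 6 negative examples overall
    print(info_gain([(0, 2), (4, 0), (2, 4)], 6, 6))          # Patrons: about 0.541 bits
    print(info_gain([(1, 1), (1, 1), (2, 2), (2, 2)], 6, 6))  # Type: 0 bits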


11
Classification tree

[Criminisi et al, 2011]


11
Use information gain to decide splits

[Criminisi et al, 2011]


12
Building a random tree

12
Random Forests algorithm

[From the book of Hastie, Friedman and Tibshirani]


12
Randomization

[Criminisi et al, 2011]


12
Recall - The Tradeoff of the Out-of-Sample Error

Take the expectation with respect to x and obtain the bias and the variance: as the model complexity H increases, the bias decreases and the variance increases.

12
Building a forest (ensemble)

[Criminisi et al, 2011]


12
Loss functions
A loss function implements a metric between the true output and the predicted output: it measures the distance between Y_pred and Y.

12
Loss Function implements a metric
What is a metric?

A map J(x, y) is a metric if and only if it meets the following conditions:

1. Positive definite: J(x, y) >= 0, and J(x, y) = 0 if and only if x = y

2. Symmetric: J(x, y) = J(y, x)

3. Triangle inequality: J(x, y) <= J(x, z) + J(z, y) for all z

Intuitively, a metric maps a pair of elements to the distance between them.

12
Basis of the master slides
The TUM Corporate Design Style Guide serves as the basis.
The presentation template is optimized for good readability and a clear presentation of information.

128
Sources

http://cs231n.stanford.edu/syllabus.html
https://www.cs.ubc.ca/~nando/340-2012/lectures.php
https://github.com/jonesgithub/book-1/blob/master/ML%20Machine%20Learning-A%20Probabilistic%20Perspective.pdf

129
