
Machine Learning – Lecture 1

Introduction
09.10.2019

Bastian Leibe
RWTH Aachen
http://www.vision.rwth-aachen.de/
[email protected]
Organization
• Lecturer
 Prof. Bastian Leibe ([email protected])

• Assistants
 Ali Athar ([email protected])
 Sabarinath Mahadevan ([email protected])

• Course webpage
 http://www.vision.rwth-aachen.de/courses/
 Slides will be made available on the webpage and in moodle
 Lecture recordings as screencasts will be available via moodle

• Please subscribe to the lecture in RWTHonline!
 Important to get email announcements and moodle access!
Language
• Official course language will be English
 If at least one English-speaking student is present.
 If not… you can choose.

• However…
 Please tell me when I'm talking too fast or when I should repeat something in German for better understanding!
 You may at any time ask questions in German!
 You may turn in your exercises in German.
 You may answer exam questions in German.
Organization
• Structure: 3V (lecture) + 1Ü (exercises)
 6 EECS credits
 Part of the area "Applied Computer Science"

• Place & Time
 Lecture/Exercises: Wed 08:30 – 10:00, room HG Aula
 Lecture/Exercises: Thu 14:30 – 16:00, room TEMP2

• Exam
 Written exam
 1st try: TBD
 2nd try: TBD
Exercises and Supplementary Material
• Exercises
 Typically 1 exercise sheet every 2 weeks.
 Pen & paper and programming exercises
– Python for the first exercise slots
– TensorFlow for the Deep Learning part
 Hands-on experience with the algorithms from the lecture.
 Send in your solutions the night before the exercise class.
 Need to reach ≥ 50% of the points to qualify for the exam!

• Teams are encouraged!
 You can form teams of up to 4 people for the exercises.
 Each team should only turn in one solution via L2P.
– But list the names of all team members in the submission.
Course Webpage
• First exercise on 24.10.
• http://www.vision.rwth-aachen.de/courses/
Textbooks
• The first half of the lecture is covered in Bishop's book.
• For Deep Learning, we will use Goodfellow & Bengio.

 Christopher M. Bishop
 Pattern Recognition and Machine Learning
 Springer, 2006
 (available in the library's "Handapparat")

 I. Goodfellow, Y. Bengio, A. Courville
 Deep Learning
 MIT Press, 2016

• Research papers will be given out for some topics.
 Tutorials and deeper introductions.
 Application papers
How to Find Us
• Office
 UMIC Research Centre
 Mies-van-der-Rohe-Strasse 15, room 124

• Office hours
 If you have questions about the lecture, contact Paul or Sabarinath.
 My regular office hours will be announced (additional slots are available upon request).
 Send us an email beforehand to confirm a time slot.

• Questions are welcome!
Machine Learning
• Statistical Machine Learning
 Principles, methods, and algorithms for learning and prediction on the basis of past evidence

• Already everywhere
 Speech recognition (e.g. Siri)
 Machine translation (e.g. Google Translate)
 Computer vision (e.g. face detection)
 Text filtering (e.g. email spam filters)
 Operating systems (e.g. caching)
 Fraud detection (e.g. credit cards)
 Game playing (e.g. AlphaGo)
 Robotics (everywhere)
Slide credit: Bernt Schiele
What Is Machine Learning Useful For?
• Automatic Speech Recognition
• Computer Vision (Object Recognition, Segmentation, Scene Understanding)
• Information Retrieval (Retrieval, Categorization, Clustering, ...)
• Financial Prediction (Time series analysis, ...)
• Medical Diagnosis (Inference from partial observations)
• Bioinformatics (Modelling gene microarray data, ...)
• Autonomous Driving (DARPA Grand Challenge, ...)
Slides adapted from Zoubin Ghahramani; images from Kevin Murphy
Machine Learning
• And you might have heard of… Deep Learning
Machine Learning
• Goal
 Machines that learn to perform a task from experience

• Why?
 Crucial component of every intelligent/autonomous system
 Important for a system's adaptability
 Important for a system's generalization capabilities
 Attempt to understand human learning
Slide credit: Bernt Schiele
Machine Learning: Core Questions
• Learning to perform a task from experience

• Learning
 Most important part here!
 We do not want to encode the knowledge ourselves.
 The machine should learn the relevant criteria automatically from past observations and adapt to the given situation.

• Tools
 Statistics
 Probability theory
 Decision theory
 Information theory
 Optimization theory
Slide credit: Bernt Schiele
Machine Learning: Core Questions
• Learning to perform a task from experience

• Task
 Can often be expressed through a mathematical function
   y = f(x; w)
 x: input
 y: output
 w: parameters (this is what is "learned")

• Classification vs. Regression
 Regression: continuous y
 Classification: discrete y
– E.g. class membership, sometimes also posterior probability
Slide credit: Bernt Schiele
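As a concrete (made-up) illustration of y = f(x; w), the sketch below "learns" the two parameters of a linear model from past observations via a closed-form least-squares fit. The data points are invented for this example and are not from the lecture:

```python
# Minimal sketch: "learning" the parameters w of a linear model
# y = f(x; w) = w[0] + w[1] * x  from example observations.

def fit_linear(xs, ys):
    """Least-squares estimate of w = (intercept, slope)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Closed-form solution of the 1D least-squares problem
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
            / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return intercept, slope

def f(x, w):
    """The learned function y = f(x; w)."""
    return w[0] + w[1] * x

# Made-up training data generated by y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w = fit_linear(xs, ys)   # recovers w ≈ (1, 2)
```

The same template covers classification as well: only the form of f and the performance measure change.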
Example: Regression
• Automatic control of a vehicle
 x → f(x; w) → y
Slide credit: Bernt Schiele
Examples: Classification
• Email filtering: x ∈ {a, …, z}*, y ∈ {important, spam}
• Character recognition
• Speech recognition
Slide credit: Bernt Schiele
Machine Learning: Core Problems
• Input x

• Features
 Invariance to irrelevant input variations
 Selecting the "right" features is crucial
 Encoding and use of "domain knowledge"
 Higher-dimensional features are more discriminative.

• Curse of dimensionality
 Complexity increases exponentially with the number of dimensions.
Slide credit: Bernt Schiele
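The exponential blow-up can be made concrete with a toy calculation (not from the slides): if each feature dimension is discretized into a fixed number of bins, the number of cells in the resulting grid, and hence the number of samples needed just to see each cell once, grows exponentially with the dimension:

```python
# Toy illustration of the curse of dimensionality: discretize each
# feature axis into `bins` cells and count the cells in the grid.

def num_cells(bins, dims):
    """Number of grid cells with `bins` bins per each of `dims` axes."""
    return bins ** dims

# With 10 bins per axis, every extra dimension multiplies the cell
# count (and the data needed to fill it) by 10.
counts = [num_cells(10, d) for d in (1, 2, 3, 6)]
# counts == [10, 100, 1000, 1000000]
```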
Machine Learning: Core Questions
• Learning to perform a task from experience

• Performance measure: typically one number
 % correctly classified letters
 % games won
 % correctly recognized words, sentences, answers

• Generalization performance
 Training vs. test
 "All" data
Slide credit: Bernt Schiele
Machine Learning: Core Questions
• Learning to perform a task from experience

• Performance: "99% correct classification"
 Of what???
 Characters? Words? Sentences?
 Speaker/writer independent?
 Over what data set?
 …

• "The car drives without human intervention 99% of the time on country roads."
Slide adapted from Bernt Schiele
Machine Learning: Core Questions
• Learning to perform a task from experience

• What data is available?
 Data with labels: supervised learning
– Images / speech with target labels
– Car sensor data with target steering signal
 Data without labels: unsupervised learning
– Automatic clustering of sounds and phonemes
– Automatic clustering of web sites
 Some data with, some without labels: semi-supervised learning
 Feedback/rewards: reinforcement learning
Slide credit: Bernt Schiele
Machine Learning: Core Questions
• Learning to perform a task from experience

• Learning
 Most often learning = optimization
 Search in hypothesis space
 Search for the "best" function / model parameters w
– I.e. choose w so that y = f(x; w) maximizes the performance measure
Slide credit: Bernt Schiele
Machine Learning: Core Questions
• Learning is optimization of y = f(x; w)
 w: characterizes the family of functions
 w: indexes the space of hypotheses
 w: vector, connection matrix, graph, …
Slide credit: Bernt Schiele
Course Outline
• Fundamentals
 Bayes Decision Theory
 Probability Density Estimation

• Classification Approaches
 Linear Discriminants
 Support Vector Machines
 Ensemble Methods & Boosting

• Deep Learning
 Foundations
 Convolutional Neural Networks
 Recurrent Neural Networks
Topics of This Lecture
• Review: Probability Theory
 Probabilities
 Probability densities
 Expectations and covariances

• Bayes Decision Theory
 Basic concepts
 Minimizing the misclassification rate
 Minimizing the expected loss
 Discriminant functions
Probability Theory

"Probability theory is nothing but common sense reduced to calculation."
Pierre-Simon de Laplace, 1749–1827
Image source: Wikipedia
Probability Theory
• Example: apples and oranges
 We have two boxes to pick from.
 Each box contains both types of fruit.
 What is the probability of picking an apple?

• Formalization
 Let B ∈ {r, b} be a random variable for the box we pick.
 Let F ∈ {a, o} be a random variable for the type of fruit we get.
 Suppose we pick the red box 40% of the time. We write this as
   p(B = r) = 0.4,  p(B = b) = 0.6
 The probability of picking an apple given a choice for the box is
   p(F = a | B = r) = 0.25,  p(F = a | B = b) = 0.75
 What is the probability of picking an apple?
   p(F = a) = ?
Image source: C.M. Bishop, 2006
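The question is answered by the sum, product, and Bayes rules introduced on the following slides; a quick numeric check of the example above (a minimal sketch, not part of the slides):

```python
# Numeric check for the apples-and-oranges example.
p_B = {'r': 0.4, 'b': 0.6}              # prior over boxes
p_a_given_B = {'r': 0.25, 'b': 0.75}    # p(F = a | B)

# Sum rule over the two boxes: p(F=a) = sum_B p(F=a | B) p(B)
p_a = sum(p_a_given_B[box] * p_B[box] for box in p_B)
# p_a = 0.25 * 0.4 + 0.75 * 0.6 = 0.55

# Bayes' theorem: given that we got an apple, which box was it?
p_r_given_a = p_a_given_B['r'] * p_B['r'] / p_a   # = 0.1 / 0.55 ≈ 0.18
```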
Probability Theory
• More general case
 Consider two random variables X ∈ {x_i} and Y ∈ {y_j}.
 Consider N trials and let
   n_ij = #{X = x_i ∧ Y = y_j}
   c_i = #{X = x_i}
   r_j = #{Y = y_j}

• Then we can derive
 Joint probability:       p(X = x_i, Y = y_j) = n_ij / N
 Marginal probability:    p(X = x_i) = c_i / N
 Conditional probability: p(Y = y_j | X = x_i) = n_ij / c_i
Image source: C.M. Bishop, 2006
Probability Theory
• Rules of probability
 Sum rule:     p(X) = Σ_Y p(X, Y)
 Product rule: p(X, Y) = p(Y | X) p(X)
Image source: C.M. Bishop, 2006
The Rules of Probability
• Thus we have
   Sum rule:     p(X) = Σ_Y p(X, Y)
   Product rule: p(X, Y) = p(Y | X) p(X)

• From those, we can derive
   Bayes' theorem: p(Y | X) = p(X | Y) p(Y) / p(X)
   where p(X) = Σ_Y p(X | Y) p(Y)
Probability Densities
• Probabilities over continuous variables are defined over their probability density function (pdf) p(x).

• The probability that x lies in the interval (−∞, z) is given by the cumulative distribution function
   P(z) = ∫_{−∞}^{z} p(x) dx
Image source: C.M. Bishop, 2006
Expectations
• The average value of some function f(x) under a probability distribution p(x) is called its expectation
   E[f] = Σ_x p(x) f(x)      (discrete case)
   E[f] = ∫ p(x) f(x) dx     (continuous case)

• If we have a finite number N of samples drawn from a pdf, then the expectation can be approximated by
   E[f] ≈ (1/N) Σ_{n=1..N} f(x_n)

• We can also consider a conditional expectation
   E_x[f | y] = Σ_x p(x | y) f(x)
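A minimal sketch of the sample-based approximation, assuming (for concreteness, this choice is not from the slides) that p(x) is uniform on [0, 1], so the exact expectation of f(x) = x² is 1/3:

```python
# Sample-based approximation of an expectation:
# E[f] ≈ (1/N) * sum_n f(x_n), with x_n drawn from p(x).
import random

random.seed(0)  # reproducible draws
N = 100_000
samples = [random.random() for _ in range(N)]   # x_n ~ Uniform[0, 1]
approx = sum(x * x for x in samples) / N        # ≈ E[x^2] = 1/3
```

Increasing N shrinks the approximation error at the usual Monte Carlo rate of O(1/√N).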
Variances and Covariances
• The variance provides a measure of how much variability there is in f(x) around its mean value E[f(x)]:
   var[f] = E[(f(x) − E[f(x)])²] = E[f(x)²] − E[f(x)]²

• For two random variables x and y, the covariance is defined by
   cov[x, y] = E_{x,y}[(x − E[x]) (y − E[y])] = E_{x,y}[x y] − E[x] E[y]

• If x and y are vectors, the result is a covariance matrix
   cov[x, y] = E_{x,y}[(x − E[x]) (yᵀ − E[yᵀ])] = E_{x,y}[x yᵀ] − E[x] E[yᵀ]
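A minimal sketch of these definitions on made-up sample data, using the form E[xy] − E[x]E[y] directly (population estimates, i.e. dividing by N):

```python
# Sample estimates of variance and the 2x2 covariance matrix,
# following cov[u, v] = E[uv] - E[u] E[v]. Data values are invented.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]   # ys = 2 * xs, perfectly correlated

def mean(vs):
    return sum(vs) / len(vs)

def cov(us, vs):
    """cov[u, v] = E[uv] - E[u] E[v] (population form)."""
    return mean([u * v for u, v in zip(us, vs)]) - mean(us) * mean(vs)

var_x = cov(xs, xs)                  # variance is cov[x, x]
C = [[cov(xs, xs), cov(xs, ys)],     # covariance matrix of (x, y)
     [cov(ys, xs), cov(ys, ys)]]
# C == [[1.25, 2.5], [2.5, 5.0]]
```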
Bayes Decision Theory

Thomas Bayes, 1701–1761

"The theory of inverse probability is founded upon an error, and must be wholly rejected."
R.A. Fisher, 1925
Image source: Wikipedia
Bayes Decision Theory
• Example: handwritten character recognition

• Goal
 Classify a new letter such that the probability of misclassification is minimized.
Slide credit: Bernt Schiele; Image source: C.M. Bishop, 2006
Bayes Decision Theory
• Concept 1: Priors (a priori probabilities) p(Ck)
 What we can tell about the probability before seeing the data.
 Example:
   C1 = a:  p(C1) = 0.75
   C2 = b:  p(C2) = 0.25

• In general: Σ_k p(Ck) = 1
Slide credit: Bernt Schiele
Bayes Decision Theory
• Concept 2: Conditional probabilities p(x | Ck)
 Let x be a feature vector.
 x measures/describes certain properties of the input.
– E.g. number of black pixels, aspect ratio, …
 p(x | Ck) describes its likelihood for class Ck.
 (Plots: the class-conditional densities p(x | a) and p(x | b) over x.)
Slide credit: Bernt Schiele
Bayes Decision Theory
• Example: a measurement at x = 15

• Question: Which class?
 Since p(x | b) is much smaller than p(x | a), the decision should be 'a' here.
Slide credit: Bernt Schiele
Bayes Decision Theory
• Example: a measurement at x = 25

• Question: Which class?
 Since p(x | a) is much smaller than p(x | b), the decision should be 'b' here.
Slide credit: Bernt Schiele
Bayes Decision Theory
• Example: a measurement at x = 20

• Question: Which class?
 Remember that p(a) = 0.75 and p(b) = 0.25…
 I.e., the decision should again be 'a'.
 How can we formalize this?
Slide credit: Bernt Schiele
Bayes Decision Theory
• Concept 3: Posterior probabilities p(Ck | x)
 We are typically interested in the a posteriori probability, i.e. the probability of class Ck given the measurement vector x.

• Bayes' Theorem:
   p(Ck | x) = p(x | Ck) p(Ck) / p(x) = p(x | Ck) p(Ck) / Σ_i p(x | Ci) p(Ci)

• Interpretation
   Posterior = (Likelihood × Prior) / Normalization Factor
Slide credit: Bernt Schiele
Bayes Decision Theory
 (Plots, from top to bottom: the likelihoods p(x | a), p(x | b); the products p(x | a) p(a), p(x | b) p(b) with the resulting decision boundary; and the posteriors p(a | x), p(b | x).)

   Posterior = (Likelihood × Prior) / Normalization Factor
Slide credit: Bernt Schiele
Bayesian Decision Theory
• Goal: Minimize the probability of a misclassification
 The green and blue regions stay constant.
 Only the size of the red region varies!

   p(error) = ∫_{R1} p(C2 | x) p(x) dx + ∫_{R2} p(C1 | x) p(x) dx
Image source: C.M. Bishop, 2006
Bayes Decision Theory
• Optimal decision rule
 Decide for C1 if
   p(C1 | x) > p(C2 | x)
 This is equivalent to
   p(x | C1) p(C1) > p(x | C2) p(C2)
 Which is again equivalent to the likelihood-ratio test
   p(x | C1) / p(x | C2) > p(C2) / p(C1)
 with the right-hand side acting as the decision threshold θ.
Slide credit: Bernt Schiele
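A sketch of the likelihood-ratio test in code. The Gaussian class-conditional densities (means 15 and 25, shared σ = 3) are assumptions chosen to mimic the earlier figures; the priors p(a) = 0.75, p(b) = 0.25 are the ones from the example:

```python
# Likelihood-ratio test for two classes. The Gaussian class-conditional
# densities are assumed for illustration; priors follow the example.
import math

def gauss(x, mu, sigma):
    """1D Gaussian density N(x; mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) \
        / (sigma * math.sqrt(2 * math.pi))

def decide(x, prior_a=0.75, prior_b=0.25):
    """Decide 'a' iff p(x|a) / p(x|b) > p(b) / p(a) = theta."""
    likelihood_ratio = gauss(x, mu=15.0, sigma=3.0) / gauss(x, mu=25.0, sigma=3.0)
    theta = prior_b / prior_a          # decision threshold
    return 'a' if likelihood_ratio > theta else 'b'

# Mirrors the three examples: at x = 15 and x = 25 the likelihoods
# dominate; at x = 20 the likelihoods tie and the larger prior tips
# the decision to 'a'.
```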
Generalization to More Than 2 Classes
• Decide for class k whenever it has the greatest posterior probability of all classes:
   p(Ck | x) > p(Cj | x)              ∀j ≠ k
   p(x | Ck) p(Ck) > p(x | Cj) p(Cj)  ∀j ≠ k

• Likelihood-ratio test
   p(x | Ck) / p(x | Cj) > p(Cj) / p(Ck)   ∀j ≠ k
Slide credit: Bernt Schiele
Classifying with Loss Functions
• Generalization to decisions with a loss function
 Differentiate between the possible decisions and the possible true classes.
 Example: medical diagnosis
– Decisions: sick or healthy (or: further examination necessary)
– Classes: patient is sick or healthy
 The cost may be asymmetric:
   loss(decision = healthy | patient = sick) ≫ loss(decision = sick | patient = healthy)
Slide credit: Bernt Schiele
Classifying with Loss Functions
• In general, we can formalize this by introducing a loss matrix L_kj
   L_kj = loss for decision Cj if truth is Ck.

• Example: cancer diagnosis
   (Matrix L_cancer diagnosis: rows = truth, columns = decision.)
Classifying with Loss Functions
• Loss functions may be different for different actors.
 Example (columns: "invest", "don't invest"):
   L_stocktrader(subprime) = ( −½ c_gain   0
                               0           0 )
   L_bank(subprime)        = ( −½ c_gain   0
                               0           … )
 Different loss functions may lead to different Bayes optimal strategies.
Minimizing the Expected Loss
• The optimal solution is the one that minimizes the loss.
 But: the loss function depends on the true class, which is unknown.

• Solution: Minimize the expected loss
   E[L] = Σ_k Σ_j ∫_{Rj} L_kj p(x, Ck) dx

• This can be done by choosing the regions Rj such that each x is assigned to the decision j that minimizes Σ_k L_kj p(Ck | x), which is easy to do once we know the posterior class probabilities p(Ck | x).
Minimizing the Expected Loss
• Example:
 2 classes: C1, C2
 2 decisions: α1, α2
 Loss function: L(αj | Ck) = L_kj
 Expected loss (= risk R) for the two decisions:
   R(α1 | x) = L11 p(C1 | x) + L21 p(C2 | x)
   R(α2 | x) = L12 p(C1 | x) + L22 p(C2 | x)

• Goal: Decide such that the expected loss is minimized
 I.e. decide α1 if R(α2 | x) > R(α1 | x).
Slide credit: Bernt Schiele
Minimizing the Expected Loss
• Decide α1 if
   R(α2 | x) > R(α1 | x)
   L12 p(C1 | x) + L22 p(C2 | x) > L11 p(C1 | x) + L21 p(C2 | x)
   (L12 − L11) p(C1 | x) > (L21 − L22) p(C2 | x)
   (L12 − L11) / (L21 − L22) > p(C2 | x) / p(C1 | x) = p(x | C2) p(C2) / (p(x | C1) p(C1))
   p(x | C1) / p(x | C2) > (L21 − L22) / (L12 − L11) · p(C2) / p(C1)
 ⇒ Adapted decision rule taking the loss into account.
Slide credit: Bernt Schiele
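Equivalently, one can directly pick the decision αj that minimizes R(αj | x) = Σ_k L_kj p(Ck | x). A sketch with an invented asymmetric loss matrix in the spirit of the medical-diagnosis example (the loss values are made up):

```python
# Pick the decision alpha_j minimizing the expected loss
# R(alpha_j | x) = sum_k L[k][j] * p(C_k | x).

# Rows: truth (0 = sick, 1 = healthy); columns: decision (0 = sick, 1 = healthy)
L = [[0.0, 100.0],   # deciding "healthy" for a sick patient is very costly
     [1.0,   0.0]]   # deciding "sick" for a healthy one costs a check-up

def best_decision(posterior):
    """posterior[k] = p(C_k | x); returns the minimal-risk decision index."""
    risks = [sum(L[k][j] * posterior[k] for k in range(len(posterior)))
             for j in range(len(L[0]))]
    return min(range(len(risks)), key=risks.__getitem__)

# Even with only 5% posterior probability of being sick, the asymmetric
# loss makes "sick" (i.e. further examination) the minimal-risk decision.
d = best_decision([0.05, 0.95])   # risks: 0.95 vs 5.0 -> decision 0
```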
The Reject Option
• Classification errors arise from regions where the largest posterior probability p(Ck | x) is significantly less than 1.
 These are the regions where we are relatively uncertain about class membership.
 For some applications, it may be better to reject the automatic decision entirely in such a case and e.g. consult a human expert.
Image source: C.M. Bishop, 2006
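A minimal sketch of the reject option: accept the automatic decision only when the largest posterior exceeds a threshold θ (the value 0.9 below is an arbitrary choice for illustration):

```python
# Reject option: return the most probable class only if its posterior
# exceeds a threshold theta, otherwise defer (e.g. to a human expert).

def classify_with_reject(posteriors, theta=0.9):
    """posteriors: list of p(C_k | x). Returns class index or 'reject'."""
    k = max(range(len(posteriors)), key=posteriors.__getitem__)
    return k if posteriors[k] >= theta else 'reject'

confident = classify_with_reject([0.97, 0.03])   # clear case: class 0
uncertain = classify_with_reject([0.55, 0.45])   # ambiguous: 'reject'
```

Setting θ = 1 rejects everything; θ ≤ 1/K (for K classes) rejects nothing.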
Discriminant Functions
• Formulate classification in terms of comparisons
 Discriminant functions y1(x), …, yK(x)
 Classify x as class Ck if
   yk(x) > yj(x)   ∀j ≠ k

• Examples (Bayes Decision Theory)
   yk(x) = p(Ck | x)
   yk(x) = p(x | Ck) p(Ck)
   yk(x) = log p(x | Ck) + log p(Ck)
Slide credit: Bernt Schiele
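All three discriminants give the same decisions, since log is monotonic; the log form is usually preferred in practice because sums of logs avoid numerical underflow when many small likelihoods are multiplied. A sketch with assumed Gaussian likelihoods (means 15 and 25, σ = 3) and the example priors 0.75 / 0.25:

```python
# Classification via the log discriminant
# y_k(x) = log p(x | C_k) + log p(C_k).
# Gaussian likelihoods and the priors are assumptions for illustration.
import math

priors = [0.75, 0.25]
means, sigma = [15.0, 25.0], 3.0

def log_gauss(x, mu):
    """Log-density of N(x; mu, sigma^2)."""
    return -0.5 * ((x - mu) / sigma) ** 2 \
        - math.log(sigma * math.sqrt(2 * math.pi))

def classify(x):
    """argmax_k y_k(x) with y_k(x) = log p(x|C_k) + log p(C_k)."""
    ys = [log_gauss(x, means[k]) + math.log(priors[k]) for k in range(2)]
    return max(range(2), key=ys.__getitem__)
```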
Different Views on the Decision Problem
• yk(x) ∝ p(x | Ck) p(Ck)
 First determine the class-conditional densities for each class individually and separately infer the prior class probabilities.
 Then use Bayes' theorem to determine class membership.
 ⇒ Generative methods

• yk(x) = p(Ck | x)
 First solve the inference problem of determining the posterior class probabilities.
 Then use decision theory to assign each new x to its class.
 ⇒ Discriminative methods

• Alternative
 Directly find a discriminant function yk(x) which maps each input x directly onto a class label.
Next Lectures…
• Ways to estimate the probability densities p(x | Ck)
 Non-parametric methods
– Histograms
– k-Nearest Neighbors
– Kernel Density Estimation
 Parametric methods
– Gaussian distribution
– Mixtures of Gaussians

• Discriminant functions
 Linear discriminants
 Support vector machines
 Next lectures…
References and Further Reading
• More information, including a short review of probability theory and a good introduction to Bayes Decision Theory, can be found in Chapters 1.1, 1.2 and 1.5 of

 Christopher M. Bishop
 Pattern Recognition and Machine Learning
 Springer, 2006
