
CS195-5 : Introduction to Machine Learning

Lecture 5
Greg Shakhnarovich
September 15, 2006
Revised October 24th, 2006
Announcements
Collaboration policy on Psets
Projects
Clarifications for Problem Set 1
The correlation question
N values in each of two samples:
e_i = y_i − ŵ^T x_i : the prediction error
z_i = a^T x_i : a linear function evaluated on the training examples.
Show that cor({e_i}, {z_i}) = 0.
Develop an intuition before you attack the derivation: play with these in Matlab!
Generate a random w*, random X
Compute Xw*, generate and add Gaussian noise ε
Fit ŵ, calculate {e_i}
Generate a random a, calculate {z_i}; plot them!
Calculate correlation.
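A minimal Matlab sketch of this experiment (the dimensions, noise level, and variable names are illustrative choices, not part of the problem set):

% Sketch of the suggested experiment.
N = 200; d = 5; sigma = 0.3;
X = [ones(N,1) randn(N,d)];          % design matrix with a bias column
w_true = randn(d+1, 1);              % a random "true" w*
y = X*w_true + sigma*randn(N,1);     % Xw* plus Gaussian noise
w_hat = X \ y;                       % fit w by least squares
e = y - X*w_hat;                     % prediction errors {e_i}
a = randn(d+1, 1);                   % a random linear function
z = X*a;                             % {z_i} = a' * x_i
scatter(z, e); xlabel('z'); ylabel('e');
C = corrcoef(z, e);                  % sample correlation in the off-diagonal entry
disp(C(1,2));                        % essentially zero (up to numerical precision)

The scatter plot should show no linear trend, and the printed correlation should be essentially zero.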
More notation
A ≜ B means A is defined by B (first time A is introduced)
A ≡ B for varying A and/or B means they are always equal.
E.g., f(x) ≡ 1 means f returns 1 regardless of the input x.
a ∼ p(a): random variable a is drawn from density p(a)
Review
Uncertainty in ŵ as an estimate of w*:
ŵ ∼ N( ŵ; w*, σ^2 (X^T X)^{-1} )
Generalized linear regression:
f(x; w) = w_0 + w_1 φ_1(x) + w_2 φ_2(x) + ... + w_m φ_m(x)
Multivariate Gaussians
Today
More on Gaussians
Introduction to classification
Projections
Linear discriminant analysis
Refresher on probability
Variance of a r.v. a: σ_a^2 = E[ (a − μ_a)^2 ], where μ_a = E[a].
Standard deviation: σ_a = sqrt(σ_a^2). Measures the spread around the mean.
Generalization to two variables: covariance
Cov_{a,b} ≜ E_{p(a,b)}[ (a − μ_a)(b − μ_b) ]
Measures how the two variables deviate together from their means (co-vary).
Correlation and covariance
Correlation:
cor(a, b) ≜ Cov_{a,b} / (σ_a σ_b).
[Figure: three scatter plots of samples (a, b) showing different degrees of linear relationship.]
cor(a, b) measures the linear relationship between a and b.
−1 ≤ cor(a, b) ≤ +1; +1 or −1 means a is a linear function of b.
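A small Matlab illustration of these definitions (the data-generating numbers below are arbitrary):

% Correlation of two linearly related samples.
N = 1000;
a = randn(N, 1);
b = 0.8*a + 0.3*randn(N, 1);        % b depends linearly on a, plus noise
C = cov(a, b);                      % 2x2 covariance matrix of the pair
r = C(1,2) / (std(a) * std(b));     % cor(a,b) = Cov_{a,b} / (sigma_a * sigma_b)
R = corrcoef(a, b);                 % same value appears in R(1,2)

As the noise term shrinks toward zero, r approaches +1; flipping the sign of the slope drives it toward −1.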
Covariance matrix
For a random vector x = [x_1, ..., x_d]^T,

Cov_x ≜ [ σ_{x_1}^2        Cov_{x_1,x_2}   ...   Cov_{x_1,x_d} ]
        [ Cov_{x_2,x_1}    σ_{x_2}^2       ...   Cov_{x_2,x_d} ]
        [ ...              ...             ...   ...           ]
        [ Cov_{x_d,x_1}    Cov_{x_d,x_2}   ...   σ_{x_d}^2     ]

Square, symmetric, non-negative main diagonal (variances ≥ 0).
Under that definition, one can show:
Cov_x = E[ (x − μ_x)(x − μ_x)^T ],
i.e. the expectation of the outer product of x − μ_x with itself.
Note: so far nothing Gaussian-specific!
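A quick Matlab check of the outer-product form against the built-in estimator (the data here are arbitrary):

% Sample covariance as the average outer product of centered vectors.
N = 500; d = 3;
X  = randn(N, d) * [1 0 0; 0.5 2 0; 0 0.3 1];   % correlated data; rows are x_i'
mu = mean(X, 1);
Xc = X - repmat(mu, N, 1);                      % center each component
Sigma_outer   = (Xc' * Xc) / (N - 1);           % sum_i (x_i - mu)(x_i - mu)' / (N-1)
Sigma_builtin = cov(X);                         % agrees up to floating-point error
disp(max(abs(Sigma_outer(:) - Sigma_builtin(:))));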
Covariance matrix decomposition
Any covariance matrix can be decomposed:
Σ = R diag(λ_1, ..., λ_d) R^T
where R is a rotation matrix, and λ_j ≥ 0 for all j = 1, ..., d.
Rotation in 2D:
R = [ cos θ   −sin θ ]
    [ sin θ    cos θ ]
Rotation matrices
Σ = R diag(λ_1, ..., λ_d) R^T
Rotation matrix R:
orthonormal: if columns are r_1, ..., r_d, then r_i^T r_i = 1 and r_i^T r_j = 0 for i ≠ j.
From here follows R^T = R^{-1} (R^T reverses the rotation produced by R).
Columns r_i specify the basis for the new (rotated) coordinate system.
R determines the orientation of the ellipse (the so-called principal directions).
The inner diag(λ_1, ..., λ_d) specifies the scaling along each of the principal directions.
Interpretation of the whole product: rotate, scale, and rotate back.
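In Matlab, this decomposition of a given covariance matrix can be obtained with an eigendecomposition (the matrix below is an arbitrary example):

% Recover R and the lambdas of Sigma = R * diag(lambda) * R'.
Sigma = [2.0 0.8; 0.8 1.0];         % a 2x2 covariance matrix
[R, L] = eig(Sigma);                % columns of R: principal directions; diag(L): lambdas
lambda = diag(L);
disp(R * diag(lambda) * R');        % reconstructs Sigma
disp(R' * R);                       % identity, since R is orthonormal (R' = inv(R))

(eig may return the columns in either order or with flipped signs; if a proper rotation with det(R) = +1 is needed, negate one column.)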
Covariance and correlation for Gaussians
Suppose (for simplicity) μ = 0. What happens if we rotate the data by R^T?
The new covariance matrix is just

[ σ_{x_1}^2        Cov_{x_1,x_2}   ...   Cov_{x_1,x_d} ]
[ Cov_{x_2,x_1}    σ_{x_2}^2       ...   Cov_{x_2,x_d} ]   =   diag(λ_1, ..., λ_d)
[ ...              ...             ...   ...           ]
[ Cov_{x_d,x_1}    Cov_{x_d,x_2}   ...   σ_{x_d}^2     ]

The components of x are now uncorrelated (covariances are zero). This is known as the whitening transformation.
For Gaussians, this also means they are independent.
Not true for all distributions!
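A short Matlab demonstration of this decorrelation (the choice of Sigma, the sample size, and the way the samples are generated are my own):

% Rotating zero-mean Gaussian data by R' makes the components uncorrelated.
N = 5000;
Sigma = [2.0 0.8; 0.8 1.0];
[R, L] = eig(Sigma);
A = R * sqrt(L) * R';               % symmetric square root, so A*A' = Sigma
X = randn(N, 2) * A';               % rows x_i' with covariance approx. Sigma, mean 0
disp(cov(X));                       % approx. Sigma
Z = X * R;                          % each row is (R' * x_i)'
disp(cov(Z));                       % approx. diag(lambda_1, lambda_2): uncorrelated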
Classification versus regression
Formally: just like in regression, we want to learn a mapping from X to Y, but Y is discrete and finite.
One approach is to (naively) ignore that Y is such.
Regression on the indicator matrix:
Code the possible values of the label as 1, ..., C.
Define the matrix Y:
Y_ic = 1 if y_i = c, and 0 otherwise.
This defines C independent regression problems; solving them with least squares yields
Ŷ_0 = X_0 (X^T X)^{-1} X^T Y.
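A Matlab sketch of this construction (the labels here are random placeholders; taking the largest fitted score as the predicted class is a common convention, not something stated on this slide):

% Regression on the indicator matrix for a C-class problem.
N = 300; d = 2; C = 3;
X = [ones(N,1) randn(N, d)];                 % data matrix with a bias column
y = randi(C, N, 1);                          % labels coded as 1, ..., C
Y = zeros(N, C);
Y(sub2ind([N C], (1:N)', y)) = 1;            % Y(i,c) = 1 iff y_i = c
W = (X' * X) \ (X' * Y);                     % C least-squares problems at once
Yhat = X * W;                                % fitted indicator scores
[~, yhat] = max(Yhat, [], 2);                % predict the class with the largest score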
Classification as regression
Suppose we have a binary problem, y ∈ {−1, +1}.
Assuming the standard model y = f(x; w) + ε, and solving with least squares, we get ŵ.
This corresponds to squared loss as a measure of classification performance!
Does this make sense?
How do we decide on the label based on f(x; w)?
Classification as regression: example
A 1D example:
[Figure, built up over several slides: data points x on the horizontal axis with labels y = +1 and y = −1 on the vertical axis; the fitted line w_0 + w^T x; the resulting split of the axis into a region labeled ŷ = −1 and a region labeled ŷ = +1.]
Classification as regression
f(x; w) = w_0 + w^T x
Can't just take ŷ = f(x; w), since it won't be a valid label.
A reasonable decision rule:
decide on ŷ = +1 if f(x; w) ≥ 0, otherwise ŷ = −1;
ŷ = sign( w_0 + w^T x ).
This specifies a linear classifier:
the linear decision boundary (hyperplane) given by the equation w_0 + w^T x = 0 separates the space into two half-spaces.
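A minimal Matlab sketch of fitting and applying this rule (the data are synthetic stand-ins):

% Least-squares fit of a binary problem and the sign decision rule.
N = 200; d = 2;
X = [ones(N,1) randn(N, d)];             % bias column plus features
y = sign(randn(N, 1));                   % labels in {-1, +1}, random here
w = X \ y;                               % least-squares weights [w_0; w_1; ...; w_d]
yhat = sign(X * w);                      % decision rule: yhat = sign(w_0 + w'*x)
err = mean(yhat ~= y)                    % training error rate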
Classification as regression
Seems to work well here, but not so well here? [Two example figures, not shown.]
Geometry of projections
[Figure, built up over several slides, in the (x_1, x_2) plane: the line w_0 + w^T x = 0; the normal vector w; the distance w_0/||w|| from the origin to the line; a point x_0, its signed distance (w_0 + w^T x_0)/||w|| from the line, and its projection x_0^⊥ onto w.]
w^T x = 0: a line passing through the origin and orthogonal to w.
w^T x + w_0 = 0 shifts the line along w.
x^⊥ is the projection of x on w.
Set up a new 1D coordinate system: x → (w_0 + w^T x)/||w||.
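In Matlab, the signed distance and the projection can be computed directly (w, w_0, and x below are arbitrary values for illustration):

% Signed distance of a point to the hyperplane w_0 + w'*x = 0, and its projection onto w.
w  = [2; 1];  w0 = -1;
x  = [1.5; 0.5];
dist   = (w0 + w' * x) / norm(w);        % the new 1D coordinate of x
x_perp = (w' * x / (w' * w)) * w;        % projection of x onto the direction w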
Distribution in 1D projection
Consider a projection given by w^T x = 0 (i.e., w is the normal).
Each training point x_i is projected to a scalar z_i = w^T x_i.
We can study how well the projected values corresponding to different classes are separated.
This is a function of w; some projections may be better than others.
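A Matlab sketch of this idea (two made-up Gaussian classes and a hand-picked w):

% Project two classes onto a direction w and compare the projected values.
N = 200;
X1 = randn(N, 2) + repmat([ 2 0], N, 1);    % class 1
X2 = randn(N, 2) + repmat([-2 0], N, 1);    % class 2
w  = [1; 0.2];                              % a candidate direction
z1 = X1 * w;                                % z_i = w' * x_i for class 1
z2 = X2 * w;                                % ... and for class 2
hist(z1, 30); hold on; hist(z2, 30); hold off;   % compare the two 1D distributions

Trying a different w (say, [0; 1]) shows how strongly the separation of the projected classes depends on the chosen direction.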
Linear discriminant and dimensionality reduction
The discriminant function f(x; w) = w_0 + w^T x reduces the dimension of the examples from d to 1:
[Figure: the level sets f(x; w) = −1, f(x; w) = 0, f(x; w) = +1, with the direction w orthogonal to them.]
Projections and classification
What objective are we optimizing the 1D projection for?
1D projections of a Gaussian
Let p(x) = N(x; μ, Σ).
For any A, p(Ax) = N( Ax; Aμ, A Σ A^T ).
To get a marginal of the 1D projection on the direction defined by a unit vector v:
Make R a rotation such that R [1, 0, ..., 0]^T = v.
Compute σ_v^2 = v^T Σ v; that's the variance of the marginal.
Let's assume for now μ = 0 (but think what happens if it's not!)
Matlab demo: margGausDemo.m
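A small check in the spirit of that demo (this is my own stand-in, not the actual margGausDemo.m):

% The 1D marginal of a zero-mean Gaussian along a unit vector v has variance v' * Sigma * v.
Sigma = [2.0 0.8; 0.8 1.0];
v = [1; 1] / norm([1; 1]);               % a unit direction
var_marginal = v' * Sigma * v;           % predicted variance of the projection
N = 100000;
X = randn(N, 2) * chol(Sigma);           % samples with covariance approx. Sigma (chol(Sigma)'*chol(Sigma) = Sigma)
z = X * v;                               % projections onto v
disp([var_marginal, var(z)]);            % the two numbers should be close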
Objective: class separation
We want to minimize overlap between projections of the two classes.
One way to approach that: make the class projections a) compact, b) far apart.
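One simple (and entirely illustrative) way to turn "compact and far apart" into a number for a given direction w, in the spirit of the criterion to be developed next lecture:

% Save as separationScore.m: ratio of between-class spread to within-class spread
% of the 1D projections. Larger is better under the "compact, far apart" goal.
function J = separationScore(X1, X2, w)
    z1 = X1 * w;                                         % projections of class 1
    z2 = X2 * w;                                         % projections of class 2
    J  = (mean(z1) - mean(z2))^2 / (var(z1) + var(z2));  % far apart vs. compact
end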
Next time
Continue with linear discriminant analysis, and talk about the optimal way to place the decision boundary.