
CS4442/9542b
Artificial Intelligence II
Prof. Olga Veksler

Lecture 2
Introduction to ML
Basic Linear Algebra
Matlab
Some slides on Linear Algebra are from Patrick Nichols

Outline

• Introduction to Machine Learning


• Basic Linear Algebra
• Matlab Intro


Intro: What is Machine Learning?


• How to write a computer program that automatically improves its performance through experience
• Machine learning is useful when it is too difficult to come up with a program that performs a desired task directly
• Make the computer learn by showing it examples (most frequently with correct answers)
• this is “supervised” learning, or learning with a teacher
• In practice: a computer program (or function) with tunable parameters; tune the parameters until we get the desired behavior on the examples


Different Types of Learning


• Learning from examples:
• Supervised Learning: given training examples of inputs and corresponding outputs, produce the “correct” outputs for new inputs
• this is what we study in this course
• Unsupervised Learning: given only inputs as training data, find structure in the world, e.g. discover clusters
• Other types, such as reinforcement learning, are not covered in this course


Supervised Machine Learning


• Training samples (or examples): x1, x2, …, xn
• Each example xi is typically multi-dimensional
• xi1, xi2, …, xid are called features; xi is often called a feature vector
• Example: x1 = {3, 7, 35}, x2 = {5, 9, 47}, …
• how many and which features do we take?
• We know the desired output for each example: y1, y2, …, yn
• This learning is supervised (a “teacher” gives the desired outputs)
• yi are often one-dimensional
• Example: y1 = 1 (“face”), y2 = 0 (“not a face”)


Supervised Machine Learning


• Two types of supervised learning:
• Classification (we will only do classification in this course):
• yi takes a value in a finite set, typically called a label or a class
• Example: yi ∈ {“sunny”, “cloudy”, “raining”}
• Regression:
• yi is continuous, typically called an output value
• Example: yi = temperature ∈ [−60, 60]


Toy Application: fish sorting


[Figure: a fish image enters the classifier, which decides the fish species; the sorting chamber then routes each fish to the salmon bin or the sea bass bin]


Classifier design
• Notice salmon tends to be shorter than sea bass
• Use fish length as the discriminating feature
• Count number of bass and salmon of each length:

Length:   2   4   8  10  12  14
bass:     0   1   3   8  10   5
salmon:   2   5  10   5   1   0

[Figure: histogram of fish counts vs. length for salmon and sea bass]


Single Feature (length) Classifier


• Find the best length threshold L:
• fish length < L: classify as salmon
• fish length > L: classify as sea bass
• For example, at L = 5 the misclassified fish are:
• 1 sea bass (the bass of lengths 2 and 4: 0 + 1)
• 16 salmon (the salmon of lengths 8 to 14: 10 + 5 + 1 + 0)

Length:   2   4   8  10  12  14
bass:     0   1   3   8  10   5
salmon:   2   5  10   5   1   0

• Classification error (total error): 17/50 = 34%


Single Feature (length) Classifier


[Figure: the length histogram again, split by the threshold into fish classified as salmon (left of L) and fish classified as sea bass (right of L)]

• After searching through all possible thresholds L, the best is L = 9, and still 20% of the fish are misclassified
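
As an aside, this exhaustive threshold search is easy to reproduce. Below is a minimal MATLAB sketch (the variable names and the midpoint choice of candidate thresholds are our own, not from the slides) that tries every candidate threshold on the count table above:

```matlab
% Count tables from the slides
lengths = [2 4 8 10 12 14];
bass    = [0 1 3 8 10 5];
salmon  = [2 5 10 5 1 0];
total   = sum(bass) + sum(salmon);        % 50 fish in total

% Candidate thresholds: below, between, and above the observed lengths
thresholds = [lengths(1)-1, (lengths(1:end-1)+lengths(2:end))/2, lengths(end)+1];
errors = zeros(size(thresholds));
for k = 1:numel(thresholds)
    L = thresholds(k);
    % classify as salmon if length < L, as sea bass otherwise
    misclassified = sum(bass(lengths < L)) + sum(salmon(lengths >= L));
    errors(k) = misclassified / total;
end
[bestErr, kBest] = min(errors);
fprintf('best L = %g, error = %.0f%%\n', thresholds(kBest), 100*bestErr);
% prints: best L = 9, error = 20%
```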


Next Step
• Lesson learned:
• Length is a poor feature alone!
• What to do?
• Try another feature
• Salmon tends to be lighter
• Try average fish lightness


Single Feature (lightness) Classifier


Lightness:   1   2   3   4   5
bass:        0   1   2  10  12
salmon:      6  10   6   1   0

[Figure: histogram of fish counts vs. lightness for salmon and sea bass]

• Now the fish are classified best at a lightness threshold of 3.5, with a classification error of 8%

Can do better by combining features

• Use both the length and lightness features
• Feature vector: [length, lightness]

[Figure: training samples plotted in the (length, lightness) plane; a decision boundary splits the plane into a bass decision region and a salmon decision region]

• Classification error: 4%


Even Better Decision Boundary

[Figure: the same (length, lightness) plot, now separated by a complicated “wiggly” boundary]

• Decision boundary (wiggly) with 0% classification error on the training data


Test Classifier on New Data


• The goal is for classifier to perform well on new data
• Test “wiggly” classifier on new data: 25% error

[Figure: the wiggly boundary plotted with the new (length, lightness) test samples; some fall on the wrong side]


What Went Wrong?


[Figure: the training plot with 2 added samples]

• We always have only a limited amount of data, not all possible data
• We should make sure the decision boundary does not adapt too closely to the particulars of the data we have at hand, but rather grasps the “big picture”


What Went Wrong: Overfitting

• Complicated boundaries overfit the data: they are too tuned to the particular training data at hand
• Therefore complicated boundaries tend not to generalize well to new data
• We usually refer to the new data as “test” data

Overfitting: Extreme Example


• Say we have 2 classes: face and non-face images
• Memorize (i.e. store) all the “face” images
• For a new image, see if it is one of the stored faces
• if yes, output “face” as the classification result
• if no, output “non-face”
• This is also called “rote learning”
• Problem: new “face” images are different from the stored “face” examples
• zero error on the stored data, 50% error on the test (new) data
• Rote learning is memorization without generalization

slide is modified from Y. LeCun



Generalization
[Figure: the same simple decision boundary shown on the training data and on the test data]

• The ability to produce correct outputs on previously unseen examples is called generalization
• The big question of learning theory: how to get good generalization with a limited number of examples
• Intuitive idea: favor simpler classifiers
• William of Occam (c. 1287-1347): “entities are not to be multiplied without necessity”
• A simpler decision boundary may not fit the training data perfectly, but tends to generalize better to new data


Underfitting

• We can also underfit the data, i.e. use too simple a decision boundary
• the chosen model is not expressive enough
• e.g. there may be no way to fit a linear decision boundary so that the training examples are well separated
• Training error is too high
• test error is, of course, also high


Underfitting vs. Overfitting

[Figure: three decision boundaries on the same data: underfitting, “just right”, and overfitting]


Sketch of Supervised Machine Learning


• Choose a learning machine f(x,w)
• w are tunable weights
• x is the input sample
• f(x,w) should output the correct class of sample x
• use labeled samples to tune the weights w so that f(x,w) gives the correct label for sample x
• Which function f(x,w) do we choose?
• it has to be expressive enough to model our problem well, i.e. to avoid underfitting
• yet not too complicated, to avoid overfitting


Training and Testing


• There are 2 phases: training and testing
• Divide all labeled samples x1, x2, …, xn into 2 sets: a training set and a test set
• The training phase is for “teaching” our machine (finding the optimal weights w)
• The testing phase is for evaluating how well our machine works on unseen examples
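
For a concrete picture, here is a minimal MATLAB sketch of such a split (the toy data, the variable names, and the 80/20 ratio are our own illustrative choices):

```matlab
% Toy labeled data: each row of X is a feature vector [length, lightness]
X = [10 4; 12 5; 4 1; 6 2; 11 5; 5 1];   % made-up feature vectors
y = [1; 1; 0; 0; 1; 0];                  % made-up labels: 1 = sea bass, 0 = salmon

n = size(X, 1);
idx = randperm(n);                       % random shuffle of the sample indices
nTrain = round(0.8 * n);                 % e.g. 80% for training, 20% for testing

Xtrain = X(idx(1:nTrain), :);     ytrain = y(idx(1:nTrain));
Xtest  = X(idx(nTrain+1:end), :); ytest  = y(idx(nTrain+1:end));
```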


Training Phase
• Find the weights w s.t. f(xi,w) = yi “as much as possible” for the training samples (xi, yi)
• “as much as possible” needs to be defined
• How do we find parameters w to ensure f(xi,w) = yi for most training samples (xi, yi)?
• This step is usually done by optimization and can be quite time consuming
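
As one hedged illustration (not a method prescribed by these slides, just the simplest instance of “tuning w by optimization”): for a linear machine, minimizing the squared error over the training set has a closed-form solution in MATLAB:

```matlab
% Linear machine: tune w by least squares on the training data.
% Xtrain, ytrain, Xtest, ytest are assumed to come from the split above.
Xa = [Xtrain, ones(size(Xtrain,1), 1)];   % append a constant 1 for a bias term
w  = Xa \ ytrain;                         % w minimizing ||Xa*w - ytrain||^2

% Classify by thresholding the real-valued output at 0.5 (labels are 0/1)
ypred     = ([Xtest, ones(size(Xtest,1), 1)] * w) > 0.5;
testError = mean(ypred ~= ytest);         % fraction of misclassified test samples
```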


Testing Phase
• The goal is to design a machine which performs well on unseen examples
• Evaluate the performance of the trained machine f(x,w) on the test samples (unseen labeled samples)
• Testing the machine on unseen labeled examples lets us approximate how well it will perform in practice
• If the testing results are poor, we may have to go back to the training phase and redesign f(x,w)


Generalization and Overfitting


• Generalization is the ability to produce correct outputs on previously unseen examples
• In other words, low error on unseen examples
• Good generalization is the main goal of ML
• Low training error does not necessarily imply low test error
• we have seen that it is easy to produce f(x,w) which is perfect on the training samples (rote “learning”)
• Overfitting:
• when the machine performs well on the training data but poorly on the test data


Classification System Design Overview


• Collect and label data by hand

[Figure: hand-labeled fish images: salmon, sea bass, salmon, salmon, sea bass, sea bass]

• Split data into training and test sets
• Preprocess by segmenting fish from background
• Extract possibly discriminating features
• length, lightness, width, number of fins, etc.
• Classifier design:
• Choose a model for the classifier
• Train the classifier on the training data
(we look at these two steps in this course)
• Test the classifier on the test data


Basic Linear Algebra


• Basic Concepts in Linear Algebra
• vectors and matrices
• products and norms
• vector spaces and linear transformations
• Introduction to Matlab


Why Linear Algebra?


• For each example (e.g. a fish image), we extract a set of features (e.g. length, width, color)
• This set of features is represented as a feature vector
• [length, width, color]
• All collected examples are then represented as a collection of (feature) vectors, i.e. as a matrix:

$$\begin{matrix} [l_1, w_1, c_1] & \text{example 1} \\ [l_2, w_2, c_2] & \text{example 2} \\ [l_3, w_3, c_3] & \text{example 3} \end{matrix} \quad\longrightarrow\quad \begin{pmatrix} l_1 & w_1 & c_1 \\ l_2 & w_2 & c_2 \\ l_3 & w_3 & c_3 \end{pmatrix} \ \text{matrix}$$

• Also, we will use linear models, since they are simple and computationally tractable
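
In MATLAB this mapping is direct: each feature vector is a row, and stacking the rows gives the data matrix (x1 and x2 reuse the example values from the earlier slide; x3 is made up):

```matlab
x1 = [3 7 35];        % feature vector of example 1: [length, width, color]
x2 = [5 9 47];        % feature vector of example 2
x3 = [4 8 40];        % feature vector of example 3 (made-up values)
X  = [x1; x2; x3];    % 3-by-3 data matrix: one example per row, one feature per column
```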


What is a Matrix?
• A matrix is a set of elements, organized into rows and columns, e.g.

$$\begin{pmatrix} 2 & 7 & 6 & 10 \\ 1 & 4 & 4 & 9 \\ 6 & 4 & 9 & 6 \end{pmatrix}$$

• here each row is one example (examples 1-3) and each column is one feature (features 1-4)


Basic Matrix Operations


• addition, subtraction, multiplication by a scalar

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} + \begin{pmatrix} e & f \\ g & h \end{pmatrix} = \begin{pmatrix} a+e & b+f \\ c+g & d+h \end{pmatrix} \quad \text{add elements}$$

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} - \begin{pmatrix} e & f \\ g & h \end{pmatrix} = \begin{pmatrix} a-e & b-f \\ c-g & d-h \end{pmatrix} \quad \text{subtract elements}$$

$$\alpha \cdot \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} \alpha a & \alpha b \\ \alpha c & \alpha d \end{pmatrix} \quad \text{multiply every entry}$$
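
These operations map one-to-one onto MATLAB operators (a small sketch with made-up matrices; the ' in the last line is the transpose, defined on the next slide):

```matlab
A = [1 2; 3 4];   B = [5 6; 7 8];   % made-up 2-by-2 matrices
S  = A + B;       % elementwise addition
D  = A - B;       % elementwise subtraction
C  = 3 * A;       % multiply every entry by the scalar 3
At = A';          % transpose: rows become columns
```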


Matrix Transpose
• n by m matrix A and its m by n transpose $A^T$:

$$A = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1m} \\ x_{21} & x_{22} & \cdots & x_{2m} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nm} \end{pmatrix} \qquad A^T = \begin{pmatrix} x_{11} & x_{21} & \cdots & x_{n1} \\ x_{12} & x_{22} & \cdots & x_{n2} \\ \vdots & \vdots & & \vdots \\ x_{1m} & x_{2m} & \cdots & x_{nm} \end{pmatrix}$$


Vectors
• Vector: an N x 1 matrix, e.g. $v = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$
• dot product and magnitude defined on vectors only

[Figure: a vector v in the (x1, x2) plane; vector addition a + b and vector subtraction a − b shown geometrically]


More on Vectors
• n-dimensional row vector: $x = [x_1 \; x_2 \; \cdots \; x_n]$
• Transpose of a row vector is a column vector: $x^T = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$
• Vector product (or inner or dot product):

$$\langle x, y \rangle = x \cdot y = x^T y = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n = \sum_{i=1}^{n} x_i y_i$$
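
In MATLAB, with column vectors, the inner product is simply x'*y (a small sketch with made-up vectors):

```matlab
x = [1; 2; 3];      % made-up column vectors
y = [4; 5; 6];
p = x' * y;         % inner product: 1*4 + 2*5 + 3*6 = 32
q = dot(x, y);      % built-in equivalent, also 32
```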


More on Vectors
• Euclidean norm or length: $\|x\| = \sqrt{\langle x, x \rangle} = \sqrt{\sum_{i=1}^{n} x_i^2}$
• If $\|x\| = 1$ we say x is normalized or of unit length
• The angle θ between vectors x and y: $\cos\theta = \dfrac{x^T y}{\|x\|\,\|y\|}$
• the inner product captures the direction relationship:
• $\cos\theta = 0$: $x^T y = 0$, i.e. $x \perp y$
• $\cos\theta = 1$: $x^T y = \|x\|\,\|y\| > 0$ (same direction)
• $\cos\theta = -1$: $x^T y = -\|x\|\,\|y\| < 0$ (opposite directions)
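
These quantities correspond directly to MATLAB built-ins (reusing the made-up x and y from the sketch above):

```matlab
lenX  = norm(x);                           % Euclidean length: sqrt(1 + 4 + 9)
u     = x / norm(x);                       % normalized (unit length) version of x
theta = acos((x'*y) / (norm(x)*norm(y)));  % angle between x and y, in radians
```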


More on Vectors

• Vectors x and y are orthonormal if they are orthogonal and $\|x\| = \|y\| = 1$
• Euclidean distance between vectors x and y:

$$\|x - y\| = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

[Figure: vectors x and y and the difference vector x − y]
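
In MATLAB the distance is a single expression, and orthonormality is a cheap check (again with the made-up x and y from above, which happen to be neither orthogonal nor unit length):

```matlab
d = norm(x - y);                 % Euclidean distance between x and y
orthonormal = abs(x'*y) < 1e-12 && ...
    abs(norm(x) - 1) < 1e-12 && abs(norm(y) - 1) < 1e-12;   % false here
```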


Linear Dependence and Independence


• Vectors x1, x2, …, xn are linearly dependent if there exist constants α1, α2, …, αn s.t.
• α1x1 + α2x2 + … + αnxn = 0
• αi ≠ 0 for at least one i
• Vectors x1, x2, …, xn are linearly independent if
• α1x1 + α2x2 + … + αnxn = 0 implies α1 = α2 = … = αn = 0
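
A practical way to test this in MATLAB: stack the vectors as columns of a matrix and compare its rank to the number of vectors (a sketch with made-up vectors; rank is defined formally a few slides below):

```matlab
x1 = [1; 0; 1];   x2 = [0; 1; 1];   x3 = [1; 1; 2];   % made up; note x3 = x1 + x2
M  = [x1 x2 x3];                          % vectors as columns
independent = (rank(M) == size(M, 2));    % false here: the vectors are dependent
```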


Vector Spaces and Basis


• The set of all n-dimensional vectors is called a vector space V
• A set of vectors {u1, u2, …, un} is called a basis for V if any v in V can be written as v = α1u1 + α2u2 + … + αnun
• u1, u2, …, un being independent implies they form a basis, and vice versa
• u1, u2, …, un give an orthonormal basis if
1. $\|u_i\| = 1 \; \forall i$
2. $u_i \perp u_j \; \forall i \neq j$


Orthonormal Basis
• e.g. x, y, z form an orthonormal basis:

$$x = [1\;0\;0]^T \qquad y = [0\;1\;0]^T \qquad z = [0\;0\;1]^T$$
$$x \cdot y = 0 \qquad x \cdot z = 0 \qquad y \cdot z = 0$$


Matrix Product
$$AB = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1d} \\ \vdots & & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nd} \end{pmatrix} \begin{pmatrix} b_{11} & \cdots & b_{1m} \\ b_{21} & \cdots & b_{2m} \\ \vdots & & \vdots \\ b_{d1} & \cdots & b_{dm} \end{pmatrix} = \big( c_{ij} \big), \qquad c_{ij} = \langle a_i, b_j \rangle$$

• where $a_i$ is row i of A and $b_j$ is column j of B
• the # of columns of A must equal the # of rows of B
• even if defined, in general AB ≠ BA
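
In MATLAB, * is the matrix product (while .* is the elementwise one); a quick made-up check of non-commutativity:

```matlab
A = [1 2; 3 4];   B = [0 1; 1 0];   % made-up 2-by-2 matrices
AB = A * B;                         % matrix product (inner dimensions must match)
BA = B * A;
isequal(AB, BA)                     % returns false: AB ~= BA in general
```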


Matrices
• The rank of a matrix is the number of linearly independent rows (or, equivalently, columns)
• A square matrix is non-singular if its rank equals the number of rows; if its rank is less than the number of rows, it is singular
• Identity matrix:

$$I = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}, \qquad AI = IA = A$$

• Matrix A is symmetric if $A = A^T$, e.g.

$$\begin{pmatrix} 1 & 2 & 9 & 5 \\ 2 & 7 & 4 & 8 \\ 9 & 4 & 3 & 6 \\ 5 & 8 & 6 & 4 \end{pmatrix}$$
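
The corresponding MATLAB built-ins, sketched on the symmetric example above:

```matlab
A = [1 2 9 5; 2 7 4 8; 9 4 3 6; 5 8 6 4];   % the symmetric example above
r = rank(A);                  % number of linearly independent rows/columns
nonSingular = (r == size(A, 1));
I = eye(4);                   % 4-by-4 identity matrix
isequal(A * I, A)             % true: A*I = A
isequal(A, A')                % true: A is symmetric
```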

Matrices
• Inverse of a square matrix A is the matrix $A^{-1}$ s.t. $AA^{-1} = I$
• If A is singular or not square, the inverse does not exist
• The pseudo-inverse $A^{\dagger}$ is defined whenever $A^T A$ is not singular (it is square):

$$A^{\dagger} = (A^T A)^{-1} A^T, \qquad A^{\dagger} A = (A^T A)^{-1} A^T A = I$$
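
In MATLAB the pseudo-inverse is available as pinv, or can be built from the formula above (the tall matrix is made up; pinv also covers cases the formula does not, but agrees with it when AᵀA is non-singular):

```matlab
A  = [1 0; 0 1; 1 1];     % made-up 3-by-2 matrix (not square, so no inverse)
Ad = (A' * A) \ A';       % pseudo-inverse via the formula (A'A)^(-1) * A'
Ap = pinv(A);             % built-in pseudo-inverse; same result here
Ad * A                    % returns the 2-by-2 identity (up to rounding)
```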


MATLAB


• Starting matlab
  • xterm -fn 12X24
  • matlab
• Basic navigation
  • quit
  • more
  • help general
• Scalars, variables, basic arithmetic
  • clear
  • + - * / ^
  • help arith
• Relational operators
  • ==, &, |, ~, xor
  • help relop
• Lists, vectors, matrices
  • A = [2 3; 4 5]
  • A'
• Matrix and vector operations
  • find(A>3), colon operator
  • * / ^ .* ./ .^
  • eye(n), norm(A), det(A), eig(A)
  • max, min, std
  • help matfun
• Elementary functions
  • help elfun
• Data types
  • double
  • char
• Programming in Matlab
  • .m files
  • scripts
  • function y = square(x)
  • help lang
• Flow control
  • if i == 1 … else … end, if … elseif … end
  • for i = 1:0.5:2 … end
  • while i == 1 … end
  • return
  • help lang
• Graphics
  • help graphics
  • help graph3d
• File I/O
  • load, save
  • fopen, fclose, fprintf, fscanf
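
To tie the cheat sheet together, a short starter script exercising a few of the commands above (the values are made up):

```matlab
% starter.m - a few of the commands from the list above
A = [2 3; 4 5];          % matrix literal
B = A';                  % transpose
v = find(A > 3);         % linear indices of entries greater than 3
n = norm(A);             % matrix norm
e = eig(A);              % eigenvalues
for i = 1:0.5:2          % loop with step 0.5: i = 1, 1.5, 2
    fprintf('i = %.1f, largest eigenvalue = %.3f\n', i, max(e));
end
```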
