Lecture 14

The document discusses linear discriminant functions and the perceptron as an early classifier, explaining its learning algorithm and geometric view. It emphasizes the relationship between regression and classification, highlighting the goal of learning a function that captures the relationship between input and output. The document also covers linear regression, the criterion function for optimization, and the linear least squares method for finding optimal parameters.

Recap

• We have been considering linear discriminant functions.
• Such a linear classifier is given by

      h(X) = 1 if Σ_{i=1}^{d′} w_i φ_i(X) + w_0 > 0
           = 0 otherwise

  where the φ_i are fixed functions.
• We have been considering the case φ_i(X) = x_i for simplicity.
Perceptron

• The perceptron is the earliest such classifier.
• Assuming an augmented feature vector, h(X) = sgn(W^T X).
• In words: 'find the weighted sum and threshold it'.


Perceptron Learning Algorithm

• A simple iterative algorithm.
• In each iteration, we locally try to correct errors (a code sketch of the rule follows below).

  Let ΔW(k) = W(k+1) − W(k). Then

      ΔW(k) =  0      if W(k)^T X(k) > 0 and y(k) = 1, or
                         W(k)^T X(k) < 0 and y(k) = 0
            =  X(k)   if W(k)^T X(k) ≤ 0 and y(k) = 1
            = −X(k)   if W(k)^T X(k) ≥ 0 and y(k) = 0
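
A minimal sketch of this update rule in NumPy (an illustration, not part of the lecture; it assumes labels in {0, 1}, already-augmented feature vectors, and a cap on the number of passes over the data):

    import numpy as np

    def perceptron_train(X, y, max_epochs=100):
        """Perceptron learning rule: add X(k) for a misclassified positive
        pattern, subtract X(k) for a misclassified negative pattern.
        X is n x (d+1) with the constant feature appended; y is in {0, 1}."""
        W = np.zeros(X.shape[1])
        for _ in range(max_epochs):
            changed = False
            for k in range(X.shape[0]):
                s = W @ X[k]
                if y[k] == 1 and s <= 0:      # positive pattern on wrong side
                    W = W + X[k]
                    changed = True
                elif y[k] == 0 and s >= 0:    # negative pattern on wrong side
                    W = W - X[k]
                    changed = True
            if not changed:                   # no corrections in a full pass: done
                break
        return W

If the training set is linearly separable, the loop above stops after finitely many corrections, which is exactly the convergence result recalled below.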


Perceptron: Geometric view

• The algorithm has a simple geometric view. Suppose W(k) misclassifies a pattern X(k). The correction adds X(k) to the weight vector (for a positive pattern) or subtracts it (for a negative pattern), moving W towards a misclassified positive pattern and away from a misclassified negative pattern, so that W(k)^T X(k) moves towards the correct sign.
• We showed that if the training set is linearly separable, then the algorithm finds a separating hyperplane in finitely many iterations.
• We also saw the 'batch' version of the algorithm. It can be shown to be gradient descent on a reasonable cost function.


Perceptron

• A simple 'device': weighted sum and threshold.
• A simple learning machine (a neuron model).


• Perceptron is an interesting algorithm to learn linear classifiers.
• It works only when the data are linearly separable.
• In general, it is not possible to know beforehand whether the data are linearly separable.
• We next look at other linear methods in classification and regression.


Regression Problems

• Recall that the regression or function learning problem is closely related to learning classifiers.
• The training set would be {(X_i, y_i), i = 1, ..., n} with X_i ∈ ℜ^d, y_i ∈ ℜ, ∀i.
• The main difference is that the 'target' or 'output' y_i is continuous-valued in a regression problem, while it can take only finitely many distinct values for a classifier.


• In a regression problem, the goal is to learn a function f : ℜ^d → ℜ that captures the relationship between X and y. We write ŷ = f(X).
• Note that any such function can also be viewed as a classifier: we can take h(X) = sgn(f(X)) as the classifier.
• We search over a suitably parameterized class of functions to find the best one.
• Once again, the problem is that of learning the best parameters.


Linear Regression

• We will now consider learning a linear function

      f(X) = Σ_{i=1}^{d} w_i x_i + w_0

  where W = (w_1, ..., w_d)^T ∈ ℜ^d and w_0 ∈ ℜ are the parameters.
• Thus a linear model can be expressed as f(X) = W^T X + w_0.
• As earlier, by using an augmented vector X, we can write this as f(X) = W^T X (see the short sketch below).
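
As a small illustration of the augmentation step (a sketch only; the data are hypothetical):

    import numpy as np

    # Hypothetical data: n = 100 samples with d = 3 features each.
    X = np.random.randn(100, 3)

    # Prepend a constant feature of 1 so that w_0 becomes one more weight.
    X_aug = np.hstack([np.ones((X.shape[0], 1)), X])   # n x (d+1)

    # With augmented vectors, f(X) = W^T X with W in R^(d+1).
    W = np.zeros(X_aug.shape[1])
    y_hat = X_aug @ W
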
• Now, to learn the 'optimal' W, we need a criterion function.
• The criterion function assigns a figure of merit or cost to each W ∈ ℜ^(d+1).
• Then the optimal W would be the one that optimizes the criterion function.
• A criterion function that is most often used is the sum of squares of errors.


Linear Least Squares Regression

• We want to find a W such that ŷ(X) = f(X) = W^T X is a good fit for the training data.
• Consider a function J : ℜ^(d+1) → ℜ defined by

      J(W) = (1/2) Σ_{i=1}^{n} (X_i^T W − y_i)²

• We take the 'optimal' W to be the minimizer of J(·).
• This is known as the linear least squares method.


• We want to find W to minimize

      J(W) = (1/2) Σ_{i=1}^{n} (X_i^T W − y_i)²

• If we are learning a classifier we can have y_i ∈ {−1, +1}.
• Note that finally we would use the sign of W^T X as the classifier output.
• Thus minimizing J is a good way to learn linear discriminant functions also.

• We want to find the minimizer of

      J(W) = (1/2) Σ_{i=1}^{n} (X_i^T W − y_i)²

• This is a quadratic function, so we can find the minimizer analytically.
• For this we rewrite J(W) in a more convenient form.


• Recall that we take all vectors to be column vectors.
• Hence each training sample X_i is a (d+1) × 1 matrix.
• Let A be the matrix given by

      A = [X_1 · · · X_n]^T

• A is an n × (d+1) matrix whose ith row is X_i^T.
• Hence AW is an n × 1 vector whose ith element is X_i^T W.


• Let Y be the n × 1 vector whose ith element is y_i.
• Hence AW − Y is an n × 1 vector whose ith element is (X_i^T W − y_i).
• Hence we have

      J(W) = (1/2) Σ_{i=1}^{n} (X_i^T W − y_i)² = (1/2) (AW − Y)^T (AW − Y)

  (a small numerical check of this identity follows below).
• To find the minimizer of J(·), we need to equate its gradient to zero.
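
The sketch below builds A and Y for hypothetical data and checks that the sum-of-squares form and the matrix form of J(W) agree (an illustration only):

    import numpy as np

    # Hypothetical data: n = 5 augmented samples (leading column of ones).
    A = np.hstack([np.ones((5, 1)), np.random.randn(5, 2)])   # n x (d+1)
    Y = np.random.randn(5)
    W = np.random.randn(3)

    # Sum-of-squares form and matrix form of J(W).
    J_sum = 0.5 * sum((A[i] @ W - Y[i]) ** 2 for i in range(len(Y)))
    r = A @ W - Y
    J_mat = 0.5 * (r @ r)
    assert np.isclose(J_sum, J_mat)   # the two expressions are equal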


• We have

      ∇J(W) = A^T (AW − Y)

• Equating the gradient to zero, we get

      (A^T A) W = A^T Y

• The optimal W satisfies this system of linear equations (called the normal equations).


• A^T A is a (d+1) × (d+1) matrix.
• A^T A is invertible if A has linearly independent columns. (This is because the null space of A is the same as the null space of A^T A.)
• The rows of A are the training samples X_i.
• Hence the jth column of A gives the values of the jth feature in all the examples.


• Hence the columns of A are linearly independent if no feature can be obtained as a linear combination of the other features.
• If we assume the features are linearly independent, then A would have linearly independent columns and hence A^T A would be invertible.
• This is a reasonable assumption.


• The optimal W is a solution of (A^T A) W = A^T Y.
• When A^T A is invertible, we get the optimal W as

      W* = (A^T A)^(−1) A^T Y = A† Y

  where A† = (A^T A)^(−1) A^T is called the generalized inverse of A.
• The above W* is the linear least squares solution for our regression (or classification) problem (a code sketch follows below).
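
A sketch of computing this solution in NumPy (illustrative data; the normal equations are solved directly because that mirrors the derivation above, though a routine such as np.linalg.lstsq is numerically more robust):

    import numpy as np

    def least_squares_fit(A, Y):
        """Solve (A^T A) W = A^T Y, i.e. W* = (A^T A)^(-1) A^T Y."""
        return np.linalg.solve(A.T @ A, A.T @ Y)

    # Hypothetical regression data: 50 augmented samples, 3 original features.
    A = np.hstack([np.ones((50, 1)), np.random.randn(50, 3)])
    true_W = np.array([1.0, 2.0, -1.0, 0.5])
    Y = A @ true_W + 0.01 * np.random.randn(50)

    W_star = least_squares_fit(A, Y)          # close to true_W
    # For classification with y_i in {-1, +1}, the prediction on a new
    # augmented sample x would be sign(x @ W_star).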


Geometry of Least Squares

• Our least squares method seeks to find a W to minimize ||AW − Y||².
• A is an n × (d+1) matrix and normally n >> d.
• Consider the (over-determined) system of linear equations AW = Y.
• The system may or may not be consistent. But we seek to find W* to minimize the squared error.
• As we saw, the solution is W* = A† Y, and hence the name generalized inverse for A†.


• The least squares method is trying to find a 'best-fit' W for the system AW = Y.
• Let C_0, C_1, ..., C_d be the columns of A.
• Then AW = w_0 C_0 + w_1 C_1 + · · · + w_d C_d.
• Thus, for any W, AW is a linear combination of the columns of A.
• Hence, if Y is in the space spanned by the columns of A, there is an exact solution.


• Otherwise, we want the projection of Y onto the column space of A.
• That is, we want to find a vector Z in the column space of A that is closest to Y.
• Any vector in the column space of A can be written as Z = AW for some W.
• Hence we want to find Z to minimize ||Z − Y||² subject to the constraint that Z = AW for some W.
• That is the least squares solution (the sketch below illustrates this projection property).
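
The projection view can be checked numerically: the residual AW* − Y is orthogonal to every column of A, which is just the normal equations restated (hypothetical data, illustration only):

    import numpy as np

    A = np.hstack([np.ones((20, 1)), np.random.randn(20, 2)])   # n x (d+1)
    Y = np.random.randn(20)

    W_star = np.linalg.solve(A.T @ A, A.T @ Y)   # least squares solution
    residual = A @ W_star - Y                    # A W* is the projection of Y

    # A^T (A W* - Y) = 0 up to round-off: the residual is orthogonal
    # to the column space of A.
    print(np.allclose(A.T @ residual, 0.0))      # True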


• Let us now take the original (not augmented) data vectors and write our model as ŷ(X) = f(X) = W^T X + w_0, where now W ∈ ℜ^d.
• Now we have

      J(W) = (1/2) Σ_{i=1}^{n} (W^T X_i + w_0 − y_i)²

• For any given W we can find the best w_0 by equating the partial derivative to zero. We have

      ∂J/∂w_0 = Σ_{i=1}^{n} (W^T X_i + w_0 − y_i)

  Equating the partial derivative to zero, we get

      Σ_{i=1}^{n} (W^T X_i + w_0 − y_i) = 0
      ⇒  n w_0 + W^T Σ_{i=1}^{n} X_i = Σ_{i=1}^{n} y_i


This gives us

      w_0 = (1/n) Σ_{i=1}^{n} y_i − W^T ( (1/n) Σ_{i=1}^{n} X_i )

• Thus, w_0 accounts for the difference between the average of W^T X and the average of y (see the sketch below).
• So, w_0 is often called the bias term.
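
A sketch of this bias formula on synthetic data (the weights, the true bias of 3.0, and the noise level are all made up for illustration):

    import numpy as np

    # Hypothetical data with a known bias of 3.0.
    X = np.random.randn(100, 3)                            # n x d, not augmented
    y = X @ np.array([1.0, -2.0, 0.5]) + 3.0 + 0.01 * np.random.randn(100)

    W = np.array([1.0, -2.0, 0.5])        # assume W has been obtained already
    w0 = y.mean() - W @ X.mean(axis=0)    # w_0 = average(y) - W^T average(X)
    print(w0)                             # close to 3.0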


• We have taken our linear model to be

      ŷ(X) = f(X) = Σ_{j=0}^{d} w_j x_j

• As mentioned earlier, we could instead choose any fixed set of basis functions φ_j.
• Then the model would be

      ŷ(X) = f(X) = Σ_{j=0}^{d′} w_j φ_j(X)

• We can use the same criterion of minimizing the sum of squares of errors:

      J(W) = (1/2) Σ_{i=1}^{n} (W^T Φ(X_i) − y_i)²

  where Φ(X_i) = (φ_0(X_i), ..., φ_d′(X_i))^T.
• We want the minimizer of J(·).


• We can learn W using the same method as earlier.
• Thus, we will again have

      W* = (A^T A)^(−1) A^T Y

• The only difference is that now the ith row of the matrix A would be

      [φ_0(X_i)  φ_1(X_i)  · · ·  φ_d′(X_i)]


• As an example, let d = 1 (so X_i, y_i ∈ ℜ).
• Take φ_j(X) = X^j, j = 0, 1, ..., m.
• Now the model is

      ŷ(X) = f(X) = w_0 + w_1 X + w_2 X² + · · · + w_m X^m

• The model says that y is an mth degree polynomial in X.
• All such problems are tackled in a uniform fashion using the least squares method we presented (a polynomial-fit sketch follows below).
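
A sketch of such a polynomial fit as linear least squares (the degree, data, and noise level are illustrative choices):

    import numpy as np

    # Hypothetical scalar data (d = 1) and an illustrative degree m.
    m = 3
    x = np.linspace(-1.0, 1.0, 40)
    y = 1.0 + 2.0 * x - 3.0 * x**3 + 0.05 * np.random.randn(x.size)

    # Row i of A is [x_i^0, x_i^1, ..., x_i^m], i.e. phi_j(x) = x^j.
    A = np.vander(x, m + 1, increasing=True)        # n x (m+1)

    W_star = np.linalg.solve(A.T @ A, A.T @ y)      # same normal equations
    y_hat = A @ W_star                              # fitted polynomial values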


LMS algorithm

• We are finding the W* that minimizes

      J(W) = (1/2) Σ_{i=1}^{n} (X_i^T W − y_i)²

• We could have found the minimizer through an iterative scheme using gradient descent.
• The gradient of J is given by

      ∇J(W) = Σ_{i=1}^{n} X_i (X_i^T W − y_i)


• The iterative gradient descent scheme would be

      W(k+1) = W(k) − η Σ_{i=1}^{n} X_i (X_i^T W(k) − y_i)

• In analogy with what we saw in the Perceptron algorithm, this can be viewed as a 'batch' version.
• We use the current W to find the errors on all the training data and then apply all the 'corrections' together.
• We can instead have an incremental version of this algorithm (a sketch of the batch scheme is given below).
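
A sketch of the batch scheme (the step size η and the iteration count are illustrative choices, not values from the lecture; too large a step size makes the iteration diverge):

    import numpy as np

    def batch_gradient_descent(A, Y, eta=0.01, iters=500):
        """Batch gradient descent on J(W) = 0.5 * ||AW - Y||^2.
        A is n x (d+1) with augmented samples as rows, Y the target vector."""
        W = np.zeros(A.shape[1])
        for _ in range(iters):
            grad = A.T @ (A @ W - Y)     # sum_i X_i (X_i^T W - y_i)
            W = W - eta * grad
        return W
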
• For the incremental version, at each iteration we pick one of the training samples. Call this X(k).
• The error on this sample would be (1/2) (X(k)^T W(k) − y(k))².
• Using the gradient of only this term, we get the incremental version as

      W(k+1) = W(k) − η X(k) (X(k)^T W(k) − y(k))

• This is called the LMS algorithm.


• In the LMS algorithm, we iteratively update W as

      W(k+1) = W(k) − η X(k) (X(k)^T W(k) − y(k))

• Here (X(k), y(k)) is the training example picked at iteration k and W(k) is the weight vector at iteration k.
• We do not need to have all the training examples together with us. We can learn W from a stream of examples without needing to store them (see the sketch below).
• If η is sufficiently small, this algorithm also converges to the minimizer of J(W).
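
A sketch of LMS in a streaming setting (the data source, step size, and number of updates are hypothetical; each example is used once and never stored):

    import numpy as np

    def lms_update(W, x, y, eta=0.01):
        """One LMS step: W(k+1) = W(k) - eta * X(k) * (X(k)^T W(k) - y(k))."""
        return W - eta * x * (x @ W - y)

    # Hypothetical stream of examples; W is updated on the fly.
    W = np.zeros(4)
    for _ in range(10000):
        x = np.hstack([1.0, np.random.randn(3)])                 # augmented sample
        y = x @ np.array([3.0, 1.0, -2.0, 0.5]) + 0.01 * np.random.randn()
        W = lms_update(W, x, y)
    # After many updates, W is close to the least squares solution.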
