The document discusses the gradient descent algorithm for training 2-layer linear neural networks, detailing the steps for weight initialization, gradient computation, and iterative updates until convergence. It emphasizes the importance of hyperparameter tuning, explaining the distinction between model parameters and hyperparameters, and the need for separate datasets for training, validation, and testing. Additionally, it introduces linear auto-regressive models for predicting future values based on historical data, illustrating the concept of using past values for future predictions.

CS/DS 541: Class 2

Jacob Whitehill

Gradient descent for 2-layer linear NNs
Gradient descent algorithm
• Set w to random values; call this initial choice w^(0).
• Compute the gradient: ∇w f(w^(0))
• Update w by moving opposite the gradient, multiplied by a learning rate ε:
  w^(1) ⟵ w^(0) − ε ∇w f(w^(0))
• Repeat…
  w^(2) ⟵ w^(1) − ε ∇w f(w^(1))
  w^(3) ⟵ w^(2) − ε ∇w f(w^(2))
  …
  w^(t) ⟵ w^(t−1) − ε ∇w f(w^(t−1))
• …until convergence.
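The update rule above can be written as a short NumPy sketch (an illustration, not from the original slides; f_grad is an assumed callable that returns ∇w f(w), and the stopping test is one simple choice among many):

```python
import numpy as np

def gradient_descent(f_grad, w0, eps=0.1, max_iters=1000, tol=1e-8):
    """Iterate w <- w - eps * grad_f(w) until the update becomes tiny."""
    w = np.asarray(w0, dtype=float)
    for _ in range(max_iters):
        step = eps * f_grad(w)           # move opposite the gradient
        w = w - step
        if np.linalg.norm(step) < tol:   # crude convergence test
            break
    return w
```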
Gradient descent
• For a 2-layer linear NN, the gradient of fMSE w.r.t. w is:
  ∇w fMSE(y, ŷ; w) = ∇w [ (1/2n) Σᵢ₌₁ⁿ (x^(i)ᵀ w − y^(i))² ]
                   = (1/2n) Σᵢ₌₁ⁿ ∇w (x^(i)ᵀ w − y^(i))²
                   = (1/n) Σᵢ₌₁ⁿ x^(i) (x^(i)ᵀ w − y^(i))
Gradient descent
• By using matrices, we can find a more compact notation for the gradient.
• Define the design/feature matrix X and label vector y:
  X = [x^(1) ⋯ x^(n)]  (each column is one training example)
  y = [y^(1), …, y^(n)]ᵀ
• Now we can rewrite the gradient:
  ∇w fMSE(y, ŷ; w) = (1/n) Σᵢ₌₁ⁿ x^(i) (x^(i)ᵀ w − y^(i))
                   = (1/n) X (Xᵀ w − y)
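A small sketch (with made-up data) can confirm that the summation form and the matrix form of the gradient agree:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 5                                  # m features, n examples
X = rng.normal(size=(m, n))                  # columns are the examples x^(i)
y = rng.normal(size=n)
w = rng.normal(size=m)

# Summation form: (1/n) sum_i x^(i) (x^(i)^T w - y^(i))
grad_sum = sum(X[:, i] * (X[:, i] @ w - y[i]) for i in range(n)) / n

# Matrix form: (1/n) X (X^T w - y)
grad_mat = X @ (X.T @ w - y) / n

print(np.allclose(grad_sum, grad_mat))       # True
```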
Exercise

Gradient descent
• For the 2-layer NN below, let m=2 and w^(0) = [1, 0]ᵀ.
• Compute the updated weight vector w^(1) after one iteration of gradient descent using (1/2) MSE loss, a single training example (x, y) = ([2, 3]ᵀ, 4), and learning rate ε = 0.1.
• Recall: ∇w fMSE(w) = (1/n) X(Xᵀ w − y)
[Figure: a 2-layer linear NN with input layer x1, x2, …, xm, weights w1, w2, …, wm, and output layer ŷ.]
Solution
∇w fMSE(w) = (1/n) X(Xᵀ w − y)
w^(1) ⟵ w^(0) − ε ∇w fMSE(w^(0))
      = [1, 0]ᵀ − 0.1 · [2, 3]ᵀ ([2 3][1, 0]ᵀ − 4)    (here n = 1)
      = [1 + 0.1·2·2, 0 + 0.1·3·2]ᵀ
      = [1.4, 0.6]ᵀ
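The arithmetic can be checked in a few lines of NumPy (a sketch using the slide's single training example):

```python
import numpy as np

X = np.array([[2.0], [3.0]])    # the single example as a column (m=2, n=1)
y = np.array([4.0])
w0 = np.array([1.0, 0.0])       # w^(0)
eps = 0.1

grad = X @ (X.T @ w0 - y) / X.shape[1]   # (1/n) X (X^T w - y) = [-4, -6]
w1 = w0 - eps * grad
print(w1)                                # [1.4 0.6]
```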
Exercise
• Draw on paper a function (with one local minimum) such that the magnitude of the gradient is NOT an indicator of how far to move w so as to reach the local minimum.

Exercise
• Draw on paper a function such that this property is false.
Hyperparameter tuning
• The values we optimize when training a machine learning model — e.g., w and b for linear regression — are the parameters of the model.
• There are also values related to the training process itself — e.g., learning rate ε, batch size ñ, regularization strength ɑ — which are the hyperparameters of training.
Hyperparameter tuning
• Both the parameters and hyperparameters can have a huge impact on model performance on test data.
• Ideally, we would hope that the accuracy of the system varies smoothly with each hyperparameter value, e.g.:
[Figure: a smooth curve of Accuracy vs. hyperparameter h.]
Hyperparameter tuning
• However, in the real world, the hyperparameter landscape can be quite erratic, e.g.:
[Figure: a jagged, erratic curve of Accuracy vs. hyperparameter h.]
Hyperparameter tuning
• If you choose hyperparameters on the test set, you are likely deceiving yourself about how good your model is.
• This is a subtle but very dangerous form of ML cheating.
Hyperparameter tuning
• Instead, you should use a separate dataset that is not part of the test set to choose hyperparameters.
• The most common approach is to use training, validation, & testing sets:
  • Training (typically 70-80%): optimization of parameters
  • Validation (typically 5-10%): tuning of hyperparameters
  • Testing (typically 5-10%): evaluation of the final model
• For comparison with other researchers' methods, this partition should be fixed.
Training/validation/testing sets
• Hyperparameter tuning works as follows:
  1. Choose a set of hyperparameter configurations.
  2. For each configuration h:
     • Train the parameters on the training set using h.
     • Evaluate the model on the validation set.
     • If performance is better than what we got with the best h so far (h*), then save h as h*.
  3. Train a model with h*, and evaluate its accuracy A on the testing set. (You can train either on training data, or on training+validation data.)
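A runnable sketch of this loop is below; train and evaluate are toy stand-ins (ridge-style linear regression scored by negative MSE) chosen only to make the example self-contained, not part of the original procedure:

```python
import numpy as np

def train(data, alpha):
    """Toy stand-in: ridge-regularized linear regression."""
    X, y = data
    m = X.shape[0]
    return np.linalg.solve(X @ X.T + alpha * np.eye(m), X @ y)

def evaluate(w, data):
    """Toy stand-in metric: negative MSE (higher is better)."""
    X, y = data
    return -np.mean((X.T @ w - y) ** 2)

def tune(configs, train_set, val_set, test_set):
    best_h, best_score = None, -np.inf
    for h in configs:                    # step 2: try each configuration h
        w = train(train_set, **h)
        score = evaluate(w, val_set)     # validate; never touch the test set
        if score > best_score:           # keep the best h so far (h*)
            best_h, best_score = h, score
    w = train(train_set, **best_h)       # step 3 (could also use train+val)
    return best_h, evaluate(w, test_set)

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 100))
y = np.array([1.0, -2.0]) @ X + rng.normal(0.0, 0.1, 100)
train_set = (X[:, :70], y[:70])      # ~70% training
val_set = (X[:, 70:85], y[70:85])    # ~15% validation
test_set = (X[:, 85:], y[85:])       # ~15% testing
configs = [{"alpha": a} for a in (0.01, 0.1, 1.0)]
print(tune(configs, train_set, val_set, test_set))
```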
Linear auto-regressive (AR) models
• In some application areas, we have a time series of values x1, x2, …, xt, but no "labels" y.
• Task: Given the known values of x1, x2, …, xt−1, we want to predict the value of xt.
• A classic example is stock market price prediction.
Linear auto-regressive (AR) models
• In one classic prediction model, we use a fixed length of history (p) to predict the next value xt:
  x̂t = w1 xt−1 + w2 xt−2 + … + wp xt−p
• We can model this prediction using the same 2-layer neural network as before:
[Figure: inputs xt−1, …, xt−p connected through weights w1, …, wp to the output x̂t.]
Auto-regression
• The essence of auto-regression is that we are using the past to predict the next future event.
• We can apply this recursively to predict infinitely into the future.
• Example for p=2, assuming we already know x1, x2 (later predictions are fed back in as inputs):
  • x̂3 = w1 x2 + w2 x1
  • x̂4 = w1 x̂3 + w2 x2
  • x̂5 = w1 x̂4 + w2 x̂3
  • …
Example
• Model: x̂t = w1 xt−1 + w2 xt−2
• For w1=0.4, w2=−0.5, x1=0, and x2=2, what are the predictions for x3, x4, and x5?
  • x̂3 = (0.4)(2) + (−0.5)(0) = 0.8
  • x̂4 = (0.4)(0.8) + (−0.5)(2) = −0.68
  • x̂5 = (0.4)(−0.68) + (−0.5)(0.8) = −0.672
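A minimal sketch of this recursion (the function name and interface are illustrative assumptions) reproduces the numbers above:

```python
def ar_predict(history, w, num_steps):
    """Recursive AR prediction. w = [w1, ..., wp]; history holds the
    known values, newest last; predictions are fed back as inputs."""
    xs = list(history)
    p = len(w)
    for _ in range(num_steps):
        # x̂_t = w1*x_{t-1} + w2*x_{t-2} + ... + wp*x_{t-p}
        xs.append(sum(w[j] * xs[-1 - j] for j in range(p)))
    return xs[len(history):]

print(ar_predict([0.0, 2.0], w=[0.4, -0.5], num_steps=3))
# approximately [0.8, -0.68, -0.672]
```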
Multivariate auto-regression
• The value xt of each time-step can also be a vector, in which case we multiply the values of previous timesteps with matrices:
  x̂t = W^(1) xt−1 + … + W^(p) xt−p
Multivariate auto-regression
• Suppose each observation xt has 2 components (xta, xtb), and that p=2.
• Here is the corresponding neural network:
[Figure: inputs xt−1a, xt−1b, xt−2a, xt−2b fully connected to outputs xta, xtb.]
Exercise
• Recall: x̂t = W^(1) xt−1 + … + W^(p) xt−p
• To which matrix (W^(1), W^(2), or neither) do the first 4 edges correspond? Answer: W^(1).
[Figure: the same network as above; the first 4 edges are those leaving xt−1a and xt−1b.]
Multivariate auto-regression
• We can alternatively represent this network with just a single matrix of weights W if we "stack" the inputs:
  x̂t = W [xt−1ᵀ ; … ; xt−pᵀ]ᵀ
[Figure: the same network, with the single matrix W mapping the stacked inputs (xt−1a, xt−1b, xt−2a, xt−2b) to (xta, xtb).]
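A quick NumPy sketch (with random matrices) confirms that the stacked form is equivalent to the sum of per-timestep matrix products:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 2                                      # components per observation; p = 2
W1, W2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
x_tm1, x_tm2 = rng.normal(size=d), rng.normal(size=d)

pred_separate = W1 @ x_tm1 + W2 @ x_tm2    # W^(1) x_{t-1} + W^(2) x_{t-2}
W = np.hstack([W1, W2])                    # single d x (p*d) matrix
pred_stacked = W @ np.concatenate([x_tm1, x_tm2])

print(np.allclose(pred_separate, pred_stacked))   # True
```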
Auto-regression in deep learning
• Auto-regression is used frequently in deep learning, especially for machine translation and text generation (e.g., ChatGPT).
Stochastic gradient descent

Gradient descent
• With gradient descent, we only update the weights after scanning the entire training set.
• This is slow.
• If the training set contains 20K examples, then the gradient is an average over 20K examples.
• How much would the gradient really change if we just used, say, 10K examples? 5K examples? 128 examples?
  ∇w fMSE(y, ŷ; w) = (1/n) X(Xᵀ w − y)   ⟸ average over the entire training set
Stochastic gradient descent
• This is the idea behind stochastic gradient descent (SGD):
  • Randomly sample a small (ñ ≪ n) mini-batch (or sometimes just batch) of training examples.
  • Estimate the gradient on just the mini-batch.
  • Update weights based on the mini-batch gradient estimate.
  • Repeat.


Stochastic gradient descent
• In practice, SGD is usually conducted over multiple epochs.
• An epoch is a single pass through the entire training set.
• Procedure:
  1. Let ñ ≪ n equal the size of the mini-batch.
  2. Randomize the order of the examples in the training set.
  3. For e = 0 to numEpochs:
     I. For i = 0 to ⌈n/ñ⌉ − 1 (one epoch):
        A. Select a mini-batch J containing the next ñ examples.
        B. Compute the gradient on this mini-batch: (1/ñ) Σ_{i∈J} ∇W f(y^(i), ŷ^(i); W)
        C. Update the weights based on the current mini-batch gradient.
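Putting the whole procedure together for the 2-layer linear NN with (1/2) MSE loss, here is a minimal NumPy sketch (synthetic data; the hyperparameter values are arbitrary choices for illustration):

```python
import numpy as np

def sgd(X, y, batch_size=2, eps=0.01, num_epochs=100, seed=0):
    """Mini-batch SGD for the 2-layer linear NN with (1/2) MSE loss.
    X holds one training example per column (shape m x n)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    w = rng.normal(size=m)                        # random initial weights w^(0)
    order = rng.permutation(n)                    # randomize the example order
    num_batches = int(np.ceil(n / batch_size))    # ceil(n/ñ) rounds per epoch
    for _ in range(num_epochs):
        for i in range(num_batches):              # one epoch
            J = order[i * batch_size:(i + 1) * batch_size]
            Xb, yb = X[:, J], y[J]                # the next ñ examples
            grad = Xb @ (Xb.T @ w - yb) / len(J)  # mini-batch gradient estimate
            w -= eps * grad                       # update the weights
    return w

# Tiny synthetic check: the weights should approach w_true = [2, -1].
rng = np.random.default_rng(1)
X = rng.normal(size=(2, 64))
y = np.array([2.0, -1.0]) @ X
print(sgd(X, y))    # approximately [ 2. -1.]
```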


SGD versus GD: example
• Suppose our training set contains n=8 examples.
• Here is how regular gradient descent would proceed:
  • Initialize weights w^(0) to random values.
  • For each round:
    • Compute gradient on all n examples.
    • Update weights: w^(t+1) ⟵ w^(t) − ϵ ∇w f
[Figure: training examples 1–8; every round uses all of them.]
SGD versus GD: example
• Suppose our training set contains n=8 examples with ñ = 2.
• Here is how stochastic gradient descent would proceed:
  • Initialize weights w^(0) to random values.
  • Randomize the order of the training data.
  • For each epoch (e=1, …, E):
    • For each round (r=1, …, ⌈n/ñ⌉):
      • Compute gradient on the next ñ examples.
      • Update weights: w^(t+1) ⟵ w^(t) − ϵ ∇w f̃
[Figure: the shuffled training examples (4, 1, 3, 5, 7, 6, 8, 2) are consumed ñ=2 at a time per round; after ⌈n/ñ⌉ = 4 rounds, epoch e=2 begins again from the top.]
Stochastic gradient descent
• Despite "noise" (statistical inaccuracy) in the mini-batch gradient estimates, we will still converge to a local minimum.
• The noise can even sometimes help us to get out of worse local minima and into better ones.
• Training can be much faster than regular gradient descent because we adjust the weights many times per epoch.
SGD: learning rates
• With SGD, our learning rate needs to be annealed (reduced slowly over time) to guarantee convergence.
• Otherwise we might just oscillate forever in weight space.
• Necessary conditions:
  lim_{T→∞} Σ_{t=1}^T |ε_t|² < ∞   (not too big: the sum of squared learning rates converges)
  lim_{T→∞} Σ_{t=1}^T |ε_t| = ∞   (not too small: the sum of absolute learning rates grows to infinity)
SGD: learning rates
• One common learning rate "schedule" is to multiply ε by c ∈ (0, 1) every k rounds.
  • This is called exponential decay.
• Another possibility (which avoids the issue) is to set the number of epochs T to a finite number.
  • SGD may not fully converge, but the machine might still perform well.
• There are many other strategies.
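A sketch of such an exponential-decay schedule (the function and constants are illustrative):

```python
def exponential_decay(eps0, c, k, t):
    """Learning rate at round t: multiply eps0 by c every k rounds."""
    return eps0 * c ** (t // k)

for t in (0, 10, 20, 30):
    print(t, exponential_decay(eps0=0.1, c=0.5, k=10, t=t))
# 0 0.1 / 10 0.05 / 20 0.025 / 30 0.0125
```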
Optimization of ML models
• With linear regression, the cost function fMSE has a single local minimum w.r.t. the weights w.
• As long as our learning rate is small enough, we will eventually find the optimal w.
Convex ML models
• Linear regression has a loss function that is convex.
• With a convex function f, every local minimum is also a global minimum.
[Figure: a convex function beside a non-convex function. Source: https://plus.maths.org/content/convexity]
• Convex functions are ideal for conducting gradient descent.
Convexity in 1-d
• How can we tell if a 1-d function f is convex?
• What property of the slope of f ensures there is only one local minimum?
• From left to right, the slope of f never decreases.
  ⟹ the derivative of the slope is always non-negative
  ⟹ the second derivative of f is always non-negative
Convexity in higher dimensions
• For higher-dimensional f, convexity is determined by the Hessian of f:
  H[f] = [ ∂²f/∂x1∂x1 … ∂²f/∂x1∂xm
           ⋮               ⋮
           ∂²f/∂xm∂x1 … ∂²f/∂xm∂xm ]
• For f : ℝᵐ → ℝ, f is convex if the Hessian matrix is positive semi-definite for every input x.
Positive semi-definite
• Positive semi-definite is the matrix analog of being "non-negative".
• A real symmetric matrix A is positive semi-definite (PSD) if (equivalent conditions):
  • All its eigenvalues are ≥ 0.
    • In particular, if A is diagonal, then its eigenvalues are just the diagonal elements, so A is PSD iff those are all ≥ 0.
  • For every vector v: vᵀAv ≥ 0.
    • Therefore: If there exists any vector v such that vᵀAv < 0, then A is not PSD.
Example
• Suppose f(x, y) = 3x² + 2y² − 2.
• Then the first derivatives are: ∂f/∂x = 6x, ∂f/∂y = 4y.
• The Hessian matrix is therefore:
  H = [ ∂²f/∂x∂x  ∂²f/∂x∂y ; ∂²f/∂y∂x  ∂²f/∂y∂y ] = [ 6 0 ; 0 4 ]
• Notice that H for this f does not depend on (x, y).
• Also, H is a diagonal matrix (with 6 and 4 on the diagonal). Hence, the eigenvalues are just 6 and 4. Since they are both non-negative, f is convex.
Example
• Graph of f(x, y) = 3x² + 2y² − 2:
[Figure: an upward-opening bowl (elliptic paraboloid).]
Example
• Suppose f(x, y) = xy + x² − y².
• Then the first derivatives are: ∂f/∂x = y + 2x, ∂f/∂y = x − 2y.
• The Hessian matrix is therefore:
  H = [ 2 1 ; 1 −2 ]
• Notice that H for this f does not depend on (x, y).
• Does there exist any vector v s.t. vᵀHv < 0?
• Yes. For example, v = [1, 2]ᵀ:
  vᵀHv = [1 2] [ 2 1 ; 1 −2 ] [1 ; 2] = [1 2] [4 ; −3] = 4 − 6 = −2 < 0
  Hence H is not PSD, and f is not convex.
Example
• Graph of f(x, y) = xy + x² − y²:
[Figure: a saddle-shaped surface.]
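Both examples can be checked numerically by inspecting the eigenvalues of their Hessians (a NumPy sketch):

```python
import numpy as np

# The two example Hessians from the slides above.
H_bowl = np.array([[6.0, 0.0], [0.0, 4.0]])     # f(x,y) = 3x^2 + 2y^2 - 2
H_saddle = np.array([[2.0, 1.0], [1.0, -2.0]])  # f(x,y) = xy + x^2 - y^2

print(np.linalg.eigvalsh(H_bowl))     # [4. 6.] -> all >= 0, so f is convex
print(np.linalg.eigvalsh(H_saddle))   # one eigenvalue < 0 -> not PSD

# The witness vector from the slide: v^T H v < 0 confirms H_saddle is not PSD.
v = np.array([1.0, 2.0])
print(v @ H_saddle @ v)               # -2.0
```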
Convex ML models
• Prominent convex models in ML include linear regression, logistic regression, softmax regression, and support vector machines (SVMs).
• However, models in deep learning are generally not convex.
• Much DL research is devoted to how to optimize the weights to deliver good generalization performance.
Optimization: what can go wrong?
• In general ML and DL models, optimization is usually not so simple, due to:
  1. Presence of multiple local minima & saddle points
[Figure: a wiggly 1-d function annotated with its global maximum, a local maximum, a saddle point, a local minimum, and the global minimum.]
Optimization: what can go wrong?
• In general ML and DL models, optimization is usually not so simple, due to:
  2. Bad initialization of the weights w.
[Figure: the same function with a "good" starting point that descends into the global minimum and a "not so good" one that descends into a local minimum.]
Optimization: what can go wrong?
• In general ML and DL models, optimization is usually not so simple, due to:
  3. Learning rate is too small.
[Figure: a sequence of tiny steps creeping slowly toward the global minimum.]
Optimization: what can go wrong?
• In general ML and DL models, optimization is usually not so simple, due to:
  4. Learning rate is too large (the iterates can even diverge off the chart).
[Figure: steps that overshoot the global minimum, bouncing from side to side with growing amplitude.]
Optimization: what can go wrong?
• With multidimensional weight vectors, badly chosen learning rates can cause more subtle problems.
• Consider the cost f whose level sets are shown below. Which direction does the gradient point?
[Figure: circular level sets in the (w1, w2) plane; ∇w f(w) points outward, perpendicular to the level sets, so −∇w f(w) points toward the center.]
Optimization: what can go wrong?
• With multidimensional weight vectors, badly chosen learning rates can cause more subtle problems.
• Gradient descent guides the search along the direction of steepest decrease in f.
[Figure: with spherical level sets, the steps head straight for the minimum.]

Optimization: what can go wrong?
• But what if the level sets are ellipsoids instead of spheres?
• If we are lucky, we still converge quickly.
[Figure: ellipsoidal level sets where the steps still reach the minimum in a few iterations.]

Optimization: what can go wrong?
• But what if the level sets are ellipsoids instead of spheres?
• If we are unlucky, convergence is very slow.
[Figure: elongated ellipsoidal level sets where the steps bounce between the sides of the valley and make slow progress toward the minimum.]
Curvature
• The problem is that gradient descent only considers slope (a 1st-order effect), i.e., how f changes with w.
• The gradient does not consider how the slope itself changes with w (a 2nd-order effect).
• The higher-order derivatives, including the Hessian H, determine the curvature of f.
[Figure: a curved loss surface with the gradient ∇w f(w) drawn at one point.]
Optimization: what can we do?
• To accelerate optimization of the weights, we can either:
  • Alter the curvature of the loss by transforming the input data.
  • Change our optimization method to account for the curvature.
• Both of these strategies play an important role in deep learning.
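To make the first strategy concrete: for the 2-layer linear NN, the Hessian of fMSE is (1/n) X Xᵀ (the derivative of the gradient (1/n) X(Xᵀw − y)), so rescaling the input features directly alters the curvature. A sketch with synthetic data:

```python
import numpy as np

# Standardizing the inputs shrinks the condition number of the Hessian
# (1/n) X X^T, turning elongated level sets into nearly spherical ones.
rng = np.random.default_rng(2)
n = 1000
X = np.vstack([rng.normal(0.0, 1.0, n),      # feature 1: std 1
               rng.normal(0.0, 100.0, n)])   # feature 2: std 100

H_raw = X @ X.T / n
X_std = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
H_std = X_std @ X_std.T / n

print(np.linalg.cond(H_raw))   # huge: elongated level sets, slow descent
print(np.linalg.cond(H_std))   # near 1: nearly spherical level sets
```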
