ML algorithms
The gradient can be obtained by differentiating $E$ from Equation (4.2) with respect to each weight $w_i$:

$$
\begin{aligned}
\frac{\partial E}{\partial w_i}
&= \frac{\partial}{\partial w_i}\,\frac{1}{2}\sum_{d\in D}(t_d - o_d)^2
 = \frac{1}{2}\sum_{d\in D}\frac{\partial}{\partial w_i}(t_d - o_d)^2 \\
&= \frac{1}{2}\sum_{d\in D}2(t_d - o_d)\,\frac{\partial}{\partial w_i}(t_d - o_d)
 = \sum_{d\in D}(t_d - o_d)\,\frac{\partial}{\partial w_i}\bigl(t_d - \vec{w}\cdot\vec{x}_d\bigr)
\end{aligned}
$$

$$
\frac{\partial E}{\partial w_i} = \sum_{d\in D}(t_d - o_d)(-x_{id}) \qquad (4.6)
$$

where $x_{id}$ denotes the single input component $x_i$ for training example $d$. We now have an equation that gives $\partial E/\partial w_i$ in terms of the linear unit inputs $x_{id}$, outputs $o_d$, and target values $t_d$ associated with the training examples. Substituting Equation (4.6) into Equation (4.5) yields the weight update rule for gradient descent:

$$
\Delta w_i = \eta \sum_{d\in D}(t_d - o_d)\,x_{id} \qquad (4.7)
$$

To summarize, the gradient descent algorithm for training linear units is as follows: pick an initial random weight vector; apply the linear unit to all training examples, then compute $\Delta w_i$ for each weight according to Equation (4.7); update each weight $w_i$ by adding $\Delta w_i$; and repeat this process. This algorithm is given in Table 4.1. Because the error surface contains only a single global minimum, the algorithm will converge toward it given a sufficiently small learning rate $\eta$; if $\eta$ is too large, the search risks overstepping the minimum, and for this reason one common modification to the algorithm is to gradually reduce the value of $\eta$ as the number of gradient descent steps grows.

GRADIENT-DESCENT(training_examples, $\eta$)

Each training example is a pair of the form $\langle \vec{x}, t\rangle$, where $\vec{x}$ is the vector of input values and $t$ is the target output value. $\eta$ is the learning rate (e.g., .05).

- Initialize each $w_i$ to some small random value.
- Until the termination condition is met, Do
  - Initialize each $\Delta w_i$ to zero.
  - For each $\langle \vec{x}, t\rangle$ in training_examples, Do
    - Input the instance $\vec{x}$ to the unit and compute the output $o$.
    - For each linear unit weight $w_i$: $\Delta w_i \leftarrow \Delta w_i + \eta(t - o)\,x_i$  (T4.1)
  - For each linear unit weight $w_i$: $w_i \leftarrow w_i + \Delta w_i$  (T4.2)

TABLE 4.1. GRADIENT-DESCENT algorithm for training a linear unit. To implement the stochastic approximation to gradient descent, Equation (T4.2) is deleted and Equation (T4.1) replaced by $w_i \leftarrow w_i + \eta(t - o)\,x_i$.

4.3.3 STOCHASTIC APPROXIMATION TO GRADIENT DESCENT

Gradient descent is an important general paradigm for learning. It is a strategy for searching through a large or infinite hypothesis space that can be applied whenever (1) the hypothesis space contains continuously parameterized hypotheses (e.g., the weights in a linear unit), and (2) the error can be differentiated with respect to these hypothesis parameters. The key practical difficulties in applying gradient descent are that (1) converging to a local minimum can sometimes be quite slow (it can require many thousands of gradient descent steps), and (2) if there are multiple local minima in the error surface, there is no guarantee that the procedure will find the global minimum.

One common variation on gradient descent intended to alleviate these difficulties is called incremental gradient descent, or alternatively stochastic gradient descent. Whereas the gradient descent training rule presented in Equation (4.7) computes weight updates after summing over all the training examples in $D$, the idea behind stochastic gradient descent is to approximate this gradient descent search by updating weights incrementally, following the calculation of the error for each individual example. The modified training rule is like that of Equation (4.7), except that as we iterate through each training example we update the weight according to

$$
\Delta w_i = \eta(t - o)\,x_i \qquad (4.10)
$$

where $t$, $o$, and $x_i$ are the target value, unit output, and $i$th input for the training example in question. To modify the gradient descent algorithm of Table 4.1 to implement this stochastic approximation, Equation (T4.2) is simply deleted and Equation (T4.1) replaced by $w_i \leftarrow w_i + \eta(t - o)\,x_i$. One way to view this stochastic gradient descent is to consider a distinct error function $E_d(\vec{w})$ defined for each individual training example $d$ as follows:

$$
E_d(\vec{w}) = \frac{1}{2}(t_d - o_d)^2 \qquad (4.11)
$$

where $t_d$ and $o_d$ are the target value and the unit output value for training example $d$. Stochastic gradient descent iterates over the training examples $d$ in $D$, at each iteration altering the weights according to the gradient with respect to $E_d(\vec{w})$. The sequence of these weight updates, when iterated over all training examples, provides a reasonable approximation to descending the gradient with respect to the original error function $E(\vec{w})$.
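To make the two update schemes concrete, here is a minimal NumPy sketch of Table 4.1 and its stochastic variant. This is not the book's code: the function name, array shapes, fixed epoch count, and random seed are our own illustrative choices.

```python
import numpy as np

def gradient_descent(X, t, eta=0.05, n_epochs=100, stochastic=False):
    """Train a linear unit o = w . x.

    Batch mode accumulates the updates of Equation (4.7) over all of D
    before applying them; stochastic mode applies Equation (4.10) after
    each example. X is an (m, n) array of inputs, t an (m,) array of
    target values."""
    rng = np.random.default_rng(0)
    w = rng.uniform(-0.05, 0.05, X.shape[1])   # small random initial weights
    for _ in range(n_epochs):
        if stochastic:
            for x_d, t_d in zip(X, t):         # update after each example d
                o_d = w @ x_d
                w += eta * (t_d - o_d) * x_d   # w_i <- w_i + eta (t - o) x_i
        else:
            o = X @ w                          # outputs for every example in D
            w += eta * (t - o) @ X             # summed Delta w_i of Equation (4.7)
    return w
```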
By making the value of $\eta$ (the gradient descent step size) sufficiently small, stochastic gradient descent can be made to approximate true gradient descent arbitrarily closely. The key differences between standard gradient descent and stochastic gradient descent are:

- In standard gradient descent, the error is summed over all examples before updating weights, whereas in stochastic gradient descent weights are updated upon examining each training example.
- Summing over multiple examples in standard gradient descent requires more computation per weight update step. On the other hand, because it uses the true gradient, standard gradient descent is often used with a larger step size per weight update than stochastic gradient descent.
- In cases where there are multiple local minima with respect to $E(\vec{w})$, stochastic gradient descent can sometimes avoid falling into these local minima, because it uses the various $\nabla E_d(\vec{w})$ rather than $\nabla E(\vec{w})$ to guide its search.

Both stochastic and standard gradient descent methods are commonly used in practice.

The training rule in Equation (4.10) is known as the delta rule, or sometimes the LMS (least-mean-square) rule, Adaline rule, or Widrow-Hoff rule (after its inventors). In Chapter 1 we referred to it as the LMS weight-update rule when describing its use for learning an evaluation function for game playing. Notice that the delta rule in Equation (4.10) is similar to the perceptron training rule in Equation (4.4.2). In fact, the two expressions appear to be identical. The rules are different, however, because in the delta rule $o$ refers to the linear unit output $o(\vec{x}) = \vec{w}\cdot\vec{x}$, whereas for the perceptron rule $o$ refers to the thresholded output $o(\vec{x}) = \mathrm{sgn}(\vec{w}\cdot\vec{x})$.

Although we have presented the delta rule as a method for learning weights for unthresholded linear units, it can easily be used to train thresholded perceptron units as well. Suppose that $o = \vec{w}\cdot\vec{x}$ is the unthresholded linear unit output as above, and $o' = \mathrm{sgn}(\vec{w}\cdot\vec{x})$ is the result of thresholding $o$ as in the perceptron. Now if we wish to train a perceptron to fit training examples with target values $\pm 1$ for $o'$, we can use these same target values and examples to train $o$ instead, using the delta rule. Clearly, if the unthresholded output $o$ can be trained to fit these values perfectly, then the thresholded output $o'$ will fit them as well (because $\mathrm{sgn}(1) = 1$ and $\mathrm{sgn}(-1) = -1$). Even when the target values cannot be fit exactly, the thresholded $o'$ value will correctly fit the $\pm 1$ target whenever the linear unit output $o$ has the correct sign. Notice, however, that while this procedure will learn weights that minimize the error in the linear unit output $o$, these weights will not necessarily minimize the number of training examples misclassified by the thresholded output $o'$. We have thus considered two similar algorithms for iteratively learning perceptron weights; the key difference between them is that the perceptron training rule updates weights based on the error in the thresholded output, whereas the delta rule updates weights based on the error in the unthresholded linear combination of inputs.

The same stochastic gradient descent idea carries over from single linear units to multilayer networks of sigmoid units, trained by the BACKPROPAGATION algorithm of Table 4.2.

BACKPROPAGATION(training_examples, $\eta$, $n_{in}$, $n_{out}$, $n_{hidden}$)

Each training example is a pair of the form $\langle \vec{x}, \vec{t}\rangle$, where $\vec{x}$ is the vector of network input values and $\vec{t}$ is the vector of target network output values. $\eta$ is the learning rate (e.g., .05), $n_{in}$ is the number of network inputs, $n_{hidden}$ the number of units in the hidden layer, and $n_{out}$ the number of output units. The input from unit $i$ into unit $j$ is denoted $x_{ji}$, and the weight from unit $i$ to unit $j$ is denoted $w_{ji}$.

- Create a feed-forward network with $n_{in}$ inputs, $n_{hidden}$ hidden units, and $n_{out}$ output units.
- Initialize all network weights to small random numbers (e.g., between $-.05$ and $.05$).
- Until the termination condition is met, Do
  - For each $\langle \vec{x}, \vec{t}\rangle$ in training_examples, Do

    Propagate the input forward through the network:
    1. Input the instance $\vec{x}$ to the network and compute the output $o_u$ of every unit $u$ in the network.

    Propagate the errors backward through the network:
    2. For each network output unit $k$, calculate its error term $\delta_k$:
       $\delta_k \leftarrow o_k(1 - o_k)(t_k - o_k)$  (T4.3)
    3. For each hidden unit $h$, calculate its error term $\delta_h$:
       $\delta_h \leftarrow o_h(1 - o_h)\sum_{k \in outputs} w_{kh}\,\delta_k$  (T4.4)
    4. Update each network weight $w_{ji}$:
       $w_{ji} \leftarrow w_{ji} + \Delta w_{ji}$, where $\Delta w_{ji} = \eta\,\delta_j\,x_{ji}$  (T4.5)

TABLE 4.2. The stochastic gradient descent version of the BACKPROPAGATION algorithm for feedforward networks containing two layers of sigmoid units.
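Table 4.2 translates almost line for line into NumPy. The sketch below is one possible rendering of a single stochastic-gradient step for a two-layer sigmoid network; the function name and matrix shapes are our own conventions, and bias weights are omitted for brevity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, t, V, W, eta=0.05):
    """One stochastic-gradient step of Table 4.2 for a two-layer sigmoid
    network. V holds the hidden-layer weights, shape (n_hidden, n_in);
    W holds the output-layer weights, shape (n_out, n_hidden)."""
    # 1. Propagate the input forward through the network
    a = sigmoid(V @ x)                      # hidden-unit outputs o_h
    y = sigmoid(W @ a)                      # output-unit outputs o_k
    # 2. Error term for each output unit k (Equation T4.3)
    delta_k = y * (1 - y) * (t - y)
    # 3. Error term for each hidden unit h (Equation T4.4)
    delta_h = a * (1 - a) * (W.T @ delta_k)
    # 4. Update every weight: w_ji <- w_ji + eta * delta_j * x_ji (T4.5)
    W += eta * np.outer(delta_k, a)
    V += eta * np.outer(delta_h, x)
    return y
```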
Machine Learning: An Algorithmic Perspective presents the same two-phase procedure as an explicit recipe:

The Multi-layer Perceptron Algorithm

- Initialisation
  - initialise all weights to small (positive and negative) random values
- Training
  - repeat:
    - for each input vector:

      Forwards phase:
      - compute the activation of each neuron $j$ in the hidden layer(s) using:
        $h_\zeta = \sum_\iota x_\iota v_{\iota\zeta}$  (4.4)
        $a_\zeta = g(h_\zeta) = \dfrac{1}{1 + \exp(-\beta h_\zeta)}$  (4.5)
      - work through the network until you get to the output layer neurons, which have activations (although see also Section 4.2.3):
        $h_\kappa = \sum_\zeta a_\zeta w_{\zeta\kappa}$  (4.6)
        $y_\kappa = g(h_\kappa) = \dfrac{1}{1 + \exp(-\beta h_\kappa)}$  (4.7)

      Backwards phase:
      - compute the error at the output using:
        $\delta_o(\kappa) = (y_\kappa - t_\kappa)\,y_\kappa(1 - y_\kappa)$  (4.8)
      - compute the error in the hidden layer(s) using:
        $\delta_h(\zeta) = a_\zeta(1 - a_\zeta)\sum_\kappa w_{\zeta\kappa}\,\delta_o(\kappa)$  (4.9)
      - update the output layer weights using:
        $w_{\zeta\kappa} \leftarrow w_{\zeta\kappa} - \eta\,\delta_o(\kappa)\,a_\zeta^{\mathrm{hidden}}$  (4.10)
      - update the hidden layer weights using:
        $v_{\iota\zeta} \leftarrow v_{\iota\zeta} - \eta\,\delta_h(\zeta)\,x_\iota$  (4.11)
    - (if using sequential updating) randomise the order of the input vectors so that you don't train in exactly the same order each iteration
  - until learning stops (see Section 4.3.3)
- Recall
  - use the Forwards phase in the training section above

k-NEAREST NEIGHBOR LEARNING

The most basic instance-based method is the k-NEAREST NEIGHBOR algorithm. It assumes all instances correspond to points in the $n$-dimensional space $\mathbb{R}^n$, with the nearest neighbors of an instance defined in terms of the standard Euclidean distance. More precisely, let an arbitrary instance $x$ be described by the feature vector $\langle a_1(x), a_2(x), \ldots, a_n(x)\rangle$, where $a_r(x)$ denotes the value of the $r$th attribute of instance $x$. Then the distance between two instances $x_i$ and $x_j$ is defined to be $d(x_i, x_j)$, where

$$
d(x_i, x_j) \equiv \sqrt{\sum_{r=1}^{n}\bigl(a_r(x_i) - a_r(x_j)\bigr)^2}
$$

In nearest-neighbor learning the target function may be either discrete-valued or real-valued. Let us first consider learning discrete-valued target functions of the form $f : \mathbb{R}^n \rightarrow V$, where $V$ is the finite set $\{v_1, \ldots, v_s\}$. The k-NEAREST NEIGHBOR algorithm for approximating a discrete-valued target function is given in Table 8.1. As shown there, the value $\hat{f}(x_q)$ returned by this algorithm as its estimate of $f(x_q)$ is just the most common value of $f$ among the $k$ training examples nearest to $x_q$. If we choose $k = 1$, the 1-NEAREST NEIGHBOR algorithm assigns to $\hat{f}(x_q)$ the value $f(x_i)$, where $x_i$ is the training instance nearest to $x_q$. For larger values of $k$, the algorithm assigns the most common value among the $k$ nearest training examples.

Training algorithm:
- For each training example $\langle x, f(x)\rangle$, add the example to the list training_examples.

Classification algorithm:
- Given a query instance $x_q$ to be classified,
  - let $x_1 \ldots x_k$ denote the $k$ instances from training_examples that are nearest to $x_q$;
  - return
    $$\hat{f}(x_q) \leftarrow \underset{v \in V}{\operatorname{argmax}} \sum_{i=1}^{k} \delta(v, f(x_i))$$
    where $\delta(a, b) = 1$ if $a = b$ and $\delta(a, b) = 0$ otherwise.

TABLE 8.1. The k-NEAREST NEIGHBOR algorithm for approximating a discrete-valued function $f : \mathbb{R}^n \rightarrow V$.

Figure 8.1 illustrates the operation of the k-NEAREST NEIGHBOR algorithm for the case where the instances are points in a two-dimensional space and the target function is boolean-valued. The positive and negative training examples are shown by "+" and "-" respectively, and a query point $x_q$ is shown as well. Note that the 1-NEAREST NEIGHBOR algorithm classifies $x_q$ as a positive example in this figure, whereas the 5-NEAREST NEIGHBOR algorithm classifies it as a negative example.

What is the nature of the hypothesis space $H$ implicitly considered by the k-NEAREST NEIGHBOR algorithm? Note that the algorithm never forms an explicit general hypothesis $\hat{f}$ regarding the target function $f$; it simply computes the classification of each new query instance as needed. Nevertheless, we can still ask what classifications would be assigned if we held the training examples constant and queried the algorithm with every possible instance.
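A compact sketch of Table 8.1 in NumPy (the function name and array conventions are our own, and Counter breaks ties between equally common values arbitrarily):

```python
import numpy as np
from collections import Counter

def knn_classify(x_q, X_train, f_train, k=5):
    """Table 8.1: return the most common value of f among the k training
    examples nearest to the query x_q under Euclidean distance."""
    # d(x_i, x_q) = sqrt(sum_r (a_r(x_i) - a_r(x_q))^2)
    dists = np.sqrt(((X_train - x_q) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]        # indices of the k nearest x_i
    # argmax_v sum_i delta(v, f(x_i)) is just a majority vote
    votes = Counter(f_train[i] for i in nearest)
    return votes.most_common(1)[0][0]
```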
Turning to unsupervised learning: the k-means algorithm places $k$ cluster centres in the input space, computes the distance between each datapoint and all of the cluster centres, assigns each point to its nearest centre, then computes the mean of the points in each cluster as that centre's new position, iterating the algorithm until the cluster centres stop moving.

The k-Means Algorithm

- Initialisation
  - choose a value for $k$
  - choose $k$ random positions in the input space
  - assign the cluster centres $\mu_j$ to those positions
- Learning
  - repeat
    - for each datapoint $x_i$:
      - compute the distance to each cluster centre
      - assign the datapoint to the nearest cluster centre, with distance
        $d_i = \min_j d(x_i, \mu_j)$  (14.1)
    - for each cluster centre:
      - move the position of the centre to the mean of the points in that cluster ($N_j$ is the number of points in cluster $j$):
        $\mu_j = \dfrac{1}{N_j}\sum_{i=1}^{N_j} x_i$  (14.2)
  - until the cluster centres stop moving
- Usage
  - for each test point:
    - compute the distance to each cluster centre
    - assign the datapoint to the nearest cluster centre, with distance
      $d_i = \min_j d(x_i, \mu_j)$  (14.3)

Our implementation follows these steps almost exactly, and we can take advantage of the np.argmin() function, which returns the index of the minimum value, to find the closest cluster. The code that computes the distances, finds the nearest cluster centre, and updates the centres can be written as follows.
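The listing itself is not part of this excerpt, so what follows is a plausible reconstruction of that step rather than the book's exact code; the variable names and the empty-cluster guard are our own choices, with np.argmin finding the nearest centre as described.

```python
import numpy as np

def kmeans_step(data, centres):
    """One Learning iteration: assign each datapoint to its nearest centre
    (Equation 14.1), then move each centre to the mean of its points
    (Equation 14.2). data has shape (N, d); centres has shape (k, d)."""
    # squared Euclidean distance from every datapoint to every centre: (N, k)
    dists = ((data[:, np.newaxis, :] - centres[np.newaxis, :, :]) ** 2).sum(axis=2)
    closest = dists.argmin(axis=1)          # index of the nearest centre
    for j in range(centres.shape[0]):
        members = data[closest == j]
        if len(members) > 0:                # leave an empty cluster in place
            centres[j] = members.mean(axis=0)
    return centres, closest
```

Calling kmeans_step repeatedly until the closest assignments stop changing between calls implements the repeat loop in the algorithm above.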
