07: Regularization

The problem of overfitting

* So far we've seen a few algorithms that work well for many applications, but they can suffer from the problem of overfitting
  - What is overfitting?
  - What is regularization and how does it help?

Overfitting with linear regression

* Using our house pricing example again
  - Fit a linear function to the data - not a great model
    - This is underfitting - also known as high bias
    - "Bias" is a historic/technical term - if we're fitting a straight line to the data we have a strong preconception that there should be a linear fit
    - In this case that preconception is not correct, but a straight line can't help being straight!
  - Fit a quadratic function
    - Works well
  - Fit a 4th order polynomial
    - Now the curve fits through all five examples
    - Seems to do a good job fitting the training set
    - But, despite fitting the data we've provided very well, this is actually not such a good model
    - This is overfitting - also known as high variance
  - The algorithm has high variance
    - High variance - if we fit a high order polynomial, the hypothesis can basically fit any data
    - The space of hypotheses is too large
  - [Figure: house price vs. size, showing the overfit, high variance hypothesis]
* To recap, if we have too many features then the learned hypothesis may give a cost function of exactly zero
  - But this tries too hard to fit the training set
  - It fails to provide a general solution - unable to generalize (apply to new examples)

Overfitting with logistic regression

* The same thing can happen to logistic regression
  - A sigmoid of a plain linear function can be an underfit (high bias):

    h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2)    (g = sigmoid function)

  - Adding quadratic terms can give a good fit:

    g(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \theta_4 x_2^2 + \theta_5 x_1 x_2)

  - A very high order polynomial in x_1 and x_2 overfits the training data (high variance)
  - [Figure: decision boundaries for the underfitting (high bias) and overfitting (high variance) cases]

Addressing overfitting

* Later we'll look at identifying when overfitting and underfitting are occurring
* Earlier we just plotted a higher order function and saw that it looks "too curvy"
  - Plotting the hypothesis is one way to decide, but it doesn't always work
  - Often we have lots of features - then it's not just a case of selecting a polynomial degree; it's also harder to plot the data and visualize which features to keep and which to drop
  - If you have lots of features and little data, overfitting can be a problem
* How do we deal with this?
  - 1) Reduce the number of features
    - Manually select which features to keep
    - Model selection algorithms are discussed later (good for reducing the number of features)
    - But, in reducing the number of features we lose some information
    - Ideally select those features which minimize data loss, but even so, some info is lost
  - 2) Regularization
    - Keep all the features, but reduce the magnitude of the parameters θ
    - Works well when we have a lot of features, each of which contributes a bit to predicting y

Cost function optimization for regularization

* Penalize and make some of the θ parameters really small
  - e.g. here θ_3 and θ_4:

    \min_\theta \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + 1000\,\theta_3^2 + 1000\,\theta_4^2

* The added penalty terms are a modification of our cost function to help penalize θ_3 and θ_4
  - So we end up with θ_3 and θ_4 being close to zero (because the constants are massive)
  - So we're basically left with a quadratic function

    h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4, with \theta_3 x^3 + \theta_4 x^4 \approx 0

  - [Figure: price vs. size of house with the resulting near-quadratic fit]
  - A minimal sketch of this penalized cost is given below
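The snippet below is a minimal Octave sketch of this modified cost, not code from the course; the function name penalizedCost and the layout of the design matrix X are my own assumptions, while the constant 1000 comes from the example above. Minimizing it drives theta(4) and theta(5) (θ_3 and θ_4 in the maths, with Octave's 1-based indexing) toward zero.

    function J = penalizedCost(theta, X, y)
      % theta: 5 x 1 vector [theta_0; ...; theta_4] (hypothetical layout)
      % X:     m x 5 design matrix [1, x, x.^2, x.^3, x.^4]
      % y:     m x 1 vector of house prices
      m = length(y);
      J = (1 / (2 * m)) * sum((X * theta - y) .^ 2) ...  % usual squared-error cost
          + 1000 * theta(4) ^ 2 + 1000 * theta(5) ^ 2;   % heavy penalty on theta_3, theta_4
    end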
* In this example, we penalized two of the parameter values
  - More generally, regularization is as follows

Regularization

* Small values for the parameters correspond to a simpler hypothesis (you effectively get rid of some of the terms)
* A simpler hypothesis is less prone to overfitting
* Another example
  - Have 100 features x_1, x_2, ..., x_100
  - Unlike the polynomial example, we don't know which are the high order terms
    - How do we pick which ones to shrink?
  - With regularization, take the cost function and modify it to shrink all the parameters
    - Add a term at the end
    - This regularization term shrinks every parameter
    - By convention you don't penalize θ_0 - the regularization sum runs from θ_1 onwards

    J(\theta) = \frac{1}{2m} \left[ \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \lambda \sum_{j=1}^{n} \theta_j^2 \right]

  - In practice, including θ_0 makes little difference
* λ is the regularization parameter
  - Controls a trade-off between our two goals
    - 1) Want to fit the training set well
    - 2) Want to keep the parameters small
* With our example, using the regularized objective (i.e. the cost function with the regularization term) you get a much smoother curve which fits the data and gives a much better hypothesis
  - If λ is very large we end up penalizing ALL the parameters (θ_1, θ_2, etc.) so all the parameters end up being close to zero
    - If this happens, it's like we got rid of all the terms in the hypothesis
    - The result is then underfitting
    - The hypothesis is too biased because of the (effective) absence of any parameters
* So, λ should be chosen carefully - not too big
  - We look at some automatic ways to select λ later in the course

Regularized linear regression

* Previously, we looked at two algorithms for linear regression
  - Gradient descent
  - Normal equation
* Our linear regression objective with regularization is shown below

    J(\theta) = \frac{1}{2m} \left[ \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \lambda \sum_{j=1}^{n} \theta_j^2 \right], \qquad \min_\theta J(\theta)

* Previously, gradient descent would repeatedly update the parameters θ_j, where j = 0, 1, 2, ..., n, simultaneously
  - Shown below

    Repeat {
      \theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_0^{(i)}
      \theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
    }

* We've got the θ_0 update shown explicitly here
  - This is because for regularization we don't penalize θ_0, so we treat it slightly differently
* How do we regularize these two rules?
  - Take the θ_j update and add the term (λ/m)·θ_j, for every θ_j with j = 1 to n
  - This gives regularization for gradient descent
* We can show using calculus that the bracketed expression below is the partial derivative of the regularized J(θ), so the update for j = 1, ..., n becomes

    \theta_j := \theta_j - \alpha \left[ \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} + \frac{\lambda}{m} \theta_j \right]

* If you group the θ_j terms together, the update can be rewritten as

    \theta_j := \theta_j \left( 1 - \alpha \frac{\lambda}{m} \right) - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}

* The term (1 − αλ/m)
  - Is going to be a number less than 1, usually
  - Usually the learning rate is small and m is large
    - So this typically evaluates to (1 − a small number)
    - So the term is often around 0.99 to 0.95
  - This in effect means θ_j gets multiplied by 0.99
    - Means the squared norm of θ_j gets a little smaller
* The second term is exactly the same as the original gradient descent update (a minimal sketch of the full update loop follows below)
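As referenced above, here is a minimal Octave sketch of the regularized gradient descent loop for linear regression. It is an illustration rather than code supplied with the notes; the function name gradientDescentReg and all variable names are my own, and X is assumed to be the m x (n+1) design matrix with a leading column of ones.

    function theta = gradientDescentReg(X, y, theta, alpha, lambda, num_iters)
      % X: m x (n+1) design matrix with a leading column of ones
      % y: m x 1 targets; theta: (n+1) x 1 initial parameters
      m = length(y);
      for iter = 1:num_iters
        h = X * theta;                         % hypothesis for all m examples
        grad = (1 / m) * (X' * (h - y));       % unregularized gradient, (n+1) x 1
        reg = (lambda / m) * theta;            % regularization term (lambda/m)*theta_j ...
        reg(1) = 0;                            % ... except theta_0, which is not penalized
        theta = theta - alpha * (grad + reg);  % simultaneous update of every theta_j
      end
    end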
Regularization with the normal equation

* The normal equation is the other linear regression method
  - It minimizes J(θ) analytically instead of iteratively
  - To use regularization we add a λ term inside the inverse:

    \theta = \left( X^T X + \lambda M \right)^{-1} X^T y

  - where M is the (n+1) x (n+1) identity matrix with its top-left entry set to 0, so that θ_0 is not regularized

Regularized logistic regression

* We saw earlier that logistic regression can be prone to overfitting with lots of features
* The logistic regression cost function is as follows

    J(\theta) = -\frac{1}{m} \left[ \sum_{i=1}^{m} y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log \left( 1 - h_\theta(x^{(i)}) \right) \right]

* To modify it we add an extra term

    + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2

* This has the effect of penalizing the parameters θ_1, θ_2 up to θ_n
  - Means, like with linear regression, we can get what appears to be a better-fitting, lower-order hypothesis
* How do we implement this?
  - The original logistic regression gradient descent update was as follows

    \theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} \qquad (j = 0, 1, 2, 3, \ldots, n)

  - Again, to modify the algorithm we simply need to modify the update rule for θ_1 onwards
  - It looks cosmetically the same as linear regression, except obviously the hypothesis is very different

    \theta_j := \theta_j \left( 1 - \alpha \frac{\lambda}{m} \right) - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}

Advanced optimization of regularized logistic regression

* As before, define a costFunction which takes a θ parameter and gives jVal and gradient back

    function [jVal, gradient] = costFunction(theta)
      jVal = [code to compute J(theta), including the regularization term];
      gradient(1) = [code to compute the partial derivative of J(theta) w.r.t. theta_0
                     - no regularization term here];
      gradient(2) = [code to compute the partial derivative w.r.t. theta_1,
                     including the (lambda/m) * theta(2) term];
      gradient(3) = [code to compute the partial derivative w.r.t. theta_2,
                     including the (lambda/m) * theta(3) term];
      ...
      gradient(n+1) = [code to compute the partial derivative w.r.t. theta_n];
    end

* Then use fminunc
  - Pass it an @costFunction argument
  - It minimizes J(θ) in an optimized manner using the cost function
* jVal
  - Needs code to compute J(θ)
  - Needs to include the regularization term
* Gradient
  - Needs to be the partial derivative of J(θ) with respect to each θ_j
  - Adding the appropriate (λ/m)·θ_j term for j ≥ 1 is also necessary
* Ensure the regularization summation doesn't include θ_0
  - It doesn't in the formulas above, but, you know, don't be daft!
* One concrete way of filling in this template is sketched below
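Here is one possible Octave filling-in of the costFunction template for regularized logistic regression. It is a sketch of my own rather than code from the notes: the name costFunctionReg is hypothetical, X is assumed to be a design matrix with a leading column of ones, y holds labels in {0, 1}, and lambda is chosen by the user. The gradient is built as a vector in one go, which is equivalent to filling in gradient(1), gradient(2), ... individually as in the template.

    function [jVal, gradient] = costFunctionReg(theta, X, y, lambda)
      % Regularized logistic regression cost and gradient.
      m = length(y);
      h = 1 ./ (1 + exp(-X * theta));                        % sigmoid hypothesis
      jVal = (1 / m) * sum(-y .* log(h) - (1 - y) .* log(1 - h)) ...
             + (lambda / (2 * m)) * sum(theta(2:end) .^ 2);  % theta_0 not penalized
      gradient = (1 / m) * (X' * (h - y));                   % unregularized gradient
      gradient(2:end) = gradient(2:end) + (lambda / m) * theta(2:end);
    end

    % Possible usage with fminunc (assuming X, y and lambda are already defined):
    %   options = optimset('GradObj', 'on', 'MaxIter', 400);
    %   initial_theta = zeros(size(X, 2), 1);
    %   [theta, jVal] = fminunc(@(t) costFunctionReg(t, X, y, lambda), initial_theta, options);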
