Savitribai Phule Pune University (Computer Engineering), Semester VII, Machine Learning (Code : 410242)

Q. 1 Define regression analysis. Define dependent and independent variables with a suitable example. (4 Marks)
Ans. :
• Definition : A functional relationship between two or more correlated variables that is often empirically determined from data and is used especially to predict values of one variable when given values of the others.
• Regression analysis aims to establish a relationship (or influence) of a set of variables on another variable (the outcome).
• Definition : The variables that have an influence on the outcome are called input, independent, explanatory or predictor variables.
• Definition : The variable whose outcome (or value) depends on the other variables is called the response, outcome, or dependent variable.
• Example : when predicting a house's selling price from its floor area, the area is the independent variable and the price is the dependent variable.

Q. 2 What is linear regression? (4 Marks)
Ans. :
• Definition : The process of finding a straight line that best approximates a set of points on a graph is called linear regression.
• There are two types of linear regression :
1. Simple Linear Regression (SLR) : has only one independent (input) variable.
2. Multiple Linear Regression (MLR) : has more than one independent (input) variable.
• The general formula (or model) for linear regression analysis is
Y = B0 + B1X1 + B2X2 + B3X3 + ... + BnXn + e
where
• Y is the outcome (dependent) variable,
• Xi are the values of the independent variables,
• B0 is the value of Y when each Xi equals 0 (the y-intercept),
• Bi is the change in Y for a unit change in Xi (the regression coefficient, or slope, for Xi),
• e is the random error (noise) that represents the difference between the value predicted by the regression model and the actual value.
• The goal is to find the regression line that best approximates (fits) the relation between the input variables and the output variable; the objective is a linear model for which the sum of squared distances (residuals) is minimal.

Q. 3 For the following data set, find the linear regression line. Predict the value of Y if X = 10. (6 Marks)

X : 0  1  2  3  4  5  6  7  8  9
Y : 1  3  2  5  7  8  8  9  10  12

Ans. : For these 10 points, x̄ = 4.5, ȳ = 6.5, Σ(x − x̄)(y − ȳ) = 96.5 and Σ(x − x̄)² = 82.5. Hence
B1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² = 96.5 / 82.5 = 1.1696
B0 = ȳ − B1 x̄ = 6.5 − 1.1696 × 4.5 = 1.2368
Hence, the regression line is
Y = 1.2368 + 1.1696 X
To predict the value of Y when X = 10, put the value of X in the regression line :
Y = 1.2368 + 1.1696 × 10 = 12.93, i.e. approximately 13.
Fig. 3.1 : Regression line plot for the given data.
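The slope, intercept and prediction above can be checked with a few lines of NumPy. This is an illustrative sketch, not part of the original solution; it assumes the Q. 3 data as reconstructed above.

    import numpy as np

    # Data from Q. 3
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12], dtype=float)

    # Closed-form least-squares estimates for simple linear regression
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    print(b0, b1)          # ~1.2368 and ~1.1697

    # Prediction at X = 10
    print(b0 + b1 * 10)    # ~12.93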
Q. 4 A study recorded the BMI (x) and cholesterol level (y) of 10 patients. Find the regression line and predict the cholesterol level of a patient with a BMI of 27.
Ans. : Working through the least-squares table for the 10 observations gives x̄ = 19.4, ȳ = 161, Σ(x − x̄)(y − ȳ) = 1522 and Σ(x − x̄)² = 172.4. Hence
B1 = 1522 / 172.4 = 8.8283
B0 = ȳ − B1 x̄ = 161 − 8.8283 × 19.4 = −10.269
Hence, the regression line is
Y = −10.269 + 8.8283 X
Hence, someone having a BMI of 27 would have a cholesterol level of
Cholesterol = −10.269 + 8.8283 × 27 ≈ 228

Q. 5 Describe some of the use cases for Linear Regression. (4 Marks)
Ans. : Some of the common use cases (applications) of linear regression are as follows.
1. Healthcare : The healthcare industry is evolving and a great deal of research is going on. Linear regression can be used to establish the relationship between a treatment and its effects, or to understand complex operations of the human body by deriving such relationships.
2. Demand forecasting : Businesses are always looking to maximise sales and reduce inventory. Sales may depend on several factors, and it is very helpful to determine the relationship of sales with those factors. Businesses can then try to modify those factors and forecast sales appropriately.
3. Other predictions : There are several other areas where predictions can be made using an established linear relationship between variables, such as sports outcomes, crop output, machinery performance, fitness, and similar areas.

Q. 6 Write a short note on Ridge regression. (4 Marks)
Ans. :
• Multicollinearity is a statistical concept where several independent variables in a model are correlated. Two variables are considered perfectly collinear if their correlation coefficient is +/− 1.0. Multicollinearity among independent variables results in less reliable statistical inferences.
• It is better to use independent variables that are not correlated or repetitive when building multiple regression models that use two or more variables. The existence of multicollinearity in a data set can lead to less reliable results due to larger standard errors.
• Linear regression fits a linear model with coefficients w = (w1, w2, ..., wp) to minimise the residual sum of squares between the observed targets in the dataset and the targets predicted by the linear approximation. Mathematically it solves a problem of the form
min_w ||Xw − y||²
• The coefficient estimates for Ordinary Least Squares rely on the independence of the features. When features are correlated and the columns of the design matrix have an approximately linear dependence, the design matrix becomes close to singular; as a result, the least-squares estimate becomes highly sensitive to random errors in the observed target, producing a large variance. This situation of multicollinearity can arise, for example, when data are collected without an experimental design.
• Ridge regression is used in the case of multicollinearity. It places restrictions on the magnitudes of the estimated coefficients to avoid distortion : a penalty term proportional to the sum of the squares of the coefficients is added to reduce the standard errors. It is also called L2 regularisation.
• The cost function of Ridge regression is
min_w ||Xw − y||² + α ||w||²
• The complexity parameter α > 0 controls the amount of shrinkage : the larger the value of α, the greater the amount of shrinkage and the more robust the coefficients become to collinearity. Fig. 3.2 illustrates the Ridge coefficients as a function of the regularisation strength α.
• L2 regularisation results in smaller overall weight values and stabilises the weights when there is high correlation between the input features. L2 regularisation tries to estimate the mean of the data to avoid overfitting.
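The shrinkage behaviour described above can be reproduced with scikit-learn's Ridge estimator. This is an illustrative sketch only; the collinear toy data and the alpha values are arbitrary choices, not from the original text.

    import numpy as np
    from sklearn.linear_model import LinearRegression, Ridge

    rng = np.random.default_rng(0)
    x1 = rng.normal(size=200)
    X = np.column_stack([x1, x1 + 0.01 * rng.normal(size=200)])  # two nearly collinear features
    y = 3 * x1 + rng.normal(scale=0.1, size=200)

    print(LinearRegression().fit(X, y).coef_)        # unstable: large, opposite-signed weights
    for alpha in (0.01, 1.0, 100.0):
        print(alpha, Ridge(alpha=alpha).fit(X, y).coef_)  # larger alpha -> smaller, steadier weights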
Fig. 3.2 : Ridge coefficients plotted as a function of the regularisation parameter α.

Q. 7 Write a short note on L1 regularisation. (4 Marks)
Ans. :
• Lasso regression adds a penalty term proportional to the sum of the absolute values of the coefficients. It is also called L1 regularisation.
• L1 regularisation has the effect of reducing the number of features used in the model by pushing to zero the weights of features that would otherwise have small weights. As a result, L1 regularisation produces sparse models and reduces the amount of noise in the model. L1 regularisation tries to estimate the median of the data to avoid overfitting.
• Mathematically, it consists of a linear model with an added regularisation term. The objective function to minimise is
min_w (1 / (2 n_samples)) ||Xw − y||² + α ||w||₁
• The lasso estimate thus solves the minimisation of the least-squares penalty with α||w||₁ added, where α is a constant and ||w||₁ is the l1-norm of the coefficient vector.

Q. 8 With an example, illustrate the Mean Error (ME) cost function. (4 Marks)
Ans. : The Mean Error cost function simply averages the differences between the actual values and the predicted values.

Y (Actual) | Y' (Predicted) | Error = Y − Y'
10 | 9.2 | 0.8
8 | 8.3 | −0.3
5 | 4.7 | 0.3
7 | 7.9 | −0.9
4 | 3.1 | 0.9
Total | | 0.8

Mean Error = 0.8 / 5 = 0.16
The errors can be both negative and positive, so during summation they tend to cancel each other out. In fact, it is theoretically possible for the positive and negative errors to cancel exactly and give zero error, which would incorrectly suggest a perfect model. So, Mean Error is not a recommended cost function.

Q. 9 With an example, illustrate the Mean Squared Error (MSE) cost function. (4 Marks)
Ans. : Mean Squared Error (MSE), an improvement upon the Mean Error (ME) cost function, squares the difference between the actual and predicted value to avoid negative values cancelling out positive values in the error calculation.

Y (Actual) | Y' (Predicted) | Error = Y − Y' | Error Squared
10 | 9.2 | 0.8 | 0.64
8 | 8.3 | −0.3 | 0.09
5 | 4.7 | 0.3 | 0.09
7 | 7.9 | −0.9 | 0.81
4 | 3.1 | 0.9 | 0.81
Total | | | 2.44

Mean Squared Error = 2.44 / 5 = 0.488

Q. 10 With an example, illustrate the Mean Absolute Error (MAE) cost function. (4 Marks)
Ans. : Mean Absolute Error (MAE) also attempts to solve the cancelling-out problem of Mean Error (ME). MAE takes the absolute value (without sign) of the difference between the actual value and the predicted value and then calculates the mean of the sum.

Y (Actual) | Y' (Predicted) | Absolute Error = |Y − Y'|
10 | 9.2 | 0.8
8 | 8.3 | 0.3
5 | 4.7 | 0.3
7 | 7.9 | 0.9
4 | 3.1 | 0.9
Total | | 3.2

Mean Absolute Error = 3.2 / 5 = 0.64
If you compare the MAE value (0.64) with the ME value (0.16) found for the same data, you see that MAE captures the error better, because the cancelling of positive and negative errors is avoided.

Q. 11 With an example, illustrate the Root Mean Squared Error (RMSE) cost function. (4 Marks)
Ans. : Root Mean Squared Error (RMSE) also attempts to solve the cancelling-out problem of Mean Error (ME). RMSE takes the square root of the Mean Squared Error (MSE). For the same example data, RMSE = √0.488 ≈ 0.70.
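The four error values above can be reproduced with a short NumPy sketch (illustrative only, using the same actual/predicted values as the tables).

    import numpy as np

    y_true = np.array([10, 8, 5, 7, 4], dtype=float)
    y_pred = np.array([9.2, 8.3, 4.7, 7.9, 3.1])

    err  = y_true - y_pred
    me   = err.mean()            # 0.16  (signed errors partly cancel)
    mse  = (err ** 2).mean()     # 0.488
    mae  = np.abs(err).mean()    # 0.64
    rmse = np.sqrt(mse)          # ~0.70
    print(me, mse, mae, rmse)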
Q. 12 With an example, explain R-Squared. (6 Marks)
Ans. :
• R-Squared (R², the coefficient of determination) is a statistical measure in a regression model that determines the proportion of variance in the dependent variable that can be explained by the independent variable. In other words, R² shows how well the data fit the regression model. For example, an R² of 60% reveals that 60% of the variability in the data is explained by the regression model. Generally, a higher R² indicates a better fit for the model. Fig. 3.3 illustrates the model's performance for various values of R² by plotting fitted values from the model against observed values : a model that explains all of the variance, one that explains the bulk of the variance, one that explains roughly 40% of the variance (still reasonable), and one that fails to explain any variance.
• The quality of this statistical measure depends on many factors, such as the nature of the variables employed in the model, the units of the variables, and the applied data transformations. Thus, sometimes a high R² can indicate problems with the regression model.
• A low R² figure is generally a bad sign for predictive models. However, in some cases a good model may show a small value. There is no universal rule on how to incorporate this statistical measure in assessing a model : the context of the experiment or forecast is extremely important and, in different scenarios, the insights from the metric can vary. R² can take any value between 0 and 1 (it is often quoted as a percentage).
• The formula for calculating R-squared is
R² = (sum of squares due to regression) / (total sum of squares)
• The sum of squares due to regression measures how well the regression model represents the data that were used for modelling. The total sum of squares measures the variation in the observed data.
• Example : assume that the regression line fitted to a small data set is Y' = 2.2 + 0.6x. For each data point, compute the fitted value 2.2 + 0.6x, the squared deviation of the fitted value from ȳ (contributing to the regression sum of squares), and the squared deviation of the observed value from ȳ (contributing to the total sum of squares); R² is then the ratio of the two totals.

Q. 13 Explain derivative-based optimisation. (4 Marks)
Ans. :
• Derivative-based optimisation techniques use differentiation to find the derivatives of a continuous function. The derivative is then used to find optimum values of that function : the derivative information helps to determine the direction of search for an optimal value.
• At every step you carefully evaluate two things : 1. which direction to go, and 2. how far to go in that direction.
• A few extra steps in the wrong direction could lead to injury, lost energy, or backtracking. When you need to choose between a set of paths to go down, you usually pick the most promising path, the one that is likely to get you down more quickly than the other potential paths and provides maximum descent towards the downhill.
• That is the general concept behind derivative-based optimisation : you use derivative information to determine the search direction and then descend in that direction, for a calculated distance, in the hope of finding an optimal value.
• Mathematically, consider a simple function y = f(x). The derivative of the function is denoted f'(x) or dy/dx. The derivative f'(x) gives the slope of f(x) at point x; it specifies how a small change in the input x affects y, i.e. f(x + ε) ≈ f(x) + ε f'(x).
• You can therefore reduce f(x) by moving x in small steps with the opposite sign of the derivative. You will use this mathematical concept in the actual derivative-based optimisation techniques. For now, remember the core theme (search direction and distance) behind derivative-based optimisation.
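A tiny sketch of the idea above, repeatedly stepping x against the sign of the derivative to reduce f(x). It is illustrative only; the example function and step size are arbitrary choices.

    def f(x):               # example function with its minimum at x = 3
        return (x - 3) ** 2

    def df(x):              # its derivative
        return 2 * (x - 3)

    x, step = 10.0, 0.1
    for _ in range(50):
        x = x - step * df(x)    # move opposite to the sign of the derivative
    print(x, f(x))              # x approaches 3, f(x) approaches 0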
Q. 14 Explain the characteristics of derivative-based optimisation techniques. (4 Marks)
Ans. : Derivative-based optimisation techniques have the following characteristics (Fig. 3.4) :
1. Require a continuous function : Derivative-based optimisation techniques require a continuous function to work with. Functions that cannot be differentiated are not suitable.
2. Do not involve human judgement : Derivative-based optimisation techniques do not involve human judgement to determine the search direction for finding optima; the search direction is guided by the derivatives of the continuous function.
3. Fast : Derivative-based optimisation techniques are comparatively faster than derivative-free optimisation techniques, as they do not involve human judgement and random numbers. They are mathematically driven.
4. Mathematically bounded (no randomness) : Unlike derivative-free optimisation techniques, derivative-based optimisation techniques do not use random numbers to determine the search direction; they use the function's derivatives.
5. Suitable for analytics : Derivative-based optimisation techniques can be used for analysing optimisation approaches and computation time. Irrespective of the function, you can analyse search-performance parameters such as how long it took to find the optimum, where the optimum was found, and how many steps were required to reach it.
6. Do not require specific stopping criteria : Unlike derivative-free optimisation techniques, derivative-based optimisation techniques do not require specific stopping criteria; they stop the search after computing as many rounds of derivatives as desired.

Q. 15 Write the steps for the Steepest Descent optimisation technique. (4 Marks)
Ans. :
• Steepest Descent is one of the oldest optimisation techniques. It uses a gradient-based approach to find the optimal value of a function in a multidimensional space. It is also known as the gradient descent method.
• The basic idea is that you want to minimise a function f(x), where x is a vector (x1, x2, x3, ..., xn) with an element for each feature, starting from some initial guess x(0). You try to find a sequence of new points x(i) that move downhill towards a minimal solution of f(x).
• The steepest descent method works for any number of dimensions : you take derivatives of the function in each of the different dimensions of x. The whole set of derivatives can be written as ∇f(x), a vector with the gradient elements ∂f/∂x1, ∂f/∂x2, ..., one per dimension of x.
• In a derivative-based optimisation technique you need to find both the search direction and the step size (distance) to move in that direction. At any given point there could be thousands of possible search directions, so it is crucial to decide which direction to take and how far to descend along it to find the minimal value.
• If the search direction is a line and you are at a point xk, then
x(k+1) = xk + ak dk    [where ak is the step size and dk is the search direction]
• To find the search direction dk, you compute the derivative ∇f(xk) and move in the negative direction.
• Hence dk = −∇f(xk), and the search simply consists of repeatedly moving down the search direction until the minimal value is reached.
• Putting the individual parts of the steepest descent method together, the steps are :
1. Take an initial value of x, say x0, at k = 0.
2. Determine the search direction dk = −∇f(xk).
3. If dk = 0 (or its magnitude is less than a chosen low value), then stop.
4. Calculate the step size ak by minimising f(xk + ak dk) with respect to ak.
5. Set x(k+1) = xk + ak dk and k = k + 1. Go to step 2.

Q. 16 Use the Steepest Descent method to minimise f(x) = x1² − x1x2 + x2², up to a point where the difference between two successive iterations is less than 0.2. Take the starting point as (1, 1/2)ᵀ. (6 Marks)
Ans. :
∇f(X) = (2x1 − x2, −x1 + 2x2)ᵀ

Iteration 1
Substitute the starting point X1 = (1, 1/2)ᵀ :
∇f(X1) = (2 × 1 − 1/2, −1 + 2 × 1/2)ᵀ = (3/2, 0)ᵀ
d1 = −∇f(X1) = −(3/2, 0)ᵀ
According to the steepest descent method, the next point X2 is
X2 = X1 + a1 d1 = (1, 1/2)ᵀ − a1 (3/2, 0)ᵀ = (1 − (3/2)a1, 1/2)ᵀ
Substituting X2 in the given function (using (a − b)² = a² − 2ab + b²) :
f(X2) = (1 − (3/2)a1)² − (1 − (3/2)a1)(1/2) + (1/2)² = 3/4 − (9/4)a1 + (9/4)a1²
To find a1, differentiate f(X2) with respect to a1 and minimise :
df/da1 = −9/4 + (9/2)a1 = 0, hence a1 = 1/2
Substituting a1 = 1/2 gives X2 = (1/4, 1/2)ᵀ and
f(X2) = (1/4)² − (1/4)(1/2) + (1/2)² = 0.1875
X1 is given in the question as (1, 1/2)ᵀ, so f(X1) = 1 − 1/2 + 1/4 = 0.75.
|f(X2) − f(X1)| = |0.1875 − 0.75| = 0.5625 > 0.2, so you need to perform iteration 2.

Iteration 2
∇f(X2) = (2 × 1/4 − 1/2, −1/4 + 2 × 1/2)ᵀ = (0, 3/4)ᵀ
d2 = −∇f(X2) = −(0, 3/4)ᵀ
X3 = X2 + a2 d2 = (1/4, 1/2 − (3/4)a2)ᵀ
f(X3) = (1/4)² − (1/4)(1/2 − (3/4)a2) + (1/2 − (3/4)a2)² = 3/16 − (9/16)a2 + (9/16)a2²
df/da2 = −9/16 + (9/8)a2 = 0, hence a2 = 1/2
Substituting a2 = 1/2 gives X3 = (1/4, 1/8)ᵀ and
f(X3) = 1/16 − 1/32 + 1/64 ≈ 0.047
|f(X3) − f(X2)| = |0.047 − 0.1875| = 0.14 ≤ 0.2
Hence X3 = (1/4, 1/8)ᵀ is taken as the (approximately) optimal solution.
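The two hand iterations above can be checked numerically. The sketch below is illustrative only; it uses the exact line-search step available for a quadratic (a = gᵀg / gᵀHg) and the stopping rule from Q. 16.

    import numpy as np

    H = np.array([[2.0, -1.0], [-1.0, 2.0]])   # Hessian of f(x) = x1^2 - x1*x2 + x2^2

    def f(x):    return x[0] ** 2 - x[0] * x[1] + x[1] ** 2
    def grad(x): return H @ x                  # gradient of this quadratic

    x = np.array([1.0, 0.5])                   # starting point from Q. 16
    while True:
        g = grad(x)
        a = (g @ g) / (g @ H @ g)              # exact line search along -g
        x_new = x - a * g
        if abs(f(x_new) - f(x)) <= 0.2:        # stopping rule from the question
            x = x_new
            break
        x = x_new
    print(x, f(x))                             # ~[0.25 0.125], f ~ 0.047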
Q. 17 Use the Steepest Descent method to minimise f(x1, x2) = 3x1² + x2². Stop after two iterations. Use the initial value (4, 4)ᵀ. (6 Marks)
Ans. :
∇f(X) = (6x1, 2x2)ᵀ

Iteration 1
Substitute the starting point X1 = (4, 4)ᵀ :
∇f(X1) = (6 × 4, 2 × 4)ᵀ = (24, 8)ᵀ
d1 = −∇f(X1) = −(24, 8)ᵀ
According to the steepest descent method, the next point X2 is
X2 = X1 + a1 d1 = (4 − 24a1, 4 − 8a1)ᵀ
f(X2) = 3(4 − 24a1)² + (4 − 8a1)² = 48 − 576a1 + 1728a1² + 16 − 64a1 + 64a1² = 1792a1² − 640a1 + 64
df/da1 = 3584a1 − 640 = 0, hence a1 = 0.179
Substituting a1 = 0.179 gives
X2 = (4 − 24 × 0.179, 4 − 8 × 0.179)ᵀ = (−0.296, 2.568)ᵀ
f(X2) = 3 × (−0.296)² + (2.568)² = 0.26 + 6.59 = 6.85
f(X1) = 3 × 4² + 4² = 48 + 16 = 64
Indeed the function value reduces as you move from X1 to X2 : f(X1) = 64 whereas f(X2) = 6.85.

Iteration 2
∇f(X2) = (6 × (−0.296), 2 × 2.568)ᵀ = (−1.78, 5.14)ᵀ
d2 = −∇f(X2) = (1.78, −5.14)ᵀ
X3 = X2 + a2 d2 = (−0.296 + 1.78a2, 2.568 − 5.14a2)ᵀ
f(X3) = 3(−0.296 + 1.78a2)² + (2.568 − 5.14a2)² = 35.93a2² − 29.56a2 + 6.86
df/da2 = 71.86a2 − 29.56 = 0, hence a2 ≈ 0.41
Substituting a2 = 0.41 gives
X3 = (−0.296 + 1.78 × 0.41, 2.568 − 5.14 × 0.41)ᵀ ≈ (0.43, 0.46)ᵀ
f(X3) = 3 × (0.43)² + (0.46)² ≈ 0.78
After the two required iterations the estimate is X3 ≈ (0.43, 0.46)ᵀ with f(X3) ≈ 0.78, moving towards the true minimum at (0, 0)ᵀ.

Q. 18 Minimise f(x1, x2) = 4x1 − 2x2 + 2x1² + 2x1x2 + x2², starting from (0, 0)ᵀ, using Newton's method. (6 Marks)
Ans. :
∂f/∂x1 = 4x1 + 2x2 + 4 and ∂f/∂x2 = 2x1 + 2x2 − 2, so
∇f(X) = (4x1 + 2x2 + 4, 2x1 + 2x2 − 2)ᵀ
Differentiating 4x1 + 2x2 + 4 once with respect to x1 gives 4 and with respect to x2 gives 2; differentiating 2x1 + 2x2 − 2 with respect to x1 gives 2 and with respect to x2 gives 2. Hence the Hessian is
∇²f(X) = [ 4  2 ; 2  2 ]

Iteration 1
Substitute the starting point X1 = (0, 0)ᵀ :
∇f(X1) = (4, −2)ᵀ
According to Newton's method, the next point X2 is
X2 = X1 − [∇²f(X1)]⁻¹ ∇f(X1)
The inverse of a 2 × 2 matrix [ a  b ; c  d ] is (1/(ad − bc)) [ d  −b ; −c  a ], so
[∇²f]⁻¹ = (1/4) [ 2  −2 ; −2  4 ]
X2 = (0, 0)ᵀ − (1/4) [ 2  −2 ; −2  4 ] (4, −2)ᵀ = (0, 0)ᵀ − (3, −4)ᵀ = (−3, 4)ᵀ
f(X2) = 4(−3) − 2(4) + 2(−3)² + 2(−3)(4) + 4² = −12 − 8 + 18 − 24 + 16 = −10

Iteration 2
∇f(X2) = (4(−3) + 2(4) + 4, 2(−3) + 2(4) − 2)ᵀ = (0, 0)ᵀ
Since the gradient is zero, Newton's method makes no further move : X3 = X2. Hence X = (−3, 4)ᵀ is the optimal solution, with f(X) = −10.
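A small NumPy sketch of the Newton update used in Q. 18 (illustrative only; for a quadratic function a single Newton step already reaches the optimum).

    import numpy as np

    def grad(x):                          # gradient of f from Q. 18
        return np.array([4 * x[0] + 2 * x[1] + 4,
                         2 * x[0] + 2 * x[1] - 2])

    H = np.array([[4.0, 2.0],             # Hessian of f (constant for a quadratic)
                  [2.0, 2.0]])

    x = np.array([0.0, 0.0])              # starting point
    for _ in range(2):
        x = x - np.linalg.solve(H, grad(x))   # Newton step: x - H^(-1) grad(x)
    print(x)                              # [-3.  4.]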
Q. 19 Minimise f(x1, x2) = x1² − x1x2 + 3x2² with starting point (1, 2)ᵀ using Newton's method. (6 Marks)
Ans. :
∂f/∂x1 = 2x1 − x2 and ∂f/∂x2 = −x1 + 6x2
Differentiating 2x1 − x2 once with respect to x1 gives 2 and with respect to x2 gives −1; differentiating −x1 + 6x2 with respect to x1 gives −1 and with respect to x2 gives 6. Hence the Hessian is
∇²f(X) = [ 2  −1 ; −1  6 ]
Substituting the starting point X1 = (1, 2)ᵀ :
∇f(X1) = (2 × 1 − 2, −1 + 6 × 2)ᵀ = (0, 11)ᵀ
Newton's step gives
X2 = X1 − [∇²f]⁻¹ ∇f(X1) = (1, 2)ᵀ − (1/11) [ 6  1 ; 1  2 ] (0, 11)ᵀ = (1, 2)ᵀ − (1, 2)ᵀ = (0, 0)ᵀ
Here ∇f(X2) = (0, 0)ᵀ, so no further iterations are required. Hence X = (0, 0)ᵀ is the optimal solution of x1² − x1x2 + 3x2².

Q. 20 Write a short note on Overfitting. (4 Marks)
Ans. :
• Definition : Overfitting occurs when the model matches the training data too closely, causing it to perform poorly on new data.
• Overfitting happens when a model learns the detail and noise in the training data to the extent that random fluctuations are picked up and learned as concepts by the model; these concepts do not apply to new data and hurt the model's ability to generalise.
• Overfitting is more likely with nonparametric and nonlinear models that have more flexibility when learning a target function. As such, many nonparametric machine learning algorithms also include parameters or techniques to limit and constrain how much detail the model learns. For example, a decision tree can be pruned after it has been trained, in order to remove some of the detail it has picked up.

Q. 21 Write a short note on Underfitting. (4 Marks)
Ans. :
• Definition : Underfitting occurs when the model can neither model the training data nor work with new data, causing it to perform poorly on both.
• If the target function is kept too simple, it may not be able to learn the essential features and patterns of the training data. Underfitting can also happen due to the unavailability of sufficient training data or an incorrect selection of features.
• An underfit machine learning model is not a suitable model : it has poor performance even on the training data. Underfitting is easy to detect given a good performance metric.
• Underfitting can be avoided by using more training data and by carefully selecting the features on which the model is trained.

Q. 22 With a diagram, illustrate the bias-variance trade-off. (6 Marks)
Ans. :
• There is a trade-off (a need to balance bias and variance) to get a model just right for the job. If you denote the variable you are trying to predict as Y and the independent variables as X, you may assume a relationship Y = f(X) + e, where the error term e is normally distributed.
• The error term is an aggregation of reducible error and irreducible error :
Total Error = Reducible Error + Irreducible Error
• Fig. 3.5(a) illustrates what it looks like to balance bias and variance so that the total error is minimised. Mathematically, the error can be decomposed as
Err(x) = Total Error = Bias² + Variance + Irreducible Error
• The third term, the irreducible error, is the noise term in the true relationship that cannot fundamentally be reduced by any model. Given the true model and infinite data to calibrate it, you should be able to reduce both the bias and variance terms to 0. However, in a world with imperfect models and finite data, there is a trade-off between minimising the bias and minimising the variance. To build a good model, you need to find a balance between bias and variance that minimises the total error.
Fig. 3.5(a) : Bias, variance and total error as a function of algorithm complexity.
• Bias is high for a simpler model and decreases with an increase in model complexity; the curve representing bias falls as the model complexity increases. Variance is low for simpler models and high for more complex models; the curve representing variance rises as the model complexity increases.
• The most optimal complexity of the model is right in the middle, where the bias and variance curves intersect. These values of bias and variance produce the least error and are preferred : an optimal balance of bias and variance neither overfits nor underfits the model.
• Use the flow chart shown in Fig. 3.5(b) for balancing bias and variance : diagnose whether the model suffers from high bias or high variance, then adjust the model complexity, the regularisation, or the amount of training data accordingly.
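As an illustrative aside (not from the original text), the trade-off can be seen by fitting polynomials of increasing degree to noisy data and comparing training and validation error; the data, degrees and split below are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(1)
    x = np.linspace(0, 1, 40)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)   # noisy target
    x_tr, y_tr, x_va, y_va = x[::2], y[::2], x[1::2], y[1::2]        # simple train/validation split

    for degree in (1, 3, 9):
        coeffs = np.polyfit(x_tr, y_tr, degree)                      # fit a polynomial of this degree
        err = lambda xs, ys: np.mean((np.polyval(coeffs, xs) - ys) ** 2)
        print(degree, round(err(x_tr, y_tr), 3), round(err(x_va, y_va), 3))
    # degree 1: high bias (both errors high); degree 9: lower bias but higher variance
    # (training error keeps falling while validation error stops improving)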
Q. 23 What are the characteristics of a high bias model? (4 Marks)
Ans. : Characteristics of a high bias model are as follows.
1. Failure to capture proper data trends
2. Low training accuracy
3. Potential towards underfitting
4. More generalised or overly simplified
5. High error rate

Q. 24 What are the characteristics of a high variance model? (4 Marks)
Ans. : Characteristics of a high variance model are as follows.
1. Noise in the data set
2. Low testing accuracy
3. Potential towards overfitting
4. Complex models
5. Trying to fit all data points as closely as possible

Q. 1 Write a short note on SVM. (4 Marks)
Ans. :
• Definition : Support Vector Machine (SVM) is a classification technique that uses the data points to create a mathematical boundary (hyperplane) separating the data points into classes.
• SVM is a supervised learning method. Plot each data point as a point in an n-dimensional space (where n is the number of attributes of the data point). The higher the gap between the data points (between the boundary data points of one class and the boundary data points of the other class), the better the separation. Fig. 4.1 shows the support vectors and the separating hyperplane w·x − b = 0.
• Support vectors are data points that are closer to the hyperplane and influence the position and orientation of the hyperplane. Using these support vectors, you maximise the margin of the classifier; deleting the support vectors will change the position of the hyperplane. These are the points that help in building the SVM model.

Q. 2 With a diagram, explain the maximum margin concept behind SVM. (6 Marks)
Ans. :
• The larger the margin between the separating hyperplane and the data points, the better. Typically, for linearly separable data, the decision boundary (which in two dimensions is a line) is given as w·x − b = 0. Any point above the decision boundary classifies the input as one class and any point below it as the other class (Fig. 4.2).
• The region bounded by the two class boundary hyperplanes is called the "margin", and the maximum-margin hyperplane is the hyperplane that lies halfway between them. With a normalised or standardised dataset, these hyperplanes can be described by the following equations :
Hyperplane 1 : w·x − b = 1, which classifies the data into the first class.
Hyperplane 2 : w·x − b = −1, which classifies the data into the second class.
• Geometrically, the distance between these two hyperplanes (the margin) is m = 2 / ||w||. So, to maximise the distance between the planes you want to minimise ||w||. (w and x in the above equations are vectors.)
• Here x denotes the input sample features x1, x2, x3, ..., xn; w denotes the set of weights wi, one per feature; b is the bias, which shifts (adjusts) the hyperplane towards a particular class as required; and yi is the resulting classification based on the SVM hyperplanes. The data points must lie on the correct side of the margin to be classified correctly :
If w·x − b ≥ 1, then yi = 1 (positive classification)
If w·x − b ≤ −1, then yi = −1 (negative classification)
• The optimisation problem is therefore : minimise ||w|| (the magnitude of w) subject to yi (w·xi − b) ≥ 1 for i = 1, 2, ..., n.
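A brief scikit-learn sketch of a linear maximum-margin classifier (illustrative only; the toy points are made up, and note that scikit-learn writes the hyperplane as w·x + b = 0, so the sign convention for the intercept differs from the text above).

    import numpy as np
    from sklearn.svm import SVC

    # Hypothetical, linearly separable 2-D points
    X = np.array([[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6]], dtype=float)
    y = np.array([-1, -1, -1, 1, 1, 1])

    clf = SVC(kernel="linear", C=1e6).fit(X, y)   # very large C approximates a hard margin
    print(clf.coef_[0], clf.intercept_[0])        # w and b of the separating hyperplane
    print(clf.support_vectors_)                   # the points that define the margin
    print(clf.predict([[2, 2], [6, 6]]))          # -> [-1  1]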
Q. 3 An SVM is trained on a small two-feature data set. The table of training examples (x1, x2), their labels y and the Lagrange multipliers α obtained after training shows α = 65.52 for the examples (0.38, 0.47) with y = +1 and (0.49, 0.61) with y = −1, and α = 0 for every other training example. Find the separating hyperplane and classify the new point (0.6, 0.9).
Ans. :
• The value of α is non-zero for only the first two training examples. Hence, those two training examples are the support vectors.
• Calculate the values of w and b required to define the SVM hyperplane. Let w = (w1, w2), with w = Σn αn yn xn taken over the support vectors :
w1 = 65.52 × 1 × 0.38 + 65.52 × (−1) × 0.49 = −7.2
w2 = 65.52 × 1 × 0.47 + 65.52 × (−1) × 0.61 = −9.2
• The parameter b can be calculated from each support vector using the margin condition b = 1 − w·x :
b1 = 1 − (−7.2) × 0.38 − (−9.2) × 0.47 = 8.06
b2 = 1 − (−7.2) × 0.49 − (−9.2) × 0.61 ≈ 10
Averaging b1 and b2 gives b = 9.03.
• Hence, the hyperplane (decision line) is defined as −7.2 x1 − 9.2 x2 + 9.03 = 0.
Fig. 4.3 : The training samples, the two support vectors and the separating line.
• Now suppose there is a new data point (0.6, 0.9) that you want to classify. Putting the values in the expression −7.2 x1 − 9.2 x2 + 9.03 gives −7.2 × 0.6 − 9.2 × 0.9 + 9.03 = −3.57. The value is negative, hence the point is classified as a negative sample.

Q. 4 With a diagram, explain kernel functions. (6 Marks)
Ans. :
• You cannot always draw a line separating the data points such that they can be classified (for example, when one class surrounds the other). In such scenarios a kernel function is used to map ("uplift") the data points to higher dimensions, where they can possibly be separated linearly (Fig. 4.4).
• The kernel function is often denoted by φ(x). You can choose the type of kernel function appropriate for the problem at hand; some commonly used kernel functions are the polynomial function, the sigmoid function, and the radial basis function. This technique of applying a mapping kernel function to adjust the data is also called the kernel trick.
• Table 4.1 summarises various commonly used kernels.

Table 4.1
Kernel | Function
Linear | K(x, y) = 1 + xᵀy
Polynomial | K(x, y) = (1 + xᵀy)^p
Sigmoid | K(x, y) = tanh(κ xᵀy − δ)
Radial Basis Function | K(x, y) = exp(−||x − y||² / (2σ²))

Q. 5 Compare a linear classifier (such as logistic regression) with SVM. (4 Marks)
Ans. :
Table 4.2 : Comparison
Attribute | Logistic regression | SVM
Good for | Linear classification | Both linear and non-linear classification
Decision boundary | Multiple boundaries possible | One (the best one)
Approach | Statistical | Geometrical
Errors | Comparatively higher | Comparatively lower

Q. 6 Calculate the radial distance of point A having value 6 with respect to point B having value 8 and point C having value 1. Which point has more influence over A? (4 Marks)
Ans. : If you plot these points on a line (Fig. 4.5), A lies much closer to B than to C. Assume γ = 1. The radial basis distance of A with respect to B is e^(−γ(6 − 8)²) = e^(−4) = 0.018, and with respect to C it is e^(−γ(6 − 1)²) = e^(−25) ≈ 0. Hence point B has higher influence over point A than point C.
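The two influence values in Q. 6 can be computed directly; a short illustrative sketch:

    import numpy as np

    def rbf(a, b, gamma=1.0):
        """Radial basis similarity exp(-gamma * (a - b)^2) for scalar points."""
        return np.exp(-gamma * (a - b) ** 2)

    print(rbf(6, 8))   # 0.0183   -> B still has a noticeable influence on A
    print(rbf(6, 1))   # ~1.4e-11 -> C has practically no influence on A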
Q. 7 Demonstrate how you could use RBF over the XOR function to draw a linear SVM. (8 Marks)
Ans. : The truth table for the XOR function is as follows.

x1 | x2 | y
0 | 0 | 0
0 | 1 | 1
1 | 0 | 1
1 | 1 | 0

There are two classes here, Y = {0, 1}, and four training samples, two for each class : {0, 1} and {1, 0} produce 1 (+ve class), whereas {0, 0} and {1, 1} produce 0 (−ve class). These points are not linearly separable in the original space. Assume γ = 1 and choose the elements {0, 1} and {1, 0} as the basis. Each point p is transformed to (e^(−γ||p − (0,1)||²), e^(−γ||p − (1,0)||²)) :
For {0, 0} : e^(−((0−0)² + (0−1)²)) = e^(−1) = 0.37 and e^(−((0−1)² + (0−0)²)) = e^(−1) = 0.37
For {0, 1} : e^(0) = 1 and e^(−((0−1)² + (1−0)²)) = e^(−2) = 0.14
For {1, 0} : e^(−2) = 0.14 and e^(0) = 1
For {1, 1} : e^(−1) = 0.37 and e^(−1) = 0.37

So, RBF performs the following transformations on the respective points.

Old point | New point
{0, 0} | (0.37, 0.37)
{0, 1} | (1, 0.14)
{1, 0} | (0.14, 1)
{1, 1} | (0.37, 0.37)

If you plot these transformed points, you see that both points with output y = 0 collapse onto (0.37, 0.37), while the points with output y = 1 lie near (1, 0) and (0, 1). The RBF transformation has made the XOR points linearly separable, so a linear SVM can now be drawn between the two classes.

Q. 8 Explain the Radial Basis Function Network. (4 Marks)
Ans. :
• The core concept behind RBF is that inputs that are close together should generate the same output; points that are close together exercise more influence over each other than points that are far apart.
• In neural networks, for any input that you present to a set of neurons, some of them will fire strongly, some weakly, and some will not fire at all, depending upon the distance between the weights and the particular input in weight space.
• An RBF network simply adds weights from each hidden (RBF) neuron to a set of output nodes. The Radial Basis Function network consists of input nodes connected by weights to a set of RBF neurons, which fire proportionally to the distance between the input and the neuron in weight space. The activations of these nodes are used as inputs to the second layer, which consists of linear nodes (Fig. 4.6 : input layer, RBF layer, output layer).
• RBF networks never have more than one layer of non-linear neurons. In an RBF network, input nodes activate according to how close they are to the input, and the combination of these activations enables the network to decide how to respond.

Q. 9 Write a short note on Support Vector Regression (SVR). (4 Marks)
Ans. :
• Support Vector Machines (SVM) are popularly and widely used for classification problems in machine learning. Support Vector Regression (SVR) uses the same principle as SVM, but for regression problems.
• The goal of a regression problem is to find a function that approximates the mapping from an input domain to real numbers on the basis of a training sample.
• In the case of SVR, consider two decision boundaries and a hyperplane (Fig. 4.7). The objective is to consider the points that lie within the decision boundary lines; the best-fit line (regression line) is the hyperplane that contains the maximum number of points.
• Assume that the decision boundaries are at a distance 'a' from the hyperplane, i.e. they are the lines drawn at distances +a and −a from it. Based on SVM, the equation of the hyperplane is Y = w·x − b, and the equations of the decision boundaries are w·x − b = a and w·x − b = −a. Thus, any hyperplane that satisfies SVR should satisfy −a ≤ w·x − b ≤ a.
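A minimal scikit-learn sketch of SVR on synthetic one-dimensional data (illustrative only; the epsilon parameter plays the role of the tube half-width 'a' described above, and the data are made up).

    import numpy as np
    from sklearn.svm import SVR

    rng = np.random.default_rng(0)
    X = np.sort(rng.uniform(0, 5, size=(60, 1)), axis=0)
    y = np.sin(X).ravel() + rng.normal(scale=0.1, size=60)   # noisy target

    model = SVR(kernel="rbf", C=10.0, epsilon=0.1)           # points inside the epsilon tube carry no penalty
    model.fit(X, y)
    print(model.predict([[1.0], [4.0]]))                     # predictions at two query points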
Q. 10 Write a short note on multi-class classification techniques. (4 Marks)
Ans. :
• Definition : Multi-class or multinomial classification is the problem of classifying instances (data points) into one of three or more classes. Multi-class classification is a task with more than two classes (Fig. 4.8).
• For example, classifying a set of images of fruits which may be oranges, apples, or pears. Multi-class classification makes the assumption that each sample is assigned to one and only one class (or label) : a fruit can be either an apple or a pear, but not both at the same time. Another example is recognising alphabets in an optical character recognition problem, where a given alphabet could be one of the 26 letters.
• Multi-class classification techniques are classified into :
1. One vs One (OvO)
2. One vs Rest (OvR), also called One vs All (OvA)

Q. 11 Explain the One vs One (OvO) multi-class classification technique. (4 Marks)
Ans. :
• One vs One (OvO in short) is a heuristic method for using binary classification algorithms for multi-class classification. The One vs One technique splits a multi-class classification dataset into binary classification problems : one dataset for each class versus every other class.
• For example, consider a multi-class classification problem with four classes : 'red', 'blue', 'green', and 'yellow'. This could be divided into six binary classification datasets :
Binary Classification Problem 1 : red vs blue
Binary Classification Problem 2 : red vs green
Binary Classification Problem 3 : red vs yellow
Binary Classification Problem 4 : blue vs green
Binary Classification Problem 5 : blue vs yellow
Binary Classification Problem 6 : green vs yellow
• In general, for C classes the number of binary datasets (and models) is C(C − 1)/2.
• Each binary classification model predicts one class label, and the class with the most predictions (or votes) is predicted. Similarly, if the binary classification models predict a numerical class membership, such as a probability, then the argmax of the sum of the scores is predicted as the class label.

Q. 12 Explain the One vs Rest (OvR) multi-class classification technique and compare it with One vs One (OvO). (4 Marks)
Ans. :
• One vs Rest (OvR in short, also referred to as One-vs-All or OvA) is a heuristic method for using binary classification algorithms for multi-class classification.
• It involves splitting the multi-class dataset into multiple binary classification problems. Each binary classifier is then trained on its binary classification problem, and predictions are made using the model that is the most confident. Basically, for each split dataset you take one class as positive and all other classes as negative.
• For example, consider a multi-class classification problem with four classes : 'red', 'blue', 'green', and 'yellow'. This could be divided into four binary classification datasets :
Binary Classification Problem 1 : red vs [blue, green, yellow]
Binary Classification Problem 2 : blue vs [red, green, yellow]
Binary Classification Problem 3 : green vs [red, blue, yellow]
Binary Classification Problem 4 : yellow vs [red, blue, green]
• One machine learning model is created for each class; for example, four classes require four models. This approach requires that each model predicts a class membership probability or a probability-like score; the argmax of these scores (the class index with the largest score) is then used to predict a class. It is commonly used for algorithms that naturally predict a numerical class membership probability or score, such as Logistic Regression and the Perceptron.
• One advantage of this approach is its interpretability : since each class is represented by one and only one classifier, it is possible to gain knowledge about the class by inspecting its corresponding classifier. OvR is the most commonly used strategy and is a fair default choice.
• The two strategies can be compared as follows.

Attribute | OvO | OvR
Computation | Slower (more models to train) | Faster than OvO
Complexity | Higher | Low
Suitable for | Algorithms that don't scale well with dataset size | Algorithms that scale
No. of binary datasets / models for C classes | C(C − 1)/2 | C
Interpretability | Low | High
Used | Less commonly | More commonly
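A short scikit-learn sketch of both strategies wrapped around a binary classifier (illustrative only; the iris data set is just a convenient three-class example).

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

    X, y = load_iris(return_X_y=True)              # 3 classes
    base = LogisticRegression(max_iter=1000)

    ovr = OneVsRestClassifier(base).fit(X, y)      # one binary model per class -> 3 models
    ovo = OneVsOneClassifier(base).fit(X, y)       # one model per pair -> 3*(3-1)/2 = 3 models
    print(len(ovr.estimators_), len(ovo.estimators_))
    print(ovr.predict(X[:2]), ovo.predict(X[:2]))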
Q. 13 What are the causes of class imbalance? (4 Marks)
Ans. : The imbalance in the class distribution may have many causes. The two main causes are as follows.
1. Sampling error : You might have class imbalance due to sampling errors. For example, you could have collected the data points from a very narrowly selected population, or there could be bias in sampling along with data-collection errors (such as incorrectly labelling samples).
2. Problem domain : Based on the domain of your classification problem, there might be a natural class imbalance. Suppose you are building a model based on weather conditions to predict the next earthquake. Earthquake events may be very rare, whereas non-earthquake events may dominate the majority of the weather-condition data points. You cannot really do anything about it (natural events, such as an earthquake, are beyond your control), and your overall earthquake dataset might be imbalanced.
• The natural occurrence or presence of one class may dominate the other classes. This may be because the process that generates observations in one class is more expensive in time, cost, computation, or other resources. As such, it is often infeasible to simply collect more samples from the domain in order to improve the class distribution.
15 Explain the concept of Bagging. Ans. : “ * Definition : Bagging is an ensemble technique designed to i Sigoritbmnd Hered to improve the stability and accuracy of classification * It isa short form for Bootstrap aggregating. It reduces bi: 3 as a bootstrap method (bagging) works i s following, variance from the predictive models. The "i! * Asample of 100 values (say distribution d), and ), ‘you.would |j is ‘ould simply calculate the mean directly from the sample ae e*0 tan estimate ofthe mean of the sample. Yo" ‘Sum of values ind Mean (q) = Stmotvaluesind eeerepresentation of the distribution, "You teat improve the estimate of the mean using. aha bootstrap procedure by ame 1. Creating many (say 1000) random subs your original dataset. You may selec dent ‘ay select a datapoi various sub-datasets multiple times, = 2, Calculating the mean of each sub-dataset values. Since the sample size a Calculating the average of all the m 3 ie means for sub- datasets {orien TO
Q. 16 Write a short note on Stumping. (4 Marks)
Ans. :
• There is a very extreme form of boosting that is applied to trees. In the real world, the stump of a tree is the tiny piece that is left over when you chop off the rest of the tree.
• In machine learning, stumping consists of simply taking the root of the tree and using that as the decision maker. So, for each classifier, you use only the very first question that makes up the root of the tree; the decision tree is not grown any further. Such a tree is also called a decision stump.
• Definition : A decision stump is a machine learning model consisting of a one-level decision tree.
• It is a decision tree with one internal node (the root) which is immediately connected to the terminal nodes (its leaves). A decision stump makes a prediction based on the value of just a single input feature. Fig. 4.10 shows an example of a decision stump.
• The overall performance of stumping can be very high if the dataset is majorly skewed around a particular feature.

Q. 17 Explain the AdaBoost algorithm. (4 Marks)
Ans. :
• AdaBoost is the short name used for adaptive boosting. This boosting method is adaptive in the sense that it uses a concept of weights to train the classifier model and adjusts these weights dynamically based on how the model performs in subsequent iterations of training.
• At a high level, the AdaBoost algorithm works as follows.
1. A weight is applied to every example in the training data. Initially, these weights are all equal.
2. A weak classifier is first trained on the training data. The errors from the weak classifier are calculated, and the weak classifier is trained a second time with the same dataset.
3. The second time the weak classifier is trained, the weights of the training set are adjusted such that the examples properly classified the first time are weighted less and the examples incorrectly classified in the first iteration are weighted more.
4. To get a uniform output from all of these weak classifiers, AdaBoost assigns an α value to each of the classifiers. The α values are based on the error ε of each weak classifier, where
ε = (number of incorrectly classified examples) / (total number of examples)
and α is calculated as
α = (1/2) ln((1 − ε) / ε)
5. After calculating α, you update the weight vector D so that correctly classified examples decrease in weight and misclassified examples increase in weight. If the classification of example i was correct, then
D_i(t+1) = D_i(t) · e^(−α) / Sum(D)
else, if the classification was incorrect,
D_i(t+1) = D_i(t) · e^(α) / Sum(D)
6. After D is calculated, AdaBoost starts the next iteration. The AdaBoost algorithm repeats the training and weight-adjusting iterations until the training error is 0 or until the number of weak classifiers reaches a user-defined value.

Q. 18 Write a short note on Random Forests. (4 Marks)
Ans. :
• Random Forests is an ensemble technique designed to combine several decision trees in order to reduce errors and to build a more accurate prediction model.
• Instead of building one decision tree, multiple decision trees are built. Each tree in the forest is built using random datapoints drawn from the dataset. Further, the trees in the forest are split such that only a random subset of the input variables (attributes) is considered at each split and the best possible split among them is performed (Fig. 4.11).
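An illustrative scikit-learn sketch contrasting a single tree, a bagged ensemble of trees, a random forest and AdaBoost over decision stumps (parameters and data are arbitrary choices, not from the original text).

    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    models = {
        "single tree":     DecisionTreeClassifier(random_state=0),
        "bagged trees":    BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=0),
        "random forest":   RandomForestClassifier(n_estimators=100, random_state=0),
        "adaboost stumps": AdaBoostClassifier(n_estimators=100, random_state=0),  # default base learner is a stump
    }
    for name, model in models.items():
        print(name, cross_val_score(model, X, y, cv=5).mean().round(3))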