Logistic Regression and Support Vector Machine

Syllabus
Logistic Regression, Introduction to Support Vector Machine, The Dual Formulation, Maximum Margin with Noise, Nonlinear SVM and Kernel Functions, SVM : Solution to the Dual Problem.

Contents
5.1 Logistic Regression
5.2 Introduction to Support Vector Machine
5.3 Kernel Methods for Non-linearity

5.1 Logistic Regression

* Logistic regression is a form of regression analysis in which the outcome variable is binary or dichotomous. It is a statistical method used to model binary outcomes using predictor variables.
* Logistic component : Instead of modeling the outcome Y directly, the method models the log odds of Y using the logistic function.
* Regression component : Methods used to quantify the association between the outcome and the predictor variables. It could be used to build predictive models as a function of the predictors.
* Simple logistic regression is logistic regression with one predictor variable.

  log( P(Y) / (1 - P(Y)) ) = β0 + β1X1 + β2X2 + ... + βkXk
  Y = β0 + β1X1 + β2X2 + ... + βkXk + ε

* With logistic regression, the response variable is an indicator of some characteristic, that is, a 0/1 variable.

Fig. 5.1.1 Logistic regression

* If analysis of covariance can be said to test the equality of means adjusted for other variables, logistic regression can be used to test the equality (homogeneity) of proportions adjusted for other variables. While the response variable in a logistic regression is a 0/1 variable, the logistic regression equation, which is a linear equation, does not itself predict the 0/1 variable.

5.2 Introduction to Support Vector Machine

* Support Vector Machines (SVMs) are a set of supervised learning methods which learn from the dataset and are used for classification. The SVM is a classifier derived from statistical learning theory by Vapnik and Chervonenkis.
* An SVM is a kind of large-margin classifier : it is a vector-space-based machine learning method whose goal is to find a decision boundary between two classes that is maximally far from any point in the training data.
* Given a set of training examples, each marked as belonging to one of two categories, an SVM algorithm builds a model that predicts whether a new example falls into one class or the other. Simply speaking, we can think of an SVM model as representing the examples as points in space, mapped so that the examples of the separate classes are divided by a gap that is as wide as possible.
* New examples are then mapped into the same space and classified as belonging to a class based on which side of the gap they fall on.

Two Class Problems

* Many decision boundaries can separate these two classes. Which one should we choose ? The perceptron learning rule can be used to find any decision boundary between class 1 and class 2.

Fig. 5.2.1 Two class problem

* The line that maximizes the minimum margin is a good bet. The model class of "hyper-planes with a margin of m" has a low VC dimension if m is big. This maximum-margin separator is determined by a subset of the data points; data points in this subset are called "support vectors". It is useful computationally if only a small fraction of the data points are support vectors, because we use the support vectors to decide which side of the separator a test case is on.
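As a concrete illustration of the two classifiers introduced so far, the short Python sketch below fits a logistic regression model and a maximum-margin linear SVM on a small synthetic two-class dataset and reports the support vectors chosen by the SVM. This example is not part of the original text; the dataset, the scikit-learn estimators and all parameter values are illustrative assumptions.

```python
# Illustrative sketch only: toy data and default parameters are assumptions,
# not taken from the textbook.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Two well-separated Gaussian blobs stand in for the two classes.
X, y = make_blobs(n_samples=100, centers=2, cluster_std=0.8, random_state=0)

# Logistic regression models the log odds of the 0/1 response (Section 5.1).
log_reg = LogisticRegression().fit(X, y)
print("P(Y = 1 | x) for the first point :", log_reg.predict_proba(X[:1])[0, 1])

# A linear SVM looks for the maximum-margin separating hyperplane (Section 5.2).
svm = SVC(kernel="linear").fit(X, y)
print("Support vectors per class :", svm.n_support_)
print("Support vectors :\n", svm.support_vectors_)
```

Only the points listed in svm.support_vectors_ determine the separator; the remaining training examples could be removed without changing the decision boundary.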
Example of Bad Decision Boundaries

* SVMs are primarily two-class classifiers with the distinct characteristic that they aim to find the optimal hyperplane such that the expected generalization error is minimized. Instead of directly minimizing the empirical risk calculated from the training data, SVMs perform structural risk minimization to achieve good generalization.

Fig. 5.2.2 Bad decision boundary of SVM

* The empirical risk is the average loss of an estimator for a finite set of data drawn from P. The idea of risk minimization is not only to measure the performance of an estimator by its risk, but to actually search for the estimator that minimizes risk over the distribution P. Because we do not know the distribution P, we instead minimize the empirical risk over a training dataset drawn from P. This general learning technique is called empirical risk minimization.
* Fig. 5.2.3 shows empirical risk.

Fig. 5.2.3 Empirical risk versus the complexity of the function set

* If a data point lies very close to the boundary, the classifier may still be consistent but is more "likely" to make errors on new instances from the distribution. Hence, we prefer classifiers that maximize the minimal distance of the data points to the separator.

Fig. 5.2.4 Good decision boundary

1. Margin (m) : The margin is the minimum distance between the data points and the classifier boundary, that is, the minimum distance of any sample to the decision boundary. If the hyperplane is in the canonical form, the margin can be measured by the length of the weight vector : the margin is given by the projection of the distance between two such points on the direction perpendicular to the hyperplane, and the margin of the separator is the distance between the support vectors.

   Margin = 2 / ||w||   (a numerical check of this formula appears at the end of this sub-section)

2. Maximal margin classifier : A classifier in the family F that maximizes the margin. Maximizing the margin is good according to intuition and PAC theory, and implies that only support vectors matter; other training examples are ignorable.

Example : For the following figure, find a linear hyperplane (decision boundary) that will separate the data (Fig. 5.2.5).

Solution : Fig. 5.2.6 illustrates the reasoning :
1. One possible solution (B1).
2. A second possible solution (B2).
3. Other possible solutions.
4. Which one is better, B1 or B2 ?
5. How do you define "better" ?
6. Find the hyperplane that maximizes the margin.
7. B1 is better than B2.
8. The separating hyperplane is w·x + b = 0, with the margin boundaries w·x + b = +1 and w·x + b = -1.

In general, the SVM formulation proceeds in three steps :
1. Define what an optimal hyperplane is : maximize the margin.
2. Extend the above definition for non-linearly separable problems : add a penalty term for misclassifications.
3. Map the data to a high-dimensional space where it is easier to classify with linear decision surfaces : reformulate the problem so that the data is mapped implicitly to this space.
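The margin formula above, Margin = 2 / ||w||, can be checked numerically. The sketch below is illustrative only : the synthetic dataset and the use of a very large C value to approximate a hard (maximal) margin are assumptions, not part of the original text.

```python
# Illustrative sketch: recovering the margin 2/||w|| of a (nearly) hard-margin
# linear SVM and checking that the support vectors lie on the canonical
# hyperplanes w.x + b = +1 and w.x + b = -1.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, cluster_std=0.8, random_state=0)

# A very large C approximates the hard-margin classifier described above.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]                        # weight vector of the hyperplane
print("Margin = 2/||w|| =", 2.0 / np.linalg.norm(w))

# w.x + b evaluated at the support vectors should be close to +1 or -1.
print("w.x + b at the support vectors :", clf.decision_function(clf.support_vectors_))
```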
5.2.1 Key Properties of Support Vector Machines

1. They use a single hyperplane which subdivides the space into two half-spaces, one of which is occupied by Class 1 and the other by Class 2.
2. They maximize the margin of the decision boundary using quadratic optimization techniques which find the optimal hyperplane.
3. They have the ability to handle large feature spaces.
4. Overfitting can be controlled by the soft margin approach.
5. When used in practice, SVM approaches frequently map the examples to a higher dimensional space and find margin-maximal hyperplanes in the mapped space, obtaining decision boundaries which are not hyperplanes in the original space.

* The most popular versions of SVMs use non-linear kernel functions and map the attribute space into a higher dimensional space to facilitate finding "good" linear decision boundaries in the modified space.

5.2.2 SVM Applications

* SVM has been used successfully in many real-world problems :
1. Text (and hypertext) categorization
2. Image classification
3. Bioinformatics (protein classification, cancer classification)
4. Hand-written character recognition
5. Determination of SPAM email.

5.2.3 Limitations of SVM

1. It is sensitive to noise.
2. The biggest limitation of the SVM lies in the choice of the kernel.
3. Another limitation is speed and size.
4. The optimal design for multiclass SVM classifiers is also a research area.

5.2.4 Soft Margin SVM

* For the very high dimensional problems common in text classification, the data are sometimes linearly separable. But in the general case they are not, and even if they are, we might prefer a solution that better separates the bulk of the data while ignoring a few weird noise documents.
* What if the training set is not linearly separable ? Slack variables can be added to allow misclassification of difficult or noisy examples; the resulting margin is called a soft margin. A soft margin allows a few points to cross into the margin or over the hyperplane, allowing misclassification.
* We penalize the crossover by looking at the number and distance of the misclassifications. This is a trade-off between the hyperplane violations and the margin size. The slack variables are bounded by some set cost. The farther they are from the soft margin, the less influence they have on the prediction.
* All observations have an associated slack variable (see the sketch after this list) :
1. Slack variable = 0 : the point lies on or beyond the margin, on the correct side of the hyperplane.
2. Slack variable > 0 : the point lies inside the margin or on the wrong side of the hyperplane.
3. C is the trade-off between the slack variable penalty and the margin.
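The role of C can be seen directly in code. The sketch below is illustrative only : the overlapping synthetic dataset and the particular C values are assumptions; it simply counts how many training points violate the margin (slack variable > 0) as C changes.

```python
# Illustrative sketch: the soft-margin trade-off controlled by C.  A small C
# tolerates many margin violations (large total slack); a large C penalizes
# violations heavily and shrinks the margin instead.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Overlapping blobs, so the two classes are NOT linearly separable.
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.5, random_state=0)
signs = np.where(y == 1, 1.0, -1.0)      # labels recoded as +1 / -1

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # A point violates the margin (slack > 0) when y * (w.x + b) < 1.
    violations = (signs * clf.decision_function(X) < 1).sum()
    print(f"C = {C:>6} : support vectors = {clf.n_support_.sum():3d}, "
          f"margin violations = {violations:3d}")
```

In scikit-learn's SVC the parameter C is this trade-off : it weights the slack (misclassification) penalty against the margin size.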
5.2.5 Comparison of SVM and Neural Networks

Support Vector Machine : The kernel maps the data to a very high dimensional space.
Neural Network : The hidden layers map the input to lower dimensional spaces.

Example 5.2.1 : From the following diagram, identify which data points (1, 2, 3, 4, 5) are support vectors (if any), slack variables on the correct side of the classifier (if any) and slack variables on the wrong side of the classifier (if any). Mention which point will have maximum penalty and why. (Fig. 5.2.7)

Solution : Points 1 and 5 will have the maximum penalty.
* The margin (m) is the gap between the data points and the classifier boundary, that is, the minimum distance of any sample to the decision boundary; if the hyperplane is in the canonical form, the margin can be measured by the length of the weight vector. A maximal margin classifier is a classifier in the family F that maximizes this margin, which is good according to intuition and PAC theory and implies that only the support vectors matter while the other training examples are ignorable.
* When the training set is not linearly separable, slack variables allow misclassification of difficult or noisy examples, giving a soft margin that lets a few points cross into the margin or over the hyperplane. The crossover is penalized according to the number and distance of the misclassifications, a trade-off between hyperplane violations and margin size : a point with slack variable = 0 lies on or beyond the margin on the correct side, a point with slack variable > 0 lies inside the margin or on the wrong side of the hyperplane, and C sets the trade-off between the slack penalty and the margin. Since the penalty of a point grows with how far it crosses the soft margin, points 1 and 5 incur the maximum penalty.

5.3 Kernel Methods for Non-linearity

* Kernel methods refer to a family of widely used nonlinear algorithms for machine learning tasks like classification, regression and feature extraction.
* Any non-linear problem (classification, regression) in the original input space can be converted into a linear one by making a non-linear mapping into a feature space of higher dimension, as shown in Fig. 5.3.1.
* Often we want to capture nonlinear patterns in the data.
  Nonlinear regression : The input-output relationship may not be linear.
  Nonlinear classification : The classes may not be separable by a linear boundary.
* Kernels make linear models work in nonlinear settings. Kernels, using a feature mapping φ, map the data to a new space where the original learning problem becomes easy.

Fig. 5.3.1 Mapping from the input space to the feature space

* Consider two data points x = (x1, x2) and z = (z1, z2). Suppose we have a function k which takes x and z as inputs and computes

  K(x, z) = (x^T z)^2 = (x1 z1 + x2 z2)^2
          = x1^2 z1^2 + x2^2 z2^2 + 2 x1 x2 z1 z2
          = (x1^2, √2 x1 x2, x2^2)^T (z1^2, √2 z1 z2, z2^2)
          = φ(x)^T φ(z)   (an inner product; a numerical check appears at the end of this section)

Fig. 5.3.2 Computing the kernel function from the examples and their features

* The above k implicitly defines a mapping φ to a higher dimensional space : φ(x) = (x1^2, √2 x1 x2, x2^2).
* We did not need to pre-define or compute the mapping φ in order to compute K(x, z).
* The function k is known as the kernel function. K is the N × N matrix of pairwise similarities between the examples in the feature space.

Advantages :
1. The kernel defines a similarity measure between two data points and thus allows one to incorporate prior knowledge of the problem domain.
2. Most importantly, the kernel contains all of the information about the relative positions of the inputs in the feature space; the actual learning algorithm is based only on the kernel function and can thus be carried out without explicit use of the feature space.
3. The number of operations required is not necessarily proportional to the number of features.
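As a quick numerical check of the identity K(x, z) = (x^T z)^2 = φ(x)^T φ(z) derived above, the sketch below evaluates both sides for two arbitrary 2-dimensional points. The example points and the helper name phi are illustrative assumptions, not from the original text.

```python
# Illustrative check: the degree-2 polynomial kernel equals an inner product
# in the feature space defined by phi, without needing phi at prediction time.
import numpy as np

def phi(v):
    """Explicit feature map for k(x, z) = (x^T z)^2 on 2-D inputs."""
    x1, x2 = v
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

x = np.array([1.0, 2.0])    # arbitrary example points
z = np.array([3.0, -1.0])

k_input_space = np.dot(x, z) ** 2           # kernel computed in the input space
k_feature_space = np.dot(phi(x), phi(z))    # inner product in the feature space

print(k_input_space, k_feature_space)       # both print 1.0
```

The kernel value is obtained entirely in the original two-dimensional input space, while the equivalent inner product lives in a three-dimensional feature space; this is exactly the point of advantage 2 above.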
