
Lab-5 Manual: Machine Learning Models for Business Analytics

National University of Computer & Emerging Sciences, Karachi
Fall 2024, Lab Manual – 5

Supervised Learning - Classification with Support Vector Machine (SVM)

MSBA-3A
Course code:
Instructor: Adil Sheraz

Objectives
The following are the objectives of this lab.
1. Support Vector Machine
2. How SVM works
3. Support Vector Kernels
4. Implementation of SVC

1. Support Vector Machine


Support Vector Machines (SVMs) are supervised machine learning algorithms used for both classification and regression. An SVM classifier builds a model that assigns new data points to one of the given categories, so it can be viewed as a non-probabilistic binary linear classifier. In addition to performing linear classification, SVMs can efficiently perform non-linear classification using the kernel trick, which implicitly maps the inputs into high-dimensional feature spaces. Before diving into the working of SVM, let's first understand two basic terms used in the algorithm: the support vector and the hyperplane.
Hyperplane
A hyperplane is the decision boundary that separates the two classes in SVM. Data points falling on either side of the hyperplane are attributed to different classes. The dimension of the hyperplane depends on the number of input features in the dataset: with 2 input features the hyperplane is a line; with 3 features it becomes a two-dimensional plane.


Support Vectors
Support vectors are the data points that are nearest to the hyperplane; they determine the position and orientation of the hyperplane. We have to select the hyperplane for which the margin, i.e., the distance between the support vectors and the hyperplane, is maximum. Even a small change in the position of these support vectors can change the hyperplane.
Margin
A margin is the separation gap between the two lines passing through the closest data points. It is calculated as the perpendicular distance from the line to the support vectors, or closest data points. In SVMs, we try to maximize this separation gap so that we get the maximum margin.
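For concreteness, here is a minimal sketch in scikit-learn, using synthetic blob data as a stand-in for a linearly separable problem, that fits a linear SVM and inspects its support vectors and margin width:

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated synthetic blobs stand in for a linearly separable problem.
X, y = make_blobs(n_samples=60, centers=2, cluster_std=0.6, random_state=0)

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

print("Support vectors:\n", clf.support_vectors_)    # the points nearest the hyperplane
w = clf.coef_[0]                                     # only available for the linear kernel
print("Margin width:", 2 / np.linalg.norm(w))        # distance between the two margin lines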

2. How does SVM work


Let's take an example: we have a classification problem where we have to separate the red data points from the blue ones.

Since it is a two-dimensional problem, our decision boundary will be a line; for a three-dimensional problem we would use a plane, and the complexity of the solution increases as the number of features grows.

As shown in the above image, we have multiple lines that separate the data points successfully. But our objective is to find the best solution. There are a few rules that can help us identify the best line:
• Maximum Classification
• Best Separation


Maximum classification means the selected line must be able to successfully segregate all the data points into their respective classes. In our example, we can see that lines E and D misclassify a red data point. Hence, for this problem, lines A, B, and C are better than E and D, so we will drop E and D.

The second rule is best separation, which means we must choose a line that can still separate the points correctly when new data points arrive.

In our example, if we get a new red data point closer to line A, as shown in the image below, line A will misclassify that point. Similarly, if we get a new blue instance closer to line B, lines A and C will classify it correctly, whereas line B will misclassify it.

Notice that in both cases line C successfully classifies all the data points. Why? To understand this, let's take the lines one by one.

Let’s discuss which line to choose:


Why not line A

First, consider line A. If we move line A towards the left, it has very little chance of misclassifying the blue points. On the other hand, if we shift line A towards the right, it will very easily misclassify the red points. The reason is that on the left side the margin, i.e., the distance between the nearest data point and the line, is large, whereas on the right side the margin is very small.


Why not line B

Similarly, consider line B. If we shift line B towards the right, it has a sufficient margin on the right side, but it will wrongly classify instances on the left side because the margin towards the left is very small. Hence, B is not our perfect solution either.

Why not line C

In the case of line C, it has a sufficient margin on both the left and the right side. This maximum margin makes line C more robust to new data points that might appear in the future. Hence, C is the best fit in this case, successfully classifying all the data points with the maximum margin.

This is exactly what SVM looks for: it aims for the maximum margin and chooses the line that is equidistant from the closest points of both classes, which is line C in our case. So we can say that C represents the maximum-margin SVM classifier.

Now let's look at the data below. As we can see, this data is not linearly separable, so a linear decision boundary will not work in this situation. If we try to classify this data with a line anyway, the result will not be promising.


So, is there any way that SVM can classify this kind of data? For this problem, we have to create a
decision boundary that looks something like this.

The question is: is it possible to create such a decision boundary using SVM? The answer is yes. SVM does this by projecting the data into a higher dimension, as shown in the following image. In the first case, the data is not linearly separable, hence we project it into a higher dimension.

If we have more complex data, SVM will keep projecting the data into higher dimensions until it becomes linearly separable. Once the data becomes linearly separable, we can use SVM to classify it just like in the previous problems.

Projection into Higher Dimension


Now let's understand how SVM projects the data into a higher dimension. Take this data: it is a circular dataset that is not linearly separable.

To project the data into a higher dimension, we are going to create another dimension Z, where

Z = X² + Y²


Now we will plot this feature Z with respect to X, which gives us linearly separable data that looks like this.

Here, we have created a new feature Z from the base features X and Y; this process is known as a kernel transformation. Precisely, a kernel takes the original features as input and produces data that is linearly separable in a higher dimension.
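As a minimal sketch of this idea, the snippet below builds the Z = X² + Y² feature by hand on scikit-learn's make_circles toy data, used here purely as a stand-in for the circular dataset in the figure:

from sklearn.datasets import make_circles

# Circular, non-linearly-separable toy data.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# New dimension Z = X^2 + Y^2: points near the centre get a small Z and points
# on the outer ring get a large Z, so a single threshold on Z separates the classes.
z = X[:, 0] ** 2 + X[:, 1] ** 2
for label in (0, 1):
    print(f"class {label}: mean Z = {z[y == label].mean():.3f}")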

Now the question is: do we have to perform this transformation manually? The answer is no. SVM handles this process itself; we just have to choose the kernel type.
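A minimal sketch of this, again on the make_circles toy data, is to simply pass kernel="rbf" to SVC and let it handle the projection internally:

from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(kernel="rbf")            # the kernel trick replaces the manual Z feature
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))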

3. Support Vector Kernels


Linear Kernel

To start with, in the linear kernel the decision boundary is a straight line. Unfortunately, most real-world data is not linearly separable, which is the reason the linear kernel is not widely used in SVM.


Gaussian / RBF kernel


It is the most commonly used kernel. It projects the data onto a Gaussian surface, where the red points become the peak of the surface and the green data points form its base, making the data linearly separable.

However, this kernel can be prone to overfitting, since it may also capture noise in the data.

Polynomial kernel
Finally, we have the polynomial kernel, which is non-linear in nature because it uses polynomial combinations of the base features. It often gives good results.

The problem with the polynomial kernel is that the number of higher-dimensional features grows very quickly with the polynomial degree. As a result, it is computationally more expensive than the RBF or linear kernel.
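As an illustration only, the sketch below compares the three kernels discussed above on scikit-learn's built-in breast cancer dataset (chosen here just as a convenient example; results will vary by dataset):

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

for kernel in ["linear", "rbf", "poly"]:
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))   # scale, then fit the SVM
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{kernel:>6} kernel: mean CV accuracy = {scores.mean():.3f}")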


Hard margin
In a hard margin SVM, the goal is to find the hyperplane that can perfectly separate the data into two
classes without any misclassification. However, this is not always possible when the data is not linearly
separable or contains outliers. In such cases, the hard margin SVM will fail to find a hyperplane that can
perfectly separate the data, and the optimization problem will have no solution.

Soft Margin
In a soft margin SVM, we allow some misclassification by introducing slack variables that allow some
data points to be on the wrong side of the margin. The optimization problem in a soft margin SVM is
modified as follows:
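    minimize    (1/2) ‖w‖² + C Σᵢ ξᵢ
    subject to  yᵢ (w·xᵢ + b) ≥ 1 − ξᵢ,   ξᵢ ≥ 0   for all i,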

where ξi are the slack variables, and C is a hyperparameter that controls the trade-off between maximizing
the margin and minimizing the misclassification. A larger value of C results in a narrow margin and fewer
misclassifications, while a smaller value of C results in a wider margin but more misclassifications.

Geometrically, the soft margin SVM introduces a penalty for the data points that lie on the wrong side of
the margin or even on the wrong side of the hyperplane. The slack variables ξi allow these data points to
be within the margin or on the wrong side of the hyperplane, but they incur a penalty in the objective
function. The optimization problem in a soft margin SVM finds the hyperplane that maximizes the margin
while minimizing the penalty for the misclassified data points.
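The sketch below, using make_circles toy data as a hypothetical example, shows this trade-off: a very large C behaves close to a hard margin, while a small C tolerates more training misclassifications in exchange for a wider margin:

from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_circles(n_samples=300, factor=0.5, noise=0.15, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for C in [0.01, 1, 100, 10000]:            # a very large C behaves close to a hard margin
    clf = SVC(kernel="rbf", C=C).fit(X_train, y_train)
    print(f"C={C:>8}: train acc = {clf.score(X_train, y_train):.3f}, "
          f"test acc = {clf.score(X_test, y_test):.3f}")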

Gamma Parameter
It tells us how much influence an individual data point has on the decision boundary.

– Large gamma: the influence of each point is very local, so only nearby points shape each part of the decision boundary. The boundary becomes highly non-linear, which can lead to overfitting.

– Small gamma: the influence of each point reaches further, so more data points influence the decision boundary. The boundary is smoother and more generic.
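A minimal sketch of the gamma effect, reusing the same hypothetical make_circles data with an RBF kernel:

from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_circles(n_samples=300, factor=0.5, noise=0.15, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for gamma in [0.01, 1, 100]:               # large gamma -> very local influence, overfitting risk
    clf = SVC(kernel="rbf", gamma=gamma).fit(X_train, y_train)
    print(f"gamma={gamma:>6}: train acc = {clf.score(X_train, y_train):.3f}, "
          f"test acc = {clf.score(X_test, y_test):.3f}")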


4. Implementation of SVC

TASK 1
Download the Breast Cancer Dataset from here: https://drive.google.com/file/d/1dtxThgA7XHVq08-ffjJoKuR7fX-QIB1D/view?usp=sharing

1. Perform EDA.
a. Check the head and tail of the dataset.
b. Check the datatype of each feature.
c. Check the missing values.
d. Print the column names.
e. Check unique values of the target column.
f. Check the distribution of the dataset (balanced or not) with respect to the target variable.
g. Check the distribution of the features (a histogram helps in this).
h. Check and plot each feature for outliers (a box plot helps in this).
i. If you find outliers in your dataset, which SVM variant should we apply (hard margin or
soft margin)? Explain the reason in your own words.
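A minimal sketch of these EDA steps is given below; the file name breast_cancer.csv and the target column name diagnosis are assumptions, so adjust them to match the downloaded file:

import pandas as pd

df = pd.read_csv("breast_cancer.csv")        # assumed local file name

print(df.head())                             # a. head ...
print(df.tail())                             # ... and tail
print(df.dtypes)                             # b. datatype of each feature
print(df.isnull().sum())                     # c. missing values
print(df.columns.tolist())                   # d. column names
print(df["diagnosis"].unique())              # e. unique values of the target column
print(df["diagnosis"].value_counts())        # f. class balance of the target
df.hist(figsize=(14, 12))                    # g. feature distributions
df.plot(kind="box", subplots=True, layout=(6, 6), figsize=(14, 12))   # h. outliers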

TASK 2
Perform the following steps:
1. Split the dataset into features and target variable, and verify the shape of both.
2. Normalize the features with sklearn's StandardScaler.
3. The target variable is [M/B]; convert it into [0/1] categories using sklearn's LabelEncoder.
4. Split the dataset into train & test splits, with 70% of the data for training and 30% for testing. Verify
the split by printing the shapes of X_train, X_test, y_train, and y_test.
5. Implement the SVM as an SVC with default parameters from the sklearn library.
6. Check the training and testing accuracies of the model, and explain in your own words whether
overfitting or underfitting happened.
7. Explain in your own words what overfitting and underfitting are, and which kind of results
indicates each.
8. Plot / print the confusion matrix.
9. Print the classification report.
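A minimal sketch of this workflow is given below, reusing the assumed breast_cancer.csv file name and diagnosis target column from the EDA sketch; adapt the names to the actual dataset:

import pandas as pd
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.svm import SVC

df = pd.read_csv("breast_cancer.csv")                 # assumed file / column names

# 1. Features and target (drop an "id" column as well if the file has one)
X = df.drop(columns=["diagnosis"])
y = df["diagnosis"]
print(X.shape, y.shape)

# 2. Normalize the features; 3. encode B/M as 0/1
X = StandardScaler().fit_transform(X)
y = LabelEncoder().fit_transform(y)

# 4. 70/30 train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

# 5. SVC with default parameters
clf = SVC().fit(X_train, y_train)

# 6. Compare training and testing accuracy to judge over/underfitting
print("train accuracy:", clf.score(X_train, y_train))
print("test accuracy:", clf.score(X_test, y_test))

# 8-9. Confusion matrix and classification report
y_pred = clf.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))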

