ML Assign3
Assignment No. 3
Q.1. Define the following terms with a suitable example:
i) Confusion matrix ii) False positive rate iii) True positive rate
i) Confusion Matrix
A confusion matrix (also called an error matrix) is a table used to evaluate the performance of a classification model by comparing its predicted labels against the actual labels; each cell counts the instances falling into one combination of actual and predicted class (true positives, false positives, false negatives, true negatives).
Example:
Consider a binary classification problem where we are trying to predict whether an email is
spam or not. After testing the model, we get the following results:
Confusion Matrix:
                     Predicted Spam   Predicted Not Spam
Actual Spam                50                 10
Actual Not Spam             5                 35
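A minimal sketch of how this confusion matrix could be reproduced with scikit-learn; the y_true and y_pred arrays below are illustrative stand-ins for real model output (label 1 = Spam, label 0 = Not Spam):

# Sketch: building the spam confusion matrix with scikit-learn (illustrative data).
import numpy as np
from sklearn.metrics import confusion_matrix

# 60 actual spam emails (50 predicted correctly), 40 actual non-spam (35 correct).
y_true = np.array([1] * 60 + [0] * 40)
y_pred = np.array([1] * 50 + [0] * 10 + [1] * 5 + [0] * 35)

# Rows = actual class, columns = predicted class; labels=[1, 0] lists Spam first.
cm = confusion_matrix(y_true, y_pred, labels=[1, 0])
print(cm)  # [[50 10]
           #  [ 5 35]]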
ii) False Positive Rate (FPR)
The false positive rate measures the proportion of actual negative instances that were incorrectly classified as positive. It is calculated using the formula:
FPR = FP / (FP + TN)
Example:
FP (False Positives) = 5
TN (True Negatives) = 35
FPR = 5 / (5 + 35) = 5 / 40 = 0.125
iii) True Positive Rate (TPR)
The true positive rate, also known as recall or sensitivity, measures the proportion of actual positive instances that were correctly classified. It is calculated using the formula:
TPR = TP / (TP + FN)
Example:
TP (True Positives) = 50
FN (False Negatives) = 10
TPR = 50 / (50 + 10) = 50 / 60 ≈ 0.833
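A quick sketch of the same FPR and TPR arithmetic in Python, using the counts from the example above:

# Counts from the spam example above.
TP, FN = 50, 10   # actual spam emails
FP, TN = 5, 35    # actual non-spam emails

fpr = FP / (FP + TN)   # fraction of actual negatives wrongly flagged as spam
tpr = TP / (TP + FN)   # fraction of actual positives correctly detected (recall)
print(f"FPR = {fpr:.3f}, TPR = {tpr:.3f}")   # FPR = 0.125, TPR = 0.833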
Contingency Table: A contingency table displays the frequency distribution of two categorical variables, with the categories of one variable as rows and the categories of the other as columns; each cell holds the count of observations for that combination of categories. Contingency tables are useful for the following (a code sketch follows this list):
1. Summarizing Data: Contingency tables provide a clear and concise way to present
data, making it easier to understand relationships and patterns between variables.
2. Testing for Independence: Statistical tests like the chi-square test can be applied to
contingency tables to determine if two variables are independent or related.
3. Analyzing Associations: Contingency tables can help identify associations or
dependencies between variables. For example, you might find that gender is
associated with education level.
4. Comparing Groups: Contingency tables can be used to compare different groups
based on their distribution across categories of another variable.
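A minimal sketch of points 1-3 using pandas and SciPy; the gender/education data below is made up purely for illustration:

# Sketch: build a contingency table and run a chi-square test of independence.
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical observations of two categorical variables.
df = pd.DataFrame({
    "gender": ["M", "F", "F", "M", "F", "M", "F", "M"],
    "education": ["HS", "BSc", "BSc", "HS", "MSc", "BSc", "HS", "MSc"],
})

# Rows = gender categories, columns = education categories, cells = counts.
table = pd.crosstab(df["gender"], df["education"])
print(table)

# Chi-square test of independence: a small p-value suggests the variables are related.
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.3f}, p-value = {p:.3f}")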
Q.3. What are support vectors and margins? Explain soft SVM and hard SVM.
Support Vectors: In Support Vector Machines (SVMs), support vectors are the data points
that lie closest to the decision boundary. These points are crucial because they define the
margin, which separates the two classes.
Margins: The margin is the distance between the decision boundary and the nearest data
points (support vectors) on either side. In SVM, the goal is to find the decision boundary that
maximizes this margin. This is because a larger margin generally leads to better
generalization performance, as it helps the model avoid overfitting.
Hard SVM:
Assumption: Assumes that the data is linearly separable, meaning there exists a clear
hyperplane that perfectly separates the two classes.
Goal: Finds the hyperplane with the largest margin that correctly classifies all training
examples.
Limitation: Can be sensitive to outliers or noisy data, as a single outlier can
significantly affect the position of the decision boundary.
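For reference, the points above correspond to the standard textbook hard-margin optimization problem (not written out in the original answer), where w is the weight vector, b the bias, and (x_i, y_i) the training examples with y_i in {-1, +1}:

\min_{\mathbf{w}, b} \ \frac{1}{2}\lVert \mathbf{w} \rVert^2 \quad \text{subject to} \quad y_i(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1 \ \ \text{for all } i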
Soft SVM:
Slack Variables
In Support Vector Machines (SVMs), slack variables are introduced to allow for some
misclassifications when the data is not perfectly linearly separable. These variables measure
the degree to which a data point violates the margin constraint.
Positive slack variable: Indicates that a data point is on the wrong side of the margin
or even misclassified.
Zero slack variable: Indicates that a data point is correctly classified and lies on or
within the margin.
Margin Errors
Margin errors refer to the data points that are misclassified or lie on the wrong side of the
margin. The number of margin errors is directly related to the values of the slack variables.
Hard Margin SVM: Does not allow for any margin errors, as the slack variables are
constrained to be zero.
Soft Margin SVM: Allows for a certain number of margin errors, as the slack
variables can be positive.
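Putting slack variables and the margin together, the standard soft-margin objective (again the textbook form, not quoted from the original answer) is:

\min_{\mathbf{w}, b, \boldsymbol{\xi}} \ \frac{1}{2}\lVert \mathbf{w} \rVert^2 + C \sum_i \xi_i \quad \text{subject to} \quad y_i(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1 - \xi_i, \ \ \xi_i \ge 0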
The regularization parameter (C) in soft margin SVM controls the trade-off between
maximizing the margin and minimizing the number of margin errors. A larger C value
penalizes margin errors more heavily, leading to a smaller margin and fewer
misclassifications. Conversely, a smaller C value allows for more margin errors but results in
a larger margin.
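A minimal sketch of how this trade-off could be observed with scikit-learn; the synthetic dataset and the C values chosen are illustrative, not part of the assignment:

# Sketch: effect of the regularization parameter C in a soft-margin (linear) SVM.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=2, n_redundant=0,
                           n_clusters_per_class=1, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X_train, y_train)
    # Small C: wider margin, more margin errors; large C: narrower margin, fewer errors.
    print(f"C={C:>6}: support vectors={clf.n_support_.sum()}, "
          f"test accuracy={clf.score(X_test, y_test):.2f}")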
Kernel Trick: The kernel trick is a mathematical technique used in SVMs to transform data
into a higher-dimensional feature space, making it possible to find non-linear decision
boundaries. This transformation is done without explicitly computing the coordinates of the
data points in the higher-dimensional space.
Kernel Functions: Kernel functions are used to compute the inner product between data points in the transformed feature space. Common kernel functions include the linear kernel, the polynomial kernel, the radial basis function (RBF/Gaussian) kernel, and the sigmoid kernel.
Using the kernel trick in an SVM typically involves the following steps (a code sketch follows the steps):
1. Choose a Kernel Function: Select a kernel function that is appropriate for the
expected relationship between the data points.
2. Compute Kernel Matrix: Calculate the kernel matrix, where each element represents
the inner product between two data points in the transformed feature space.
3. Train SVM: Use the kernel matrix to train the SVM. The SVM algorithm operates in
the transformed feature space, finding a hyperplane that separates the data points.
4. Make Predictions: To make predictions for new data points, compute their kernel
values with the training data and use the trained SVM model.
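A minimal sketch of these four steps with scikit-learn, computing an RBF kernel matrix explicitly and training on it; the dataset and gamma value are illustrative (in practice SVC(kernel="rbf") performs these steps internally):

# Sketch: kernel-trick workflow with an explicitly precomputed kernel matrix.
from sklearn.datasets import make_moons
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)   # not linearly separable
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Steps 1-2: choose a kernel (RBF here) and compute the kernel (Gram) matrix.
K_train = rbf_kernel(X_train, X_train, gamma=1.0)

# Step 3: train the SVM directly on the kernel matrix.
clf = SVC(kernel="precomputed").fit(K_train, y_train)

# Step 4: predict by computing kernel values between new points and the training data.
K_test = rbf_kernel(X_test, X_train, gamma=1.0)
print("test accuracy:", clf.score(K_test, y_test))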
Q.6. Calculate macro average precision, macro average recall and macro average F-score for the following given confusion matrix of multi-class classification.
Before we calculate the metrics, let's identify the true positives (TP), false positives (FP), true
negatives (TN), and false negatives (FN) for each class:
Class TP FP FN TN
A 100 0 0 9
B 9 0 0 100
C 8 0 0 100
D 9 0 0 100
Calculating Macro Averages:
For every class, FP = 0 and FN = 0, so Precision = TP / (TP + FP) = 1, Recall = TP / (TP + FN) = 1, and consequently the F-Score is also 1 for each class.
Macro Average Precision = (Precision for Class A + Precision for Class B + Precision for Class C + Precision for Class D) / 4 = (1 + 1 + 1 + 1) / 4 = 1
Macro Average Recall = (Recall for Class A + Recall for Class B + Recall for Class C + Recall for Class D) / 4 = (1 + 1 + 1 + 1) / 4 = 1
Macro Average F-Score = (F-Score for Class A + F-Score for Class B + F-Score for Class C + F-Score for Class D) / 4 = (1 + 1 + 1 + 1) / 4 = 1
Interpretation:
In this case, the macro average precision, recall, and F-score are all 1, indicating that the
model performs perfectly across all classes. This is likely due to the simplified nature of the
confusion matrix, where each class is correctly predicted for all instances. In real-world
scenarios, these metrics will typically be less than perfect.
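A minimal sketch of the same calculation in Python, using the per-class TP/FP/FN counts from the table above:

# Macro-averaged precision, recall and F-score from per-class counts (TP, FP, FN).
counts = {"A": (100, 0, 0), "B": (9, 0, 0), "C": (8, 0, 0), "D": (9, 0, 0)}

precisions, recalls, f_scores = [], [], []
for cls, (tp, fp, fn) in counts.items():
    p = tp / (tp + fp)               # precision for this class
    r = tp / (tp + fn)               # recall for this class
    f_scores.append(2 * p * r / (p + r))
    precisions.append(p)
    recalls.append(r)

print("macro precision:", sum(precisions) / len(precisions))   # 1.0
print("macro recall:   ", sum(recalls) / len(recalls))          # 1.0
print("macro F-score:  ", sum(f_scores) / len(f_scores))        # 1.0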
Separating Hyperplane
The goal of SVM is to find the optimal separating hyperplane that maximizes the margin
between the two classes.
Margin
The margin is the distance between the separating hyperplane and the nearest data points
(support vectors) on either side. In SVM, the objective is to find the hyperplane that
maximizes this margin.
Why maximize the margin? A larger margin generally leads to better generalization
performance, as it helps the model avoid overfitting. It implies that the model is more
confident in its predictions for new, unseen data.
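As a standard reference (not stated in the original answer): for a linear decision boundary defined by w^T x + b = 0, the width of the hard margin is

\text{margin} = \frac{2}{\lVert \mathbf{w} \rVert}

so maximizing the margin is equivalent to minimizing the norm of the weight vector w.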