ML Assign3
Assignment No. 3
Q.1. Define the following terms with a suitable example:
i) Confusion matrix ii) False positive rate iii) True positive rate
i) Confusion Matrix
A confusion matrix (also called an error matrix) is a table used to evaluate the performance of a classification model by comparing its predicted labels against the actual labels; each cell counts the instances falling into one combination of actual and predicted class (true positives, false positives, false negatives, true negatives).
Example:
Consider a binary classification problem where we are trying to predict whether an email is
spam or not. After testing the model, we get the following results:
Confusion Matrix:
                     Predicted Spam   Predicted Not Spam
Actual Spam                50                 10
Actual Not Spam             5                 35
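A minimal sketch of how this confusion matrix could be reproduced with scikit-learn; the y_true and y_pred arrays below are illustrative stand-ins for real model output (label 1 = Spam, label 0 = Not Spam):

# Sketch: building the spam confusion matrix with scikit-learn (illustrative data).
import numpy as np
from sklearn.metrics import confusion_matrix

# 60 actual spam emails (50 predicted correctly), 40 actual non-spam (35 correct).
y_true = np.array([1] * 60 + [0] * 40)
y_pred = np.array([1] * 50 + [0] * 10 + [1] * 5 + [0] * 35)

# Rows = actual class, columns = predicted class; labels=[1, 0] lists Spam first.
cm = confusion_matrix(y_true, y_pred, labels=[1, 0])
print(cm)  # [[50 10]
           #  [ 5 35]]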
ii) False Positive Rate (FPR)
The false positive rate measures the proportion of actual negative instances that were incorrectly classified as positive. It is calculated using the formula:
FPR = FP / (FP + TN)
Example:
FP (False Positives) = 5
TN (True Negatives) = 35
FPR = 5 / (5 + 35) = 5 / 40 = 0.125
iii) True Positive Rate (TPR)
The true positive rate, also known as recall or sensitivity, measures the proportion of actual positive instances that were correctly classified. It is calculated using the formula:
TPR = TP / (TP + FN)
Example:
TP (True Positives) = 50
FN (False Negatives) = 10
TPR = 50 / (50 + 10) = 50 / 60 ≈ 0.833
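A quick sketch of the same FPR and TPR arithmetic in Python, using the counts from the example above:

# Counts from the spam example above.
TP, FN = 50, 10   # actual spam emails
FP, TN = 5, 35    # actual non-spam emails

fpr = FP / (FP + TN)   # fraction of actual negatives wrongly flagged as spam
tpr = TP / (TP + FN)   # fraction of actual positives correctly detected (recall)
print(f"FPR = {fpr:.3f}, TPR = {tpr:.3f}")   # FPR = 0.125, TPR = 0.833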
Contingency Table: A contingency table displays the frequency distribution of two categorical variables, with the categories of one variable as rows and the categories of the other as columns; each cell holds the count of observations for that combination of categories. Contingency tables are useful for the following (a code sketch follows this list):
1. Summarizing Data: Contingency tables provide a clear and concise way to present
data, making it easier to understand relationships and patterns between variables.
2. Testing for Independence: Statistical tests like the chi-square test can be applied to
contingency tables to determine if two variables are independent or related.
3. Analyzing Associations: Contingency tables can help identify associations or
dependencies between variables. For example, you might find that gender is
associated with education level.
4. Comparing Groups: Contingency tables can be used to compare different groups
based on their distribution across categories of another variable.
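A minimal sketch of points 1-3 using pandas and SciPy; the gender/education data below is made up purely for illustration:

# Sketch: build a contingency table and run a chi-square test of independence.
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical observations of two categorical variables.
df = pd.DataFrame({
    "gender": ["M", "F", "F", "M", "F", "M", "F", "M"],
    "education": ["HS", "BSc", "BSc", "HS", "MSc", "BSc", "HS", "MSc"],
})

# Rows = gender categories, columns = education categories, cells = counts.
table = pd.crosstab(df["gender"], df["education"])
print(table)

# Chi-square test of independence: a small p-value suggests the variables are related.
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.3f}, p-value = {p:.3f}")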
Q.3. What are support vectors and margins? Explain soft SVM and hard SVM.
Support Vectors: In Support Vector Machines (SVMs), support vectors are the data points
that lie closest to the decision boundary. These points are crucial because they define the
margin, which separates the two classes.
Margins: The margin is the distance between the decision boundary and the nearest data
points (support vectors) on either side. In SVM, the goal is to find the decision boundary that
maximizes this margin. This is because a larger margin generally leads to better
generalization performance, as it helps the model avoid overfitting.
Hard SVM:
Assumption: Assumes that the data is linearly separable, meaning there exists a clear
hyperplane that perfectly separates the two classes.
Goal: Finds the hyperplane with the largest margin that correctly classifies all training
examples.
Limitation: Can be sensitive to outliers or noisy data, as a single outlier can
significantly affect the position of the decision boundary.
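For reference, the points above correspond to the standard textbook hard-margin optimization problem (not written out in the original answer), where w is the weight vector, b the bias, and (x_i, y_i) the training examples with y_i in {-1, +1}:

\min_{\mathbf{w}, b} \ \frac{1}{2}\lVert \mathbf{w} \rVert^2 \quad \text{subject to} \quad y_i(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1 \ \ \text{for all } i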
Soft SVM:
Slack Variables
In Support Vector Machines (SVMs), slack variables are introduced to allow for some
misclassifications when the data is not perfectly linearly separable. These variables measure
the degree to which a data point violates the margin constraint.
Positive slack variable: Indicates that a data point is on the wrong side of the margin
or even misclassified.
Zero slack variable: Indicates that a data point is correctly classified and lies on or
within the margin.
Margin Errors
Margin errors refer to the data points that are misclassified or lie on the wrong side of the
margin. The number of margin errors is directly related to the values of the slack variables.
Hard Margin SVM: Does not allow for any margin errors, as the slack variables are
constrained to be zero.
Soft Margin SVM: Allows for a certain number of margin errors, as the slack
variables can be positive.
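Putting slack variables and the margin together, the standard soft-margin objective (again the textbook form, not quoted from the original answer) is:

\min_{\mathbf{w}, b, \boldsymbol{\xi}} \ \frac{1}{2}\lVert \mathbf{w} \rVert^2 + C \sum_i \xi_i \quad \text{subject to} \quad y_i(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1 - \xi_i, \ \ \xi_i \ge 0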
The regularization parameter (C) in soft margin SVM controls the trade-off between
maximizing the margin and minimizing the number of margin errors. A larger C value
penalizes margin errors more heavily, leading to a smaller margin and fewer
misclassifications. Conversely, a smaller C value allows for more margin errors but results in
a larger margin.
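A minimal sketch of how this trade-off could be observed with scikit-learn; the synthetic dataset and the C values chosen are illustrative, not part of the assignment:

# Sketch: effect of the regularization parameter C in a soft-margin (linear) SVM.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=2, n_redundant=0,
                           n_clusters_per_class=1, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X_train, y_train)
    # Small C: wider margin, more margin errors; large C: narrower margin, fewer errors.
    print(f"C={C:>6}: support vectors={clf.n_support_.sum()}, "
          f"test accuracy={clf.score(X_test, y_test):.2f}")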
Kernel Trick: The kernel trick is a mathematical technique used in SVMs to transform data
into a higher-dimensional feature space, making it possible to find non-linear decision
boundaries. This transformation is done without explicitly computing the coordinates of the
data points in the higher-dimensional space.
Kernel Functions: Kernel functions are used to compute the inner product between data points in the transformed feature space. Common kernel functions include the linear kernel, the polynomial kernel, the radial basis function (RBF/Gaussian) kernel, and the sigmoid kernel.
Using the kernel trick in an SVM typically involves the following steps (a code sketch follows the steps):
1. Choose a Kernel Function: Select a kernel function that is appropriate for the
expected relationship between the data points.
2. Compute Kernel Matrix: Calculate the kernel matrix, where each element represents
the inner product between two data points in the transformed feature space.
3. Train SVM: Use the kernel matrix to train the SVM. The SVM algorithm operates in
the transformed feature space, finding a hyperplane that separates the data points.
4. Make Predictions: To make predictions for new data points, compute their kernel
values with the training data and use the trained SVM model.
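A minimal sketch of these four steps with scikit-learn, computing an RBF kernel matrix explicitly and training on it; the dataset and gamma value are illustrative (in practice SVC(kernel="rbf") performs these steps internally):

# Sketch: kernel-trick workflow with an explicitly precomputed kernel matrix.
from sklearn.datasets import make_moons
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)   # not linearly separable
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Steps 1-2: choose a kernel (RBF here) and compute the kernel (Gram) matrix.
K_train = rbf_kernel(X_train, X_train, gamma=1.0)

# Step 3: train the SVM directly on the kernel matrix.
clf = SVC(kernel="precomputed").fit(K_train, y_train)

# Step 4: predict by computing kernel values between new points and the training data.
K_test = rbf_kernel(X_test, X_train, gamma=1.0)
print("test accuracy:", clf.score(K_test, y_test))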
Q.6. Calculate macro average precision, macro average recall and macro average F-score for the following given confusion matrix of multi-class classification.
Before we calculate the metrics, let's identify the true positives (TP), false positives (FP), true
negatives (TN), and false negatives (FN) for each class:
Class TP FP FN TN
A 100 0 0 9
B 9 0 0 100
C 8 0 0 100
D 9 0 0 100
Calculating Macro Averages:
For every class, FP = 0 and FN = 0, so Precision = TP / (TP + FP) = 1, Recall = TP / (TP + FN) = 1, and consequently the F-Score is also 1 for each class.
Macro Average Precision = (Precision for Class A + Precision for Class B + Precision for Class C + Precision for Class D) / 4 = (1 + 1 + 1 + 1) / 4 = 1
Macro Average Recall = (Recall for Class A + Recall for Class B + Recall for Class C + Recall for Class D) / 4 = (1 + 1 + 1 + 1) / 4 = 1
Macro Average F-Score = (F-Score for Class A + F-Score for Class B + F-Score for Class C + F-Score for Class D) / 4 = (1 + 1 + 1 + 1) / 4 = 1
Interpretation:
In this case, the macro average precision, recall, and F-score are all 1, indicating that the
model performs perfectly across all classes. This is likely due to the simplified nature of the
confusion matrix, where each class is correctly predicted for all instances. In real-world
scenarios, these metrics will typically be less than perfect.
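A minimal sketch of the same calculation in Python, using the per-class TP/FP/FN counts from the table above:

# Macro-averaged precision, recall and F-score from per-class counts (TP, FP, FN).
counts = {"A": (100, 0, 0), "B": (9, 0, 0), "C": (8, 0, 0), "D": (9, 0, 0)}

precisions, recalls, f_scores = [], [], []
for cls, (tp, fp, fn) in counts.items():
    p = tp / (tp + fp)               # precision for this class
    r = tp / (tp + fn)               # recall for this class
    f_scores.append(2 * p * r / (p + r))
    precisions.append(p)
    recalls.append(r)

print("macro precision:", sum(precisions) / len(precisions))   # 1.0
print("macro recall:   ", sum(recalls) / len(recalls))          # 1.0
print("macro F-score:  ", sum(f_scores) / len(f_scores))        # 1.0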
Separating Hyperplane
The goal of SVM is to find the optimal separating hyperplane that maximizes the margin
between the two classes.
Margin
The margin is the distance between the separating hyperplane and the nearest data points
(support vectors) on either side. In SVM, the objective is to find the hyperplane that
maximizes this margin.
Why maximize the margin? A larger margin generally leads to better generalization
performance, as it helps the model avoid overfitting. It implies that the model is more
confident in its predictions for new, unseen data.
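As a standard reference (not stated in the original answer): for a linear decision boundary defined by w^T x + b = 0, the width of the hard margin is

\text{margin} = \frac{2}{\lVert \mathbf{w} \rVert}

so maximizing the margin is equivalent to minimizing the norm of the weight vector w.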