Chapter 1
Machine Learning
1.1 Introduction
The primary objective of this lab report is to explore, implement, and evaluate a diverse
range of machine learning algorithms to understand their effectiveness and practical applica-
tions. The algorithms under investigation include K-Means Clustering, K-Nearest Neighbors
(KNN), Linear Regression, Logistic Regression, Support Vector Machines (SVM), Naive
Bayes, and Decision Trees. Each of these techniques is designed to tackle specific types of
problems, and their performance can vary depending on the data and task they are applied to.
It should be mentioned that all of these algorithms are implemented on a single dataset, the Iris
dataset.
In addition to these core algorithms, the report will also delve into the mathematical foun-
dations of Classification and Regression Trees (CART) and Iterative Dichotomiser 3 (ID3).
Understanding the underlying mathematics of these techniques is crucial for grasping their
operational mechanics and evaluating their effectiveness. Furthermore, the theoretical prin-
ciples behind Support Vector Machines (SVM) will be examined in detail to provide a
comprehensive understanding of how this powerful algorithm functions.
The report is structured into three main parts. The first part focuses on implementing
the selected algorithms on the Iris dataset, evaluating their performance, and analyzing their
strengths and limitations in various scenarios. The second part provides a theoretical explo-
ration of the mathematical principles underlying CART, ID3, and SVM, offering insights into
the core concepts that drive these algorithms. The third part involves a review of a relevant
paper on Deep Visual Analytics, which will help contextualize the findings within current
research and highlight best practices and emerging trends in the field.
Chapter 2
Machine Learning Algorithm and Its Implementation
2.1 K-Means Clustering
2.1.1 Algorithm
• Initialize K centroids randomly.
• Assign each data point to the nearest centroid.
• Update the centroids by calculating the mean of the points assigned to each cluster.
• Repeat the assignment and update steps until convergence (i.e., the centroids no longer change significantly).
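As a minimal illustration of these steps (a sketch, not the report's original listing; the variable names are illustrative), K-Means can be applied to the Iris features with scikit-learn as follows:

# Illustrative sketch: K-Means clustering of the Iris features into K = 3 groups
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X = load_iris().data
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)          # cluster assignment for each sample
centroids = kmeans.cluster_centers_     # final centroid positions
print(labels[:10])
print(centroids)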
2.2 K-Nearest Neighbors (KNN)
2.2.1 Algorithm
• Choose the number of neighbors K.
• Compute the distance between the new data point and all training data points.
• Identify the K nearest neighbors based on the distance.
• Assign the new data point to the class that is most common among its K neighbors.
2.2.2 Code
# 2. K-Nearest Neighbors (KNN)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
y_pred_knn = knn.predict(X_test)
accuracy_knn = accuracy_score(y_test, y_pred_knn)
print(f'KNN Accuracy: {accuracy_knn * 100:.2f}%')
2.2.3 Result
2.3 Linear Regression
2.3.1 Algorithm
• Define the hypothesis as a linear equation: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n$
• Use a cost function (Mean Squared Error) to measure the error between the
predicted values and actual values.
• Use optimization techniques (like Gradient Descent) to minimize the cost function
and update the weights $\beta_0, \beta_1, \ldots, \beta_n$ (see the sketch after this list).
• Once the cost function is minimized, use the learned weights to make predictions.
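Because the listing in the next subsection relies on scikit-learn's built-in solver, the gradient-descent procedure described above is sketched here on synthetic data; the data, learning rate, and iteration count are illustrative assumptions rather than values from the report.

import numpy as np

# Fit y = b0 + b1 * x by minimizing the Mean Squared Error with gradient descent
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 2.0 + 0.5 * x + rng.normal(0, 0.3, 100)   # synthetic data with known coefficients

b0, b1, lr = 0.0, 0.0, 0.01
for _ in range(5000):
    error = (b0 + b1 * x) - y          # predicted values minus actual values
    b0 -= lr * 2 * error.mean()        # gradient of the MSE cost with respect to b0
    b1 -= lr * 2 * (error * x).mean()  # gradient of the MSE cost with respect to b1

print(b0, b1)   # approaches the true intercept (~2.0) and slope (~0.5)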
2.3.2 Code
# 3. Linear Regression (use feature 0 to predict feature 1)
from sklearn.linear_model import LinearRegression

linear_reg = LinearRegression()
linear_reg.fit(X_train[:, 0].reshape(-1, 1), X_train[:, 1])
y_pred_lr = linear_reg.predict(X_test[:, 0].reshape(-1, 1))
2.3.3 Result
2.4 Logistic Regression
2.4.1 Algorithm
• Define the hypothesis using the sigmoid function: $P(y = 1 \mid x) = \dfrac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n)}}$
• Use a loss function (Log-Loss) to measure the error in classification.
• Apply optimization techniques (like Gradient Descent) to minimize the loss and
update the weights.
• After minimizing the loss, use the probability score to classify new data points
based on a threshold (e.g., 0.5).
2.4.2 Code
# 4. Logistic Regression
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

log_reg = LogisticRegression(max_iter=200)
log_reg.fit(X_train[:, [0, 1]], y_train)
y_pred_logreg = log_reg.predict(X_test[:, [0, 1]])
accuracy_logreg = accuracy_score(y_test, y_pred_logreg)
print(f'Logistic Regression Accuracy: {accuracy_logreg * 100:.2f}%')
2.4.3 Result
2.5 Support Vector Machine (SVM)
2.5.1 Algorithm
• Define a hyperplane that separates the data points in the feature space.
• Maximize the margin between the hyperplane and the nearest data points from
each class (support vectors).
• Use optimization techniques to solve the problem and find the optimal hyperplane.
• Classify new data points based on which side of the hyperplane they fall.
2.5.2 Code
# 5. Support Vector Machine (SVM)
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

svm = SVC(kernel='linear', random_state=42)
svm.fit(X_train[:, [0, 1]], y_train)
y_pred_svm = svm.predict(X_test[:, [0, 1]])
accuracy_svm = accuracy_score(y_test, y_pred_svm)
print(f'SVM Accuracy: {accuracy_svm * 100:.2f}%')
2.5.3 Result
2.6 Naive Bayes
2.6.1 Algorithm
• Calculate the prior probability for each class.
• Calculate the likelihood of the input features for each class.
• Apply Bayes' Theorem to compute the posterior probability for each class:
$P(C \mid x) = \dfrac{P(x \mid C)\,P(C)}{P(x)}$
• Assign the class with the highest posterior probability to the new data point.
2.6.2 Code
# 6. Naive Bayes
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

nb = GaussianNB()
nb.fit(X_train[:, [0, 1]], y_train)
y_pred_nb = nb.predict(X_test[:, [0, 1]])
accuracy_nb = accuracy_score(y_test, y_pred_nb)
print(f'Naive Bayes Accuracy: {accuracy_nb * 100:.2f}%')
2.6.3 Result
2.7 Decision Tree
2.7.1 Algorithm
• Select the best feature to split the data using metrics like Gini impurity or infor-
mation gain.
• Split the dataset into subsets based on the selected feature.
• Recursively repeat the process for each subset until the stopping criterion is met
(e.g., maximum depth or minimum samples per leaf).
• Classify new data points by traversing the tree based on feature values.
2.7.2 Code
# 7. Decision Tree
from sklearn.tree import DecisionTreeClassifier

tree = DecisionTreeClassifier(random_state=42)
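The listing above ends after the classifier is constructed. Assuming the remaining steps mirror the other classifiers in this report (fit on the first two features, predict on the test set, and report accuracy), the continuation would look roughly like this:

from sklearn.metrics import accuracy_score

# Assumed continuation, mirroring the pattern used for the other classifiers
tree.fit(X_train[:, [0, 1]], y_train)
y_pred_tree = tree.predict(X_test[:, [0, 1]])
accuracy_tree = accuracy_score(y_test, y_pred_tree)
print(f'Decision Tree Accuracy: {accuracy_tree * 100:.2f}%')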
2.7.3 Result
Chapter 3
Problem on CART and ID3 Algorithm
3.1 CART: Classification and Regression Tree
3.1.1 Problem
Let's start with the weather dataset, which is widely used to explain decision tree algorithms. The target is to predict whether to play or not (Yes or No) based on the weather conditions.
3.1.2 Solution
From the data, outlook, temperature, humidity, and wind are the features. So, let's start building the tree. Outlook is a nominal feature; it can take three values: sunny, overcast, and rain.
Let's summarize the final decisions for the outlook feature.
The Gini index is calculated by subtracting the sum of the squared probabilities of each class from one:
$$\mathrm{Gini} = 1 - \sum_{i=1}^{n} p_i^2$$
Humidity is a binary feature; it can take two values, high and normal.
Gini(humidity = high) = 1 − (3/7)² − (4/7)² = 0.489
Gini(humidity = normal) = 1 − (6/7)² − (1/7)² = 0.245
Now, the weighted sum of the Gini indices for the humidity feature can be calculated as
Gini(humidity) = (7/14) × 0.489 + (7/14) × 0.245 = 0.367
Wind is a binary feature; it can take two values, weak and strong.
Gini(wind = weak) = 1 − (6/8)² − (2/8)² = 0.375
Gini(wind = strong) = 1 − (3/6)² − (3/6)² = 0.5
Now, the weighted sum of the Gini indices for the wind feature can be calculated as
Gini(wind) = (8/14) × 0.375 + (6/14) × 0.5 = 0.428
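The arithmetic above can be reproduced with a short helper; the (Yes, No) counts below are those of the standard weather dataset, stated here as an assumption since the data table itself is not reproduced.

def gini(counts):
    # Gini impurity of one branch, given its class counts
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

def weighted_gini(branches):
    # Weighted Gini impurity of a split, given per-branch class counts
    total = sum(sum(b) for b in branches)
    return sum(sum(b) / total * gini(b) for b in branches)

# Humidity: high -> 3 Yes / 4 No, normal -> 6 Yes / 1 No
print(round(weighted_gini([(3, 4), (6, 1)]), 2))   # 0.37, matching Gini(humidity) above
# Wind: weak -> 6 Yes / 2 No, strong -> 3 Yes / 3 No
print(round(weighted_gini([(6, 2), (3, 3)]), 2))   # 0.43, matching Gini(wind) above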
So, the final decisions for all of the features can be summarized in a table.
From the table, we can see that the Gini index for the outlook feature is the lowest, so outlook becomes the root node.
Now, let's focus on the sub-data for the sunny outlook value. We need to find the Gini index for the temperature, humidity, and wind features, respectively.
We have now calculated the Gini index of all the features when the outlook is sunny. You can infer that humidity has the lowest value, so the next node will be humidity.
Next, let's focus on the sub-data for the overcast outlook value, and continue splitting in the same manner until the whole dataset has been processed.
Formula: Entropy
$$H(S) = -\sum_{i=1}^{c} p_i \log_2 p_i$$
Formula: Information Gain
$$\mathrm{Gain}(S, A) = H(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\, H(S_v)$$
3.2 ID3: Iterative Dichotomiser 3
3.2.1 Problem
We again use the weather dataset, which is widely used to explain decision tree algorithms; the target is to predict whether to play or not (Yes or No) based on the weather conditions.
3.2.2 Solution
From the data, we can observe that:
Number of observations = 14
Number of observations with decision 'Yes' = 9
Probability of 'Yes': p(Yes) = 9/14
Probability of 'No': p(No) = 5/14
Entropy of the full dataset: $H(S) = -(9/14)\log_2(9/14) - (5/14)\log_2(5/14) \approx 0.940$
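The entropy and information-gain formulas can likewise be checked with a short sketch; the per-branch counts for the outlook attribute are those of the standard weather dataset and are stated here as an assumption, since the table itself is not reproduced.

from math import log2

def entropy(counts):
    # Shannon entropy H(S) from class counts
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, branches):
    # Gain(S, A) = H(S) - sum over branches of |Sv|/|S| * H(Sv)
    total = sum(parent_counts)
    weighted = sum(sum(b) / total * entropy(b) for b in branches)
    return entropy(parent_counts) - weighted

# Whole dataset: 9 Yes / 5 No
print(entropy((9, 5)))                                      # ~0.940
# Outlook: sunny -> 2 Yes / 3 No, overcast -> 4 Yes / 0 No, rain -> 3 Yes / 2 No
print(information_gain((9, 5), [(2, 3), (4, 0), (3, 2)]))   # ~0.247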
We can see that the information gain of the 'Humidity' attribute is higher than that of the others, so it becomes the next node under the sunny outlook branch.
Humidity takes two values, normal and high. From both tables, you can infer that whenever humidity is high the decision is 'No', and whenever humidity is normal the decision is 'Yes'. Now, let's focus on the other sub-data and continue splitting in the same manner until the whole dataset has been processed. So, the final decision tree will be:
Chapter 4
Paper Work: Deep Visual Analytics Domain
4.4 Methodology
Grad-CAM (Gradient-weighted Class Activation Mapping) provides a method for visualizing
the decisions made by convolutional neural networks (CNNs) by generating heatmaps that
highlight the most influential regions of an input image.
The process begins with a forward pass through the CNN, where the input image is pro-
cessed to produce predictions. During this pass, feature maps are extracted from the last
convolutional layer, which precedes the final classification layer. In the backward pass, the
gradients of the score for the target class with respect to these feature maps are computed.
These gradients indicate how changes in the feature maps affect the score of the predicted
class.
Following this, global average pooling is applied to the gradients to derive weights for
each feature map channel. These weights represent the importance of each channel in the
prediction for the target class. A weighted sum of the feature maps is then computed using
these weights, resulting in an activation map that highlights the contributions of different
regions of the image.
To create the visual explanation, the activation map is passed through a ReLU (Rectified
Linear Unit) activation function to emphasize positive contributions. This map is then
resized to match the dimensions of the input image and overlaid on it as a heatmap. This
heatmap visually demonstrates which parts of the image were most influential in the model’s
decision, providing a clear and interpretable representation of the model’s focus areas during
prediction.
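As an illustration of this pipeline (not the authors' original implementation), the steps can be sketched in PyTorch using a forward hook on the last convolutional block of a ResNet; the model, layer choice, and random input are assumptions for demonstration and require a recent torch/torchvision.

import torch
import torch.nn.functional as F
from torchvision import models

feats = {}
def save_activation(module, inputs, output):
    feats["a"] = output                                   # feature maps of the hooked block
    output.register_hook(lambda g: feats.update(g=g))     # their gradient w.r.t. the class score

model = models.resnet18(weights=None).eval()              # any CNN works; weights omitted for a self-contained demo
model.layer4[-1].register_forward_hook(save_activation)   # hook the last convolutional block

image = torch.randn(1, 3, 224, 224)                       # stand-in for a preprocessed input image
scores = model(image)                                     # forward pass
target = scores.argmax(dim=1).item()                      # predicted (target) class
scores[0, target].backward()                              # backward pass for the target class score

weights = feats["g"].mean(dim=(2, 3), keepdim=True)               # global average pooling of the gradients
cam = F.relu((weights * feats["a"]).sum(dim=1, keepdim=True))     # weighted sum of feature maps + ReLU
cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear", align_corners=False)
heatmap = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)      # normalized heatmap to overlay on the input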
• Identifying Bias in Datasets: Grad-CAM can be used to identify and address biases
in datasets by visualizing how different features or demographic groups affect the
model’s predictions. By examining the heatmaps, researchers can uncover if certain
groups are disproportionately influencing the model, which helps in understanding and
mitigating dataset biases.
• Visual Question Answering (VQA): For visual question answering systems, Grad-
CAM provides insight into how the model answers questions based on image content.
By visualizing the parts of the image that are considered relevant for answering specific
questions, Grad-CAM helps in understanding the model’s reasoning and improves the
accuracy of the answers provided.
4.6 Conclusion
In this paper [1], the authors propose a novel class-discriminative localization technique called Grad-CAM
(Gradient-weighted Class Activation Mapping). Grad-CAM is designed to enhance the
interpretability of convolutional neural networks (CNNs) by providing visual explanations of
their predictions. Unlike many existing methods, Grad-CAM operates with any CNN-based
architecture without requiring modifications to the network, making it highly adaptable and
broadly applicable.
Results demonstrate that Grad-CAM outperforms existing approaches in terms of both
interpretability and faithfulness. It effectively highlights the most relevant regions of an
input image that influence the model’s decision, offering clear and intuitive visualizations
that improve understanding of the model’s behavior. This advancement not only aids in
debugging and refining deep learning models but also supports more transparent and reliable
machine learning applications.
References
[1] Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., and Batra, D. (2017). Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pages 618–626.