
Table of contents

1 Machine Learning 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Iris Dataset Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

2 Machine Learning Algorithm and Its Implementation 3


2.1 K-Means Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.1.1 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.1.2 Implementation in Python . . . . . . . . . . . . . . . . . . . . . . 3
2.2 KNN Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2.1 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2.2 Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2.3 Result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.3 Linear Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.3.1 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.3.2 Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3.3 Result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.4 Logistic Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.4.1 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.4.2 Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.4.3 Result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.5 Support Vector Machine . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.5.1 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.5.2 Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.5.3 Result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.6 Naive Bayes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.6.1 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.6.2 Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.6.3 Result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2.7 Decision Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12


2.7.1 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.7.2 Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.7.3 Result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

3 Problem on CART and ID3 Algorithm 14


3.1 CART: Classification and Regression Tree . . . . . . . . . . . . . . . . . . 14
3.1.1 Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.1.2 Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.2 ID3: Iterative Dichotomiser . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.2.1 Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.2.2 Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

4 Paper Work: Deep Visual Analytics Domain 26


4.1 What is Deep Visual Analytics . . . . . . . . . . . . . . . . . . . . . . . . 26
4.2 Paper on Deep Visual Analytic Domain . . . . . . . . . . . . . . . . . . . 26
4.3 Grad-CAM: Visual Explanations from Deep Networks via Gradient-based
Localization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.4 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.5 Applications and Use Cases of Grad-CAM . . . . . . . . . . . . . . . . . . 28
4.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

References 31
Chapter 1

Machine Learning

1.1 Introduction
The primary objective of this lab report is to explore, implement, and evaluate a diverse
range of machine learning algorithms to understand their effectiveness and practical applica-
tions. The algorithms under investigation include K-Means Clustering, K-Nearest Neighbors
(KNN), Linear Regression, Logistic Regression, Support Vector Machines (SVM), Naive
Bayes, and Decision Trees. Each of these techniques is designed to tackle specific types of
problems, and their performance can vary depending on the data and task they are applied to.
It should be noted that all of the algorithms are implemented on a single dataset, the Iris
dataset.

In addition to these core algorithms, the report will also delve into the mathematical foun-
dations of Classification and Regression Trees (CART) and Iterative Dichotomiser 3 (ID3).
Understanding the underlying mathematics of these techniques is crucial for grasping their
operational mechanics and evaluating their effectiveness. Furthermore, the theoretical prin-
ciples behind Support Vector Machines (SVM) will be examined in detail to provide a
comprehensive understanding of how this powerful algorithm functions.

The report is structured into three main parts. The first part focuses on implementing
the selected algorithms on the Iris dataset, evaluating their performance, and analyzing their
strengths and limitations in various scenarios. The second part provides a theoretical explo-
ration of the mathematical principles underlying CART, ID3, and SVM, offering insights into
the core concepts that drive these algorithms. The third part involves a review of a relevant
paper on Deep Visual Analytics, which will help contextualize the findings within current
research and highlight best practices and emerging trends in the field.

1.2 Iris Dataset Description


For the purposes of this analysis, the well-known Iris dataset offers an excellent choice
due to its simplicity and clarity. This dataset consists of 150 samples of iris flowers, each
characterized by four distinct features: sepal length, sepal width, petal length, and petal width.
These features provide a comprehensive description of the flowers, which are categorized
into three distinct classes: setosa, versicolor, and virginica.
The Iris dataset is particularly useful for several reasons. First, its small size makes it
manageable and accessible for performing a variety of machine learning tasks without
the need for extensive computational resources. This allows for a focused examination of
different algorithms and their performance on a controlled dataset. Second, the dataset is
well-balanced with an equal number of samples from each class, which helps in evaluating
classification algorithms without the bias that can arise from imbalanced data.
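
Throughout the implementations in the next chapter, a common setup is assumed: the Iris features and labels are loaded with scikit-learn and split into training and test sets. A minimal sketch of this assumed setup is shown below (the variable names X, y, X_train, X_test, y_train, and y_test match those used in the later listings; the exact split parameters are illustrative, not taken from the original report):

# Assumed common setup for the later listings (not part of the original report code)
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X, y = iris.data, iris.target          # 150 samples, 4 features, 3 classes
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)
print(X_train.shape, X_test.shape)     # e.g. (105, 4) (45, 4)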

Fig. 1.1 Iris Dataset


Chapter 2

Machine Learning Algorithm and Its Implementation

2.1 K-Means Clustering


K-Means is an unsupervised learning algorithm used for clustering data points into a prede-
fined number of groups (K). It works iteratively to assign data points to one of K clusters
based on minimizing the distance between each point and the centroid of the cluster.

2.1.1 Algorithm
• Initialize K centroids randomly.

• Assign each data point to the closest centroid.

• Update the centroids by calculating the mean of the points assigned to each cluster.

• Repeat the assignment and update steps until convergence (i.e., the centroids no longer change significantly).
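
To make these steps concrete, a minimal from-scratch sketch in NumPy is given below (illustrative only; the scikit-learn implementation used in the report follows in the next subsection):

import numpy as np

def kmeans_sketch(X, k=3, n_iter=100, seed=42):
    rng = np.random.default_rng(seed)
    # 1. Initialize K centroids by picking random data points
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # 2. Assign each data point to the closest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. Update each centroid as the mean of the points assigned to it
        #    (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)])
        # 4. Stop when the centroids no longer change significantly
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids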

2.1.2 Implementation in Python


# 1. K-Means Clustering
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

# Assuming X is your dataset (here, the Iris feature matrix)
kmeans = KMeans(n_clusters=3, random_state=42)
kmeans.fit(X)
y_kmeans = kmeans.predict(X)

# Visualize K-Means clustering on the first two features
plt.figure(figsize=(8, 6))
plt.scatter(X[:, 0], X[:, 1], c=y_kmeans, cmap='viridis')
plt.title('K-Means Clustering')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()

Fig. 2.1 K-means Clustering

2.2 KNN Algorithm


KNN is a simple, non-parametric classification algorithm that classifies a new data point
based on the majority class among its K nearest neighbors in the feature space.

2.2.1 Algorithm
• Choose the number of neighbors K.
• Compute the distance between the new data point and all training data points.
• Identify the K nearest neighbors based on the distance.
• Assign the new data point to the class that is most common among its K neighbors.
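
As a complement to the library call used in the listing below, these steps can be written out directly; a minimal sketch (Euclidean distance plus a majority vote, illustrative only):

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # Compute the distance between the new point and all training points
    dists = np.linalg.norm(X_train - x_new, axis=1)
    # Identify the K nearest neighbors
    nearest = np.argsort(dists)[:k]
    # Assign the class that is most common among the K neighbors
    return Counter(y_train[nearest]).most_common(1)[0][0]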

2.2.2 Code
# 2. K-Nearest Neighbors (KNN)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
y_pred_knn = knn.predict(X_test)
accuracy_knn = accuracy_score(y_test, y_pred_knn)
print(f'KNN Accuracy: {accuracy_knn * 100:.2f}%')

# Visualize K-Nearest Neighbors
plt.figure(figsize=(8, 6))
plt.scatter(X_test[:, 0], X_test[:, 1], c=y_pred_knn, cmap='coolwarm')
plt.title('K-Nearest Neighbors (KNN)')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()

2.2.3 Result

Fig. 2.2 KNN Result



2.3 Linear Regression


Linear Regression is a supervised learning algorithm used for predicting continuous outcomes.
It assumes a linear relationship between the input features and the output.

2.3.1 Algorithm
• Define the hypothesis as a linear equation: y = β0 + β1 x1 + β2 x2 + · · · + βn xn
• Use a cost function (Mean Squared Error) to measure the error between the
predicted values and actual values.
• Use optimization techniques (like Gradient Descent) to minimize the cost function
and update the weights β0 , β1 , . . . , βn .
• Once the cost function is minimized, use the learned weights to make predictions.
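
A compact gradient-descent sketch of these steps for a single feature is shown below (illustrative only; the report's listing in the next subsection instead uses scikit-learn's closed-form LinearRegression):

import numpy as np

def linear_regression_gd(x, y, lr=0.01, n_iter=1000):
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(n_iter):
        y_pred = b0 + b1 * x
        # Gradients of the Mean Squared Error with respect to b0 and b1
        grad_b0 = (-2.0 / n) * np.sum(y - y_pred)
        grad_b1 = (-2.0 / n) * np.sum((y - y_pred) * x)
        # Update the weights in the direction that reduces the cost
        b0 -= lr * grad_b0
        b1 -= lr * grad_b1
    return b0, b1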

2.3.2 Code
# 3. Linear Regression (predict sepal width from sepal length)
from sklearn.linear_model import LinearRegression

linear_reg = LinearRegression()
linear_reg.fit(X_train[:, 0].reshape(-1, 1), X_train[:, 1])
y_pred_lr = linear_reg.predict(X_test[:, 0].reshape(-1, 1))

# Plot Linear Regression
plt.figure(figsize=(8, 6))
plt.scatter(X_test[:, 0], X_test[:, 1], color='blue')
plt.plot(X_test[:, 0], y_pred_lr, color='red')
plt.title('Linear Regression')
plt.xlabel('Sepal Length')
plt.ylabel('Sepal Width')
plt.show()

2.3.3 Result

Fig. 2.3 Linear Regression Result



2.4 Logistic Regression


Logistic Regression is a supervised learning algorithm used for binary classification problems.
It uses a logistic function to model the probability of a data point belonging to a particular
class.

2.4.1 Algorithm
• Define the hypothesis using the sigmoid function: P(y = 1 | x) = 1 / (1 + e^−(β0 + β1x1 + ··· + βnxn))
• Use a loss function (Log-Loss) to measure the error in classification.
• Apply optimization techniques (like Gradient Descent) to minimize the loss and
update the weights.
• After minimizing the loss, use the probability score to classify new data points
based on a threshold (e.g., 0.5).
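
A tiny sketch of the sigmoid hypothesis and the 0.5 threshold for the binary case is given below (illustrative only; the scikit-learn listing in the next subsection handles the multi-class Iris problem directly):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_binary(X, beta, threshold=0.5):
    # beta[0] is the intercept, beta[1:] are the feature weights
    p = sigmoid(beta[0] + X @ beta[1:])
    # Classify using the probability score and the chosen threshold
    return (p >= threshold).astype(int)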

2.4.2 Code
# 4. Logistic Regression
import numpy as np
from sklearn.linear_model import LogisticRegression
from matplotlib.colors import ListedColormap

log_reg = LogisticRegression(max_iter=200)
log_reg.fit(X_train[:, [0, 1]], y_train)
y_pred_logreg = log_reg.predict(X_test[:, [0, 1]])
accuracy_logreg = accuracy_score(y_test, y_pred_logreg)
print(f'Logistic Regression Accuracy: {accuracy_logreg * 100:.2f}%')

# Visualize Logistic Regression with decision boundary
X_set, y_set = X_train[:, [0, 1]], y_train
X1, X2 = np.meshgrid(np.arange(start=X_set[:, 0].min() - 1, stop=X_set[:, 0].max() + 1, step=0.01),
                     np.arange(start=X_set[:, 1].min() - 1, stop=X_set[:, 1].max() + 1, step=0.01))
plt.contourf(X1, X2, log_reg.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha=0.75, cmap=ListedColormap(('red', 'green', 'blue')))
plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=ListedColormap(('red', 'green', 'blue')))
plt.title('Logistic Regression (Decision Boundary)')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()

2.4.3 Result

Fig. 2.4 Logistic Regression Result

2.5 Support Vector Machine


SVM is a powerful supervised learning algorithm for classification tasks. It works by finding
a hyperplane that best separates the data points of different classes with the maximum margin.

2.5.1 Algorithm
• Define a hyperplane that separates the data points in the feature space.
• Maximize the margin between the hyperplane and the nearest data points from
each class (support vectors).
• Use optimization techniques to solve the problem and find the optimal hyperplane.
• Classify new data points based on which side of the hyperplane they fall.
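
For a linear kernel, the learned hyperplane and support vectors can be inspected directly on the fitted classifier from the listing below; a brief illustrative sketch (the attribute names are scikit-learn's, and the printout is not part of the original report):

# After svm.fit(...) in the listing below (linear kernel only):
# each separating hyperplane is w·x + b = 0; with three Iris classes,
# scikit-learn trains one binary SVM per pair of classes (one-vs-one).
print(svm.coef_)             # rows are the hyperplane normal vectors w
print(svm.intercept_)        # the corresponding offsets b
print(svm.support_vectors_)  # training points that define the maximum margins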

2.5.2 Code
# 5. Support Vector Machine (SVM)
from sklearn.svm import SVC

svm = SVC(kernel='linear', random_state=42)
svm.fit(X_train[:, [0, 1]], y_train)
y_pred_svm = svm.predict(X_test[:, [0, 1]])
accuracy_svm = accuracy_score(y_test, y_pred_svm)
print(f'SVM Accuracy: {accuracy_svm * 100:.2f}%')

# Visualize SVM with decision boundary (reusing the X1, X2 grid from above)
plt.contourf(X1, X2, svm.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha=0.75, cmap=ListedColormap(('red', 'green', 'blue')))
plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=ListedColormap(('red', 'green', 'blue')))
plt.title('Support Vector Machine (SVM) Decision Boundary')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()

2.5.3 Result

Fig. 2.5 Support Vector Machine Result

2.6 Naive Bayes


Naive Bayes is a probabilistic classification algorithm based on Bayes’ Theorem. It assumes
that the features are independent, making it "naive," but it works well for certain types of
classification tasks like text classification.

2.6.1 Algorithm
• Calculate the prior probability for each class.
• Calculate the likelihood of the input features for each class.
• Apply Bayes' Theorem to compute the posterior probability for each class:

  P(Class | X) = P(X | Class) · P(Class) / P(X)

• Assign the class with the highest posterior probability to the new data point.
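
A minimal sketch of these steps for Gaussian features is shown below (priors from class frequencies, Gaussian likelihoods, and an argmax over posteriors; illustrative only, not the report's implementation):

import numpy as np

def gaussian_nb_predict(X_train, y_train, x_new):
    classes = np.unique(y_train)
    log_posteriors = []
    for c in classes:
        Xc = X_train[y_train == c]
        prior = len(Xc) / len(X_train)                      # P(Class)
        mean, var = Xc.mean(axis=0), Xc.var(axis=0) + 1e-9  # per-feature Gaussian
        # log P(X | Class): sum of per-feature Gaussian log-densities
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * var) + (x_new - mean) ** 2 / var)
        log_posteriors.append(np.log(prior) + log_lik)
    # Assign the class with the highest posterior (P(X) is a common denominator)
    return classes[int(np.argmax(log_posteriors))]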

2.6.2 Code
# 6. Naive Bayes
from sklearn.naive_bayes import GaussianNB

nb = GaussianNB()
nb.fit(X_train[:, [0, 1]], y_train)
y_pred_nb = nb.predict(X_test[:, [0, 1]])
accuracy_nb = accuracy_score(y_test, y_pred_nb)
print(f'Naive Bayes Accuracy: {accuracy_nb * 100:.2f}%')

# Visualize Naive Bayes with decision boundary (reusing the X1, X2 grid)
plt.contourf(X1, X2, nb.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha=0.75, cmap=ListedColormap(('red', 'green', 'blue')))
plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=ListedColormap(('red', 'green', 'blue')))
plt.title('Naive Bayes (Decision Boundary)')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()

2.6.3 Result

Fig. 2.6 Naive Bayes Result

2.7 Decision Tree


A Decision Tree is a supervised learning algorithm that uses a tree-like model of decisions
and their possible outcomes. It works by recursively splitting the data based on the features
that provide the maximum information gain.

2.7.1 Algorithm
• Select the best feature to split the data using metrics like Gini impurity or infor-
mation gain.
• Split the dataset into subsets based on the selected feature.
• Recursively repeat the process for each subset until the stopping criterion is met
(e.g., maximum depth or minimum samples per leaf).
• Classify new data points by traversing the tree based on feature values.

2.7.2 Code
# 7. Decision Tree
from sklearn.tree import DecisionTreeClassifier

tree = DecisionTreeClassifier(random_state=42)
tree.fit(X_train[:, [0, 1]], y_train)
y_pred_tree = tree.predict(X_test[:, [0, 1]])
accuracy_tree = accuracy_score(y_test, y_pred_tree)
print(f'Decision Tree Accuracy: {accuracy_tree * 100:.2f}%')

# Visualize Decision Tree with decision boundary (reusing the X1, X2 grid)
plt.contourf(X1, X2, tree.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha=0.75, cmap=ListedColormap(('red', 'green', 'blue')))
plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=ListedColormap(('red', 'green', 'blue')))
plt.title('Decision Tree (Decision Boundary)')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()
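
To relate the fitted model back to the splitting procedure described above, the learned rules can also be printed; a small optional addition after the listing (export_text is part of scikit-learn, and the feature names correspond to the two columns used for training):

from sklearn.tree import export_text

# Print the splits the fitted tree actually learned
print(export_text(tree, feature_names=['sepal length', 'sepal width']))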

2.7.3 Result

Fig. 2.7 Decision Tree Result


Chapter 3

Problem on CART and ID3 Algorithm

3.1 CART: Classification and Regression Tree


CART is used for generating both classification trees and regression trees. It uses the Gini index as the
metric/cost function to evaluate splits during feature selection, and every split it produces is binary. In the
case of a regression tree, it uses least squares as the metric to select features.

3.1.1 Problem
Let's start with the weather dataset, which is widely used to explain decision tree algorithms.
The target is to predict whether to play or not (Yes or No) based on the weather conditions.

Fig. 3.1 Data



3.1.2 Solution
From the data, outlook, temperature, humidity, and wind are the features. Let's start
building the tree. Outlook is a nominal feature; it can take three values: sunny, overcast, and rain.
Let's summarize the decisions for the outlook feature:

Fig. 3.2 Outlook features

The Gini index is calculated by subtracting the sum of the squared probabilities of each class from one:

Gini = 1 − ∑_{i=1}^{n} p_i²

Gini(outlook=sunny) = 1 − (2/5)² − (3/5)² = 1 − 0.16 − 0.36 = 0.48
Gini(outlook=overcast) = 1 − (4/4)² − (0/4)² = 1 − 1 − 0 = 0
Gini(outlook=rainfall) = 1 − (3/5)² − (2/5)² = 1 − 0.36 − 0.16 = 0.48
Now, we calculate the weighted sum of the Gini index for the outlook feature:
Gini(outlook) = (5/14)·0.48 + (4/14)·0 + (5/14)·0.48 = 0.342
Similarly, temperature is also a nominal feature; it can take three values: hot, cool, and mild.
Let's summarize the decisions for the temperature feature:

Fig. 3.3 Temperature features

Gini(temperature=hot) = 1 − (2/4)² − (2/4)² = 0.5
Gini(temperature=cool) = 1 − (3/4)² − (1/4)² = 0.375
Gini(temperature=mild) = 1 − (4/6)² − (2/6)² = 0.445
Now, the weighted sum of the Gini index for the temperature feature can be calculated as:
Gini(temperature) = (4/14)·0.5 + (4/14)·0.375 + (6/14)·0.445 = 0.439

Similarly for Humidity.

Fig. 3.4 Humidity features

Humidity is a binary feature; it can take two values, high and normal.
Gini(humidity=high) = 1 − (3/7)² − (4/7)² = 0.489
Gini(humidity=normal) = 1 − (6/7)² − (1/7)² = 0.244
Now, the weighted sum of the Gini index for the humidity feature can be calculated as:
Gini(humidity) = (7/14)·0.489 + (7/14)·0.244 = 0.367

Fig. 3.5 Wind features

Wind is also a binary feature; it can take two values, weak and strong.
Gini(wind=weak) = 1 − (6/8)² − (2/8)² = 0.375
Gini(wind=strong) = 1 − (3/6)² − (3/6)² = 0.5
Now, the weighted sum of the Gini index for the wind feature can be calculated as:
Gini(wind) = (8/14)·0.375 + (6/14)·0.5 = 0.428
So, the final decision for all the features is summarized below:

Fig. 3.6 Decision features

From the table, we can see that the Gini index of the outlook feature is the lowest, so outlook
becomes the root node.
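
These hand calculations can be checked with a few lines of Python. The class counts below are read off the tables above; the snippet is a verification sketch, not part of the original report:

def gini(counts):
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

def weighted_gini(groups, total=14):
    # groups: list of (yes_count, no_count) tuples, one per attribute value
    return sum((sum(g) / total) * gini(g) for g in groups)

# (Yes, No) counts per attribute value, taken from the tables above
print(weighted_gini([(2, 3), (4, 0), (3, 2)]))  # outlook     -> ~0.343
print(weighted_gini([(2, 2), (3, 1), (4, 2)]))  # temperature -> ~0.440
print(weighted_gini([(3, 4), (6, 1)]))          # humidity    -> ~0.367
print(weighted_gini([(6, 2), (3, 3)]))          # wind        -> ~0.429

The printed values match the table above up to rounding.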

Fig. 3.7 Decision tree

Now, let's focus on the sub-data where the outlook is sunny. We need to find the Gini index for the
temperature, humidity, and wind features on this subset.

Fig. 3.8 Data on temperature, humidity and wind

Gini index for temperature on sunny outlook

Fig. 3.9 Temperature feature

Gini(outlook=sunny, temperature=hot) = 1 − (0/2)² − (2/2)² = 0
Gini(outlook=sunny, temperature=cool) = 1 − (1/1)² − (0/1)² = 0
Gini(outlook=sunny, temperature=mild) = 1 − (1/2)² − (1/2)² = 0.5
Now, the weighted sum of the Gini index for temperature on the sunny outlook subset can be calculated as:
Gini(outlook=sunny, temperature) = (2/5)·0 + (1/5)·0 + (2/5)·0.5 = 0.2
Gini Index for humidity on sunny outlook.

Fig. 3.10 Humidity feature

Gini(outlook=sunny, humidity=high) = 1 − (0/3)² − (3/3)² = 0
Gini(outlook=sunny, humidity=normal) = 1 − (2/2)² − (0/2)² = 0
Now, the weighted sum of the Gini index for humidity on the sunny outlook subset can be calculated as:
Gini(outlook=sunny, humidity) = (3/5)·0 + (2/5)·0 = 0
Gini Index for wind on sunny outlook

Fig. 3.11 Wind feature

Gini(outlook=sunny, wind=weak) = 1 − (1/3)² − (2/3)² = 0.44
Gini(outlook=sunny, wind=strong) = 1 − (1/2)² − (1/2)² = 0.5
Now, the weighted sum of the Gini index for wind on the sunny outlook subset can be calculated as:
Gini(outlook=sunny, wind) = (3/5)·0.44 + (2/5)·0.5 = 0.266 + 0.2 = 0.466

Decision on sunny outlook factor



Fig. 3.12 Decision features

We have calculated the Gini index of all the features when the outlook is sunny. Humidity has the
lowest value, so the next node under the sunny branch will be humidity.

Fig. 3.13 Decision tree

Now, let's focus on the sub-data for the overcast outlook value and continue the calculation in the same
manner until the whole dataset is split.

Fig. 3.14 Final Decision Tree

So, the final decision tree is as shown above.

3.2 ID3: Iterative Dichotomiser


ID3 (Iterative Dichotomiser 3) uses entropy and information gain as metrics to build the decision tree.
The attribute with the highest information gain is used as the root node, and the same approach is applied
recursively below it. Entropy varies from 0 to 1: it is 0 if all the data belong to a single class and 1 if the
classes are equally distributed. In this way, entropy gives a measure of impurity in the dataset.
Steps to decide which attribute to split:

• Compute the entropy for the dataset

• For every attribute:

– Calculate entropy for all categorical values.


– Take average information entropy for the attribute.
– Calculate gain for the current attribute

• Pick the attribute with the highest information gain.

• Repeat until we get the desired tree



Formula: Entropy

H(S) = − ∑_{i=1}^{c} p_i · log₂ p_i

The information gain Gain(S, A) of an attribute A is given by:

Gain(S, A) = H(S) − ∑_{v ∈ Values(A)} (|S_v| / |S|) · H(S_v)

3.2.1 Problem
Let's start with the weather dataset, which is widely used to explain decision tree algorithms.
The target is to predict whether to play or not (Yes or No) based on the weather conditions.

Fig. 3.15 Data

3.2.2 Solution
From the data we can observe that:
Number of observations = 14
Number of observations having Decision 'Yes' = 9, so p(Yes) = 9/14
Number of observations having Decision 'No' = 5, so p(No) = 5/14
Entropy(Decision) = −(9/14)·log₂(9/14) − (5/14)·log₂(5/14) = 0.94
We have four attributes: outlook, temperature, humidity, and wind.

Fig. 3.16 Information Gain on Sunny outlook factor

Number of instances with the sunny outlook value = 5
p(Decision = 'Yes' | outlook = sunny) = 2/5
p(Decision = 'No' | outlook = sunny) = 3/5
Entropy(Decision | outlook = sunny) = −(2/5)·log₂(2/5) − (3/5)·log₂(3/5) = 0.97
Entropy(Decision | outlook = overcast) = −(4/4)·log₂(4/4) = 0
Entropy(Decision | outlook = rainfall) = −(3/5)·log₂(3/5) − (2/5)·log₂(2/5) = 0.97
Gain(Decision, outlook) = Entropy(Decision) − P(outlook = sunny)·Entropy(Decision | outlook = sunny)
− P(outlook = overcast)·Entropy(Decision | outlook = overcast)
− P(outlook = rainfall)·Entropy(Decision | outlook = rainfall)
Gain(Decision, outlook) = 0.94 − (5/14)·0.97 − (4/14)·0 − (5/14)·0.97 = 0.247
Summary of information gain for all the attributes:
Gain(Decision, outlook) = 0.247
Gain(Decision, wind) = 0.048
Gain(Decision, temperature) = 0.029
Gain(Decision, humidity) = 0.151


So, outlook has the highest information gain and is selected as the first (root) node.
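
These entropy and gain computations can be verified with a few lines of Python. The class counts are read from the tables above; the snippet is a verification sketch, not part of the original report:

import math

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def info_gain(parent_counts, child_counts):
    total = sum(parent_counts)
    # Weighted average entropy over the attribute's values
    weighted = sum((sum(c) / total) * entropy(c) for c in child_counts)
    return entropy(parent_counts) - weighted

# Weather data: 9 'Yes' and 5 'No'; outlook splits into sunny (2, 3),
# overcast (4, 0) and rainfall (3, 2)
print(entropy([9, 5]))                              # ≈ 0.940
print(info_gain([9, 5], [(2, 3), (4, 0), (3, 2)]))  # ≈ 0.247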

Fig. 3.17 Information Gain on Temperature under Sunny outlook factor

Entropy(sunny | temperature = hot) = −(0/2)·log₂(0/2) − (2/2)·log₂(2/2) = 0
Entropy(sunny | temperature = cool) = −(1/1)·log₂(1/1) − (0/1)·log₂(0/1) = 0
Entropy(sunny | temperature = mild) = −(1/2)·log₂(1/2) − (1/2)·log₂(1/2) = 1
Gain(sunny, temperature) = 0.97 − (2/5)·0 − (1/5)·0 − (2/5)·1 = 0.57
Summary of information gain for all the attributes under the sunny outlook value:
Gain(sunny, temperature) = 0.57
Gain(sunny, humidity) = 0.97
Gain(sunny, wind) = 0.019

Fig. 3.18 Caption



Fig. 3.19 Caption

We can see that the information gain of the humidity attribute is higher than that of the others, so it is the
next node under the sunny outlook branch.

Humidity takes two values, normal and high. From both tables, you can infer that whenever humidity is
high the decision is 'No', and when humidity is normal the decision is 'Yes'. Now, let's focus on the other
sub-data and continue the calculation in the same manner until the whole dataset is split. So, the final
decision tree will be:

Fig. 3.20 Final decision tree


Chapter 4

Paper Work: Deep Visual Analytics Domain

4.1 What is Deep Visual Analytics


Deep Visual Analytics is an advanced field that combines the power of deep learning with
visual analytics to interpret and understand complex datasets. By harnessing deep learning
models, particularly neural networks, Deep Visual Analytics can automatically uncover
intricate patterns and features within large volumes of data. These models, known for their
ability to handle diverse data types such as images, text, and time series, generate detailed
insights that are often difficult to interpret directly.

4.2 Paper on Deep Visual Analytic Domain


To bridge this gap, Deep Visual Analytics integrates these insights with visualization tech-
niques, creating intuitive and interactive representations of the data. This combination allows
users to explore and understand the results in a more accessible way. Techniques like Grad-
CAM (Gradient-weighted Class Activation Mapping) are particularly valuable in this context.
Grad-CAM enhances the interpretability of convolutional neural networks (CNNs) by gener-
ating visual explanations of their predictions. It works by producing heatmaps that highlight
the regions of an input image that are most influential in the model's decision-making process.

4.3 Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
Deep neural networks, especially convolutional neural networks (CNNs), are powerful
tools in various applications. However, they often function as black boxes, providing little
insight into their decision-making processes. This lack of interpretability can hinder their
adoption, especially in critical areas where understanding model reasoning is essential.
This paper discusses the importance of interpretability in CNNs and presents Grad-CAM
(Gradient-weighted Class Activation Mapping) as an effective technique for generating visual
explanations from CNN-based models. The full paper is cited as reference [1].

4.4 Methodology
Grad-CAM (Gradient-weighted Class Activation Mapping) provides a method for visualizing
the decisions made by convolutional neural networks (CNNs) by generating heatmaps that
highlight the most influential regions of an input image.
The process begins with a forward pass through the CNN, where the input image is pro-
cessed to produce predictions. During this pass, feature maps are extracted from the last
convolutional layer, which precedes the final classification layer. In the backward pass, the
gradients of the score for the target class with respect to these feature maps are computed.
These gradients indicate how changes in the feature maps affect the score of the predicted
class.
Following this, global average pooling is applied to the gradients to derive weights for
each feature map channel. These weights represent the importance of each channel in the
prediction for the target class. A weighted sum of the feature maps is then computed using
these weights, resulting in an activation map that highlights the contributions of different
regions of the image.

Fig. 4.1 Working Procedure of Grad Cam

To create the visual explanation, the activation map is passed through a ReLU (Rectified
Linear Unit) activation function to emphasize positive contributions. This map is then
resized to match the dimensions of the input image and overlaid on it as a heatmap. This
heatmap visually demonstrates which parts of the image were most influential in the model’s
decision, providing a clear and interpretable representation of the model’s focus areas during
prediction.
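
The procedure described above can be summarized in a short sketch. The code below assumes a PyTorch CNN and picks out its last convolutional layer; the function name and hook-based structure are illustrative assumptions, not the authors' original implementation:

import torch
import torch.nn.functional as F

def grad_cam(model, image, target_class, conv_layer):
    # image: (1, C, H, W) tensor; conv_layer: the last convolutional layer of the model
    feats, grads = {}, {}
    h1 = conv_layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
    h2 = conv_layer.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))

    score = model(image)[0, target_class]       # forward pass, score of the target class
    model.zero_grad()
    score.backward()                            # backward pass: d(score)/d(feature maps)
    h1.remove()
    h2.remove()

    A, dA = feats['a'], grads['a']              # (1, K, h, w) feature maps and their gradients
    weights = dA.mean(dim=(2, 3), keepdim=True)             # global average pooling of gradients
    cam = F.relu((weights * A).sum(dim=1, keepdim=True))    # weighted sum of maps, then ReLU
    cam = F.interpolate(cam, size=image.shape[2:],          # resize to the input image size
                        mode='bilinear', align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0, 1] for the overlay
    return cam[0, 0]                            # heatmap with the input's spatial dimensions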

4.5 Applications and Use Cases of Grad-CAM


• Image Captioning: Grad-CAM can be applied to image captioning systems to visualize
which parts of an image are most influential in generating specific descriptions. By
highlighting these areas, Grad-CAM helps in understanding how the model interprets
and describes different features in an image, leading to better insights and potential
improvements in captioning accuracy.

• Diagnosing Image Classification CNNs: For image classification tasks, Grad-CAM helps diagnose
issues by providing a visual explanation of which regions of an image are influencing the model's
classification decisions. This can reveal if the model is focusing on irrelevant features or if it is
over-relying on certain patterns, allowing for targeted adjustments and enhancements.

• Identifying Bias in Datasets: Grad-CAM can be used to identify and address biases in datasets by
visualizing how different features or demographic groups affect the model's predictions. By examining
the heatmaps, researchers can uncover whether certain groups are disproportionately influencing the
model, which helps in understanding and mitigating dataset biases.

• Model Debugging and Improvement: When models produce unexpected or incorrect results, Grad-CAM
provides valuable insights by showing which parts of the input data are being focused on. This can aid
in debugging by identifying whether the model's attention is misplaced or if certain data aspects are
misunderstood, facilitating model improvement and refinement.

• Biomedical Image Analysis: In the field of biomedical image analysis, Grad-CAM helps clinicians
by visualizing which regions of medical images, such as MRIs or X-rays, are most relevant to the
model's diagnostic conclusions. This enhances the interpretability of the model's predictions, supporting
clinical decision-making and improving trust in automated diagnostics.

• Visual Question Answering (VQA): For visual question answering systems, Grad-
CAM provides insight into how the model answers questions based on image content.
By visualizing the parts of the image that are considered relevant for answering specific
questions, Grad-CAM helps in understanding the model’s reasoning and improves the
accuracy of the answers provided.

In summary, Grad-CAM is a versatile tool that enhances interpretability across various applications,
including image captioning, image classification, dataset bias identification, model debugging, biomedical
analysis, and visual question answering. Its ability to provide visual explanations of model decisions
makes it a valuable asset in understanding and improving deep learning models.

4.6 Conclusion
In this paper, the authors propose a novel class-discriminative localization technique called Grad-CAM
(Gradient-weighted Class Activation Mapping). Grad-CAM is designed to enhance the
interpretability of convolutional neural networks (CNNs) by providing visual explanations of
their predictions. Unlike many existing methods, Grad-CAM operates with any CNN-based
architecture without requiring modifications to the network, making it highly adaptable and
broadly applicable.
Results demonstrate that Grad-CAM outperforms existing approaches in terms of both
interpretability and faithfulness. It effectively highlights the most relevant regions of an
input image that influence the model’s decision, offering clear and intuitive visualizations
that improve understanding of the model’s behavior. This advancement not only aids in
debugging and refining deep learning models but also supports more transparent and reliable
machine learning applications.
References

[1] Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., and Batra, D. (2017). Grad-CAM:
Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE
International Conference on Computer Vision, pages 618–626.
