Chapter 1
Machine Learning
1.1 Introduction
The primary objective of this lab report is to explore, implement, and evaluate a diverse
range of machine learning algorithms to understand their effectiveness and practical applica-
tions. The algorithms under investigation include K-Means Clustering, K-Nearest Neighbors
(KNN), Linear Regression, Logistic Regression, Support Vector Machines (SVM), Naive
Bayes, and Decision Trees. Each of these techniques is designed to tackle specific types of
problems, and their performance can vary depending on the data and task they are applied to.
It should be mentioned that all of these algorithms are implemented on a single dataset, the Iris
dataset.
In addition to these core algorithms, the report will also delve into the mathematical foun-
dations of Classification and Regression Trees (CART) and Iterative Dichotomiser 3 (ID3).
Understanding the underlying mathematics of these techniques is crucial for grasping their
operational mechanics and evaluating their effectiveness. Furthermore, the theoretical prin-
ciples behind Support Vector Machines (SVM) will be examined in detail to provide a
comprehensive understanding of how this powerful algorithm functions.
The report is structured into three main parts. The first part focuses on implementing
the selected algorithms on the Iris dataset, evaluating their performance, and analyzing their
strengths and limitations in various scenarios. The second part provides a theoretical explo-
ration of the mathematical principles underlying CART, ID3, and SVM, offering insights into
the core concepts that drive these algorithms. The third part involves a review of a relevant
paper on Deep Visual Analytics, which will help contextualize the findings within current
research and highlight best practices and emerging trends in the field.
Chapter 2
Machine Learning Algorithm and Its Implementation
2.1 K-Means Clustering
2.1.1 Algorithm
• Initialize K centroids randomly.
• Assign each data point to the nearest centroid.
• Update the centroids by calculating the mean of the points assigned to each cluster.
• Repeat the assignment and update steps until convergence (i.e., the centroids no longer change significantly).
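As a minimal illustration of these steps (a sketch, not the report's original listing; the variable names are illustrative), K-Means can be applied to the Iris features with scikit-learn as follows:

# Illustrative sketch: K-Means clustering of the Iris features into K = 3 groups
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X = load_iris().data
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)          # cluster assignment for each sample
centroids = kmeans.cluster_centers_     # final centroid positions
print(labels[:10])
print(centroids)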
2.2 K-Nearest Neighbors (KNN)
2.2.1 Algorithm
• Choose the number of neighbors K.
• Compute the distance between the new data point and all training data points.
• Identify the K nearest neighbors based on the distance.
• Assign the new data point to the class that is most common among its K neighbors.
2.2.2 Code
# 2. K-Nearest Neighbors (KNN)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
y_pred_knn = knn.predict(X_test)
accuracy_knn = accuracy_score(y_test, y_pred_knn)
print(f'KNN Accuracy: {accuracy_knn * 100:.2f}%')
2.2.3 Result
2.3 Linear Regression
2.3.1 Algorithm
• Define the hypothesis as a linear equation: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n$
• Use a cost function (Mean Squared Error) to measure the error between the
predicted values and actual values.
• Use optimization techniques (like Gradient Descent) to minimize the cost function
and update the weights $\beta_0, \beta_1, \ldots, \beta_n$ (see the sketch after this list).
• Once the cost function is minimized, use the learned weights to make predictions.
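Because the listing in the next subsection relies on scikit-learn's built-in solver, the gradient-descent procedure described above is sketched here on synthetic data; the data, learning rate, and iteration count are illustrative assumptions rather than values from the report.

import numpy as np

# Fit y = b0 + b1 * x by minimizing the Mean Squared Error with gradient descent
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 2.0 + 0.5 * x + rng.normal(0, 0.3, 100)   # synthetic data with known coefficients

b0, b1, lr = 0.0, 0.0, 0.01
for _ in range(5000):
    error = (b0 + b1 * x) - y          # predicted values minus actual values
    b0 -= lr * 2 * error.mean()        # gradient of the MSE cost with respect to b0
    b1 -= lr * 2 * (error * x).mean()  # gradient of the MSE cost with respect to b1

print(b0, b1)   # approaches the true intercept (~2.0) and slope (~0.5)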
2.3.2 Code
# 3. Linear Regression (use feature 0 to predict feature 1)
from sklearn.linear_model import LinearRegression

linear_reg = LinearRegression()
linear_reg.fit(X_train[:, 0].reshape(-1, 1), X_train[:, 1])
y_pred_lr = linear_reg.predict(X_test[:, 0].reshape(-1, 1))
2.3.3 Result
2.4 Logistic Regression
2.4.1 Algorithm
• Define the hypothesis using the sigmoid function: $P(y = 1 \mid x) = \dfrac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n)}}$
• Use a loss function (Log-Loss) to measure the error in classification.
• Apply optimization techniques (like Gradient Descent) to minimize the loss and
update the weights.
• After minimizing the loss, use the probability score to classify new data points
based on a threshold (e.g., 0.5).
2.4.2 Code
# 4. Logistic Regression
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

log_reg = LogisticRegression(max_iter=200)
log_reg.fit(X_train[:, [0, 1]], y_train)
y_pred_logreg = log_reg.predict(X_test[:, [0, 1]])
accuracy_logreg = accuracy_score(y_test, y_pred_logreg)
print(f'Logistic Regression Accuracy: {accuracy_logreg * 100:.2f}%')
2.4.3 Result
2.5 Support Vector Machine (SVM)
2.5.1 Algorithm
• Define a hyperplane that separates the data points in the feature space.
• Maximize the margin between the hyperplane and the nearest data points from
each class (support vectors).
• Use optimization techniques to solve the problem and find the optimal hyperplane.
• Classify new data points based on which side of the hyperplane they fall.
2.5.2 Code
# 5. Support Vector Machine (SVM)
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

svm = SVC(kernel='linear', random_state=42)
svm.fit(X_train[:, [0, 1]], y_train)
y_pred_svm = svm.predict(X_test[:, [0, 1]])
accuracy_svm = accuracy_score(y_test, y_pred_svm)
print(f'SVM Accuracy: {accuracy_svm * 100:.2f}%')
2.5.3 Result
2.6 Naive Bayes
2.6.1 Algorithm
• Calculate the prior probability for each class.
• Calculate the likelihood of the input features for each class.
• Apply Bayes' Theorem to compute the posterior probability for each class:
$P(C \mid x) = \dfrac{P(x \mid C)\,P(C)}{P(x)}$
• Assign the class with the highest posterior probability to the new data point.
2.6.2 Code
# 6. Naive Bayes
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

nb = GaussianNB()
nb.fit(X_train[:, [0, 1]], y_train)
y_pred_nb = nb.predict(X_test[:, [0, 1]])
accuracy_nb = accuracy_score(y_test, y_pred_nb)
print(f'Naive Bayes Accuracy: {accuracy_nb * 100:.2f}%')
2.6.3 Result
2.7 Decision Tree
2.7.1 Algorithm
• Select the best feature to split the data using metrics like Gini impurity or infor-
mation gain.
• Split the dataset into subsets based on the selected feature.
• Recursively repeat the process for each subset until the stopping criterion is met
(e.g., maximum depth or minimum samples per leaf).
• Classify new data points by traversing the tree based on feature values.
2.7.2 Code
# 7. Decision Tree
from sklearn.tree import DecisionTreeClassifier

tree = DecisionTreeClassifier(random_state=42)
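The listing above ends after the classifier is constructed. Assuming the remaining steps mirror the other classifiers in this report (fit on the first two features, predict on the test set, and report accuracy), the continuation would look roughly like this:

from sklearn.metrics import accuracy_score

# Assumed continuation, mirroring the pattern used for the other classifiers
tree.fit(X_train[:, [0, 1]], y_train)
y_pred_tree = tree.predict(X_test[:, [0, 1]])
accuracy_tree = accuracy_score(y_test, y_pred_tree)
print(f'Decision Tree Accuracy: {accuracy_tree * 100:.2f}%')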
2.7.3 Result
Chapter 3
Problem on CART and ID3 Algorithm
3.1 CART: Classification and Regression Tree
3.1.1 Problem
Let's start with the weather dataset, which is widely used to explain decision tree algorithms. The target is to predict whether to play or not (Yes or No) based on the weather conditions.
3.1.2 Solution
From the data, outlook, temperature, humidity, and wind are the features. So, let's start building the tree. Outlook is a nominal feature; it can take three values: sunny, overcast, and rain.
Let's summarize the final decisions for the outlook feature.
The Gini index is calculated by subtracting the sum of the squared probabilities of each class from one:
$$\mathrm{Gini} = 1 - \sum_{i=1}^{n} p_i^2$$
Humidity is a binary feature; it can take two values, high and normal.
Gini(humidity = high) = 1 − (3/7)² − (4/7)² = 0.489
Gini(humidity = normal) = 1 − (6/7)² − (1/7)² = 0.245
Now, the weighted sum of the Gini indices for the humidity feature can be calculated as
Gini(humidity) = (7/14) × 0.489 + (7/14) × 0.245 = 0.367
Wind is a binary feature; it can take two values, weak and strong.
Gini(wind = weak) = 1 − (6/8)² − (2/8)² = 0.375
Gini(wind = strong) = 1 − (3/6)² − (3/6)² = 0.5
Now, the weighted sum of the Gini indices for the wind feature can be calculated as
Gini(wind) = (8/14) × 0.375 + (6/14) × 0.5 = 0.428
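The arithmetic above can be reproduced with a short helper; the (Yes, No) counts below are those of the standard weather dataset, stated here as an assumption since the data table itself is not reproduced.

def gini(counts):
    # Gini impurity of one branch, given its class counts
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

def weighted_gini(branches):
    # Weighted Gini impurity of a split, given per-branch class counts
    total = sum(sum(b) for b in branches)
    return sum(sum(b) / total * gini(b) for b in branches)

# Humidity: high -> 3 Yes / 4 No, normal -> 6 Yes / 1 No
print(round(weighted_gini([(3, 4), (6, 1)]), 2))   # 0.37, matching Gini(humidity) above
# Wind: weak -> 6 Yes / 2 No, strong -> 3 Yes / 3 No
print(round(weighted_gini([(6, 2), (3, 3)]), 2))   # 0.43, matching Gini(wind) above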
So, the final decisions for all of the features can be summarized in a table.
From the table, we can see that the Gini index for the outlook feature is the lowest, so outlook becomes the root node.
Now, let's focus on the sub-data for the sunny outlook value. We need to find the Gini index for the temperature, humidity, and wind features, respectively.
We have now calculated the Gini index of all the features when the outlook is sunny. You can infer that humidity has the lowest value, so the next node will be humidity.
Next, let's focus on the sub-data for the overcast outlook value, and continue splitting in the same manner until the whole dataset has been processed.
Formula: Entropy
$$H(S) = -\sum_{i=1}^{c} p_i \log_2 p_i$$
Formula: Information Gain
$$\mathrm{Gain}(S, A) = H(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\, H(S_v)$$
3.2 ID3: Iterative Dichotomiser 3
3.2.1 Problem
We again use the weather dataset, which is widely used to explain decision tree algorithms; the target is to predict whether to play or not (Yes or No) based on the weather conditions.
3.2.2 Solution
From the data, we can observe that:
Number of observations = 14
Number of observations with decision 'Yes' = 9
Probability of 'Yes': p(Yes) = 9/14
Probability of 'No': p(No) = 5/14
Entropy of the full dataset: $H(S) = -(9/14)\log_2(9/14) - (5/14)\log_2(5/14) \approx 0.940$
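The entropy and information-gain formulas can likewise be checked with a short sketch; the per-branch counts for the outlook attribute are those of the standard weather dataset and are stated here as an assumption, since the table itself is not reproduced.

from math import log2

def entropy(counts):
    # Shannon entropy H(S) from class counts
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, branches):
    # Gain(S, A) = H(S) - sum over branches of |Sv|/|S| * H(Sv)
    total = sum(parent_counts)
    weighted = sum(sum(b) / total * entropy(b) for b in branches)
    return entropy(parent_counts) - weighted

# Whole dataset: 9 Yes / 5 No
print(entropy((9, 5)))                                      # ~0.940
# Outlook: sunny -> 2 Yes / 3 No, overcast -> 4 Yes / 0 No, rain -> 3 Yes / 2 No
print(information_gain((9, 5), [(2, 3), (4, 0), (3, 2)]))   # ~0.247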
We can see that the information gain of the 'Humidity' attribute is higher than that of the others, so it becomes the next node under the sunny outlook branch.
Humidity takes two values, normal and high. From both tables, you can infer that whenever humidity is high the decision is 'No', and whenever humidity is normal the decision is 'Yes'. Now, let's focus on the other sub-data and continue splitting in the same manner until the whole dataset has been processed. So, the final decision tree will be:
Chapter 4
Paper Work: Deep Visual Analytics Domain
4.4 Methodology
Grad-CAM (Gradient-weighted Class Activation Mapping) provides a method for visualizing
the decisions made by convolutional neural networks (CNNs) by generating heatmaps that
highlight the most influential regions of an input image.
The process begins with a forward pass through the CNN, where the input image is pro-
cessed to produce predictions. During this pass, feature maps are extracted from the last
convolutional layer, which precedes the final classification layer. In the backward pass, the
gradients of the score for the target class with respect to these feature maps are computed.
These gradients indicate how changes in the feature maps affect the score of the predicted
class.
Following this, global average pooling is applied to the gradients to derive weights for
each feature map channel. These weights represent the importance of each channel in the
prediction for the target class. A weighted sum of the feature maps is then computed using
these weights, resulting in an activation map that highlights the contributions of different
regions of the image.
To create the visual explanation, the activation map is passed through a ReLU (Rectified
Linear Unit) activation function to emphasize positive contributions. This map is then
resized to match the dimensions of the input image and overlaid on it as a heatmap. This
heatmap visually demonstrates which parts of the image were most influential in the model’s
decision, providing a clear and interpretable representation of the model’s focus areas during
prediction.
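As an illustration of this pipeline (not the authors' original implementation), the steps can be sketched in PyTorch using a forward hook on the last convolutional block of a ResNet; the model, layer choice, and random input are assumptions for demonstration and require a recent torch/torchvision.

import torch
import torch.nn.functional as F
from torchvision import models

feats = {}
def save_activation(module, inputs, output):
    feats["a"] = output                                   # feature maps of the hooked block
    output.register_hook(lambda g: feats.update(g=g))     # their gradient w.r.t. the class score

model = models.resnet18(weights=None).eval()              # any CNN works; weights omitted for a self-contained demo
model.layer4[-1].register_forward_hook(save_activation)   # hook the last convolutional block

image = torch.randn(1, 3, 224, 224)                       # stand-in for a preprocessed input image
scores = model(image)                                     # forward pass
target = scores.argmax(dim=1).item()                      # predicted (target) class
scores[0, target].backward()                              # backward pass for the target class score

weights = feats["g"].mean(dim=(2, 3), keepdim=True)               # global average pooling of the gradients
cam = F.relu((weights * feats["a"]).sum(dim=1, keepdim=True))     # weighted sum of feature maps + ReLU
cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear", align_corners=False)
heatmap = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)      # normalized heatmap to overlay on the input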
• Identifying Bias in Datasets: Grad-CAM can be used to identify and address biases
in datasets by visualizing how different features or demographic groups affect the
model’s predictions. By examining the heatmaps, researchers can uncover if certain
groups are disproportionately influencing the model, which helps in understanding and
mitigating dataset biases.
• Visual Question Answering (VQA): For visual question answering systems, Grad-
CAM provides insight into how the model answers questions based on image content.
By visualizing the parts of the image that are considered relevant for answering specific
questions, Grad-CAM helps in understanding the model’s reasoning and improves the
accuracy of the answers provided.
4.6 Conclusion
In this paper [1], the authors propose a novel class-discriminative localization technique called Grad-CAM
(Gradient-weighted Class Activation Mapping). Grad-CAM is designed to enhance the
interpretability of convolutional neural networks (CNNs) by providing visual explanations of
their predictions. Unlike many existing methods, Grad-CAM operates with any CNN-based
architecture without requiring modifications to the network, making it highly adaptable and
broadly applicable.
Results demonstrate that Grad-CAM outperforms existing approaches in terms of both
interpretability and faithfulness. It effectively highlights the most relevant regions of an
input image that influence the model’s decision, offering clear and intuitive visualizations
that improve understanding of the model’s behavior. This advancement not only aids in
debugging and refining deep learning models but also supports more transparent and reliable
machine learning applications.
References
[1] Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., and Batra, D. (2017). Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pages 618–626.