UNIVERSITY OF SCIENCE
FACULTY OF INFORMATION TECHNOLOGY
REPORT LAB2
Contents
1 General information
1.1 Student information
2 Project report
2.1 Preparing the data sets
2.1.1 Manually merge the two files
2.1.2 Prepare 16 subsets
2.1.3 Visualize the distributions of classes in all the data sets
2.1.4 Result of running the program
2.2 Building the decision tree classifiers
2.2.1 Result of running the program
2.3 Evaluating the decision tree classifiers
2.3.1 Result of running the program
2.4 The depth and accuracy of a decision tree
2.4.1 Result of running the program
References
1 General information
1.1 Student information
• Student name: Phạm Khánh Toàn
• Student ID: 21127704
• Student class: 21CLC05
• Phone number: 0866099829
• Email: [email protected]
2 Project report
2.1 Preparing the data sets
2.1.1 Manually merge the two files
• Step 1: Import the two files poker-hand-training-true.data and poker-hand-testing.data into an Excel file and save it.
• Step 2: Open another Excel file and import the previously saved Excel file.
• Step 3: Select "Select multiple items", choose the imported items from the displayed list, and then select "Transform Data".
• Step 4: In the Applied Steps pane, remove every step except Source. Then, in the Kind column, keep only the rows of kind Table.
• Step 5: Remove the other columns, keeping only the Data column.
• Step 6: Click on Close & Load.
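For reference, the same merge can also be done programmatically. The following is a minimal sketch using pandas; the assumption that the files have no header row and the output file name are illustrative rather than taken from the report.

import pandas as pd

# Read both Poker Hand files (assumed to have no header row) and concatenate them.
train_part = pd.read_csv("poker-hand-training-true.data", header=None)
test_part = pd.read_csv("poker-hand-testing.data", header=None)
merged = pd.concat([train_part, test_part], ignore_index=True)

# Save the merged data set (output file name is illustrative).
merged.to_csv("poker-hand-merged.data", header=False, index=False)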
2.1.2 Prepare 16 subsets
This code uses the scikit-learn function train_test_split to split the dataset into a training set and a test set. It takes the input variables data and label, and uses the train_size parameter to specify the proportion of data to include in the training set. The stratify parameter ensures that the proportion of each class is maintained in both the training and test sets, the shuffle parameter is set to True to randomly shuffle the data before splitting, and the random_state parameter sets the random seed for reproducibility. The resulting splits are stored in a list called subset_with_train_ratio.
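A minimal sketch of the split described above, assuming the four train ratios 40%, 60%, 80%, and 90% and that data and label already hold the features and class labels; names other than those mentioned in the report, and the seed value, are illustrative.

from sklearn.model_selection import train_test_split

subset_with_train_ratio = []
for train_ratio in (0.4, 0.6, 0.8, 0.9):
    # stratify=label keeps each class's proportion the same in both splits,
    # shuffle=True shuffles the rows first, and random_state fixes the seed.
    feature_train, feature_test, label_train, label_test = train_test_split(
        data, label,
        train_size=train_ratio,
        stratify=label,
        shuffle=True,
        random_state=42,
    )
    subset_with_train_ratio.append((feature_train, feature_test, label_train, label_test))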
2.1.3 Visualize the distributions of classes in all the data sets
This code defines a function called draw_distribution that takes four arguments: original_set, training_set, test_set, and train_ratio, and plots a bar graph showing the class distribution of the data. It first counts the number of instances of each class in the original, training, and test sets. It then sets the x positions and bar width for the graph and plots three bars per class: one for the original set, one for the training set, and one for the test set. The x-axis labels are set to the class names and rotated vertically. Finally, the function sets the graph title based on the train_ratio input and shows the graph with plt.show().
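A sketch of how such a function could look, assuming matplotlib and numpy and that the three set arguments are arrays of class labels; the internal variable names are illustrative.

import numpy as np
import matplotlib.pyplot as plt

def draw_distribution(original_set, training_set, test_set, train_ratio):
    # Count how many instances of each class appear in each set.
    classes = np.unique(original_set)
    orig_counts = [np.sum(original_set == c) for c in classes]
    train_counts = [np.sum(training_set == c) for c in classes]
    test_counts = [np.sum(test_set == c) for c in classes]

    # One group of three bars per class: original, training, test.
    x = np.arange(len(classes))
    width = 0.25
    plt.bar(x - width, orig_counts, width, label="Original")
    plt.bar(x, train_counts, width, label="Training")
    plt.bar(x + width, test_counts, width, label="Test")

    # Class names on the x-axis, rotated vertically, plus title and legend.
    plt.xticks(x, classes, rotation="vertical")
    plt.title(f"Class distribution (train ratio = {train_ratio})")
    plt.legend()
    plt.show()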
2.1.4 Result of running the program
2.2 Building the decision tree classifiers
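A minimal sketch of how one of the classifiers could be built and drawn, assuming scikit-learn's DecisionTreeClassifier; the entropy (information gain) criterion, the seed, and the variable names (which follow the split sketch above) are assumptions rather than details taken from the report.

from sklearn import tree
import matplotlib.pyplot as plt

# Fit a decision tree on one of the prepared training subsets.
clf = tree.DecisionTreeClassifier(criterion="entropy", random_state=42)
clf.fit(feature_train, label_train)

# Visualize the fitted tree (limiting the drawn depth keeps the plot readable).
plt.figure(figsize=(20, 10))
tree.plot_tree(clf, filled=True, max_depth=3)
plt.show()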
2.2.1 Result of running the program
b) Data set 60/40
2.3 Evaluating the decision tree classifiers
The classification report and the confusion matrix are two important tools to evaluate the
performance of a classification model.
The confusion matrix is a table that summarizes the number of correct and incorrect predictions made by the model on a set of test data. In the binary case it presents four values: true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). TP and TN are the numbers of samples correctly classified as positive and negative, respectively, while FP and FN are the numbers of samples incorrectly classified as positive and negative. For a multi-class problem, the matrix has one row and one column per class, with the diagonal holding the correctly classified samples. The confusion matrix allows us to calculate performance metrics such as accuracy, precision, recall, and F1 score.
The classification report provides a summary of these performance metrics for each class in the dataset. It includes precision, recall, F1 score, and support. Precision measures the proportion of positive predictions that are correct, while recall measures the proportion of actual positives that are correctly identified. The F1 score is the harmonic mean of precision and recall, so it takes both metrics into account. Support is the number of samples in each class.
Together, the classification report and confusion matrix provide a detailed evaluation of the
model’s performance, allowing us to identify areas of improvement and make informed decisions
about how to adjust the model’s parameters or features.
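A minimal sketch of how these two tools can be produced with scikit-learn; clf, feature_test, and label_test stand for a fitted classifier and one of the prepared test splits, and are illustrative names.

from sklearn.metrics import classification_report, confusion_matrix

# Predict the test labels with the fitted classifier.
label_pred = clf.predict(feature_test)

# Rows of the confusion matrix are the true classes, columns the predicted ones.
print(confusion_matrix(label_test, label_pred))

# Per-class precision, recall, F1 score, and support.
print(classification_report(label_test, label_pred))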
2.3.1 Result of running the program
a) Data set 40/60
b) Data set 60/40
c) Data set 80/20
d) Data set 90/10
2.4 The depth and accuracy of a decision tree
Max_depth    None     2        3        4        5        6        7
Accuracy     0.6399   0.5062   0.5083   0.5247   0.5566   0.5566   0.5569
As the table shows, the accuracy of the decision tree classifier changes as we change the max_depth parameter. When max_depth is None, the accuracy is 0.6399, the highest among all the values tested. However, as max_depth increases from 2 to 7, the accuracy improves only modestly, from about 0.51 to 0.56, and never approaches the score of the unrestricted tree.
This behavior indicates that the decision tree model may not be the best model for predicting the classes of this particular dataset. It also suggests that the model may be overfitting the data when max_depth is not limited, achieving a very high accuracy score on the training set that does not generalize as well to the test set. Therefore, it is important to find the value of max_depth that balances the bias-variance tradeoff and avoids both overfitting and underfitting.
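A sketch of the experiment behind the table above, assuming scikit-learn and one of the prepared train/test splits; the entropy criterion, the seed, and the variable names are assumptions rather than details taken from the report.

from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Train and evaluate one tree per max_depth value used in the table.
for depth in (None, 2, 3, 4, 5, 6, 7):
    clf = DecisionTreeClassifier(criterion="entropy", max_depth=depth, random_state=42)
    clf.fit(feature_train, label_train)
    acc = accuracy_score(label_test, clf.predict(feature_test))
    print(f"max_depth={depth}: accuracy={acc:.4f}")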
2.4.1 Result of running the program
a) Depth = None
b) Depth = 2
c) Depth = 3
d) Depth = 4
e) Depth = 5
f) Depth = 6
g) Depth = 7
References
[1] Stuart J. Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, 1995